-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary GCD #755
base: master
Are you sure you want to change the base?
Binary GCD #755
Conversation
Update:
Benchmarks below.
|
There is still significant room for improvement for the "large" variant. In each of its
Assuming the first case, we can thus improve this multiplication by not considering some of the top limbs of Aside from these multiplications, the "large" variant is linear in This would require the development of a "bounded" multiplication feature. Would this project be interested in that? Spoiler: The upcoming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall. I find the implementation constant time. Offered a few annotations and documentations to clarify the code after going over it, and suggested some changes to boost performance. I think it is worth reading and considering, but I approve this.
Great work thanks @erik-3milabs !
src/uint/bingcd/gcd.rs
Outdated
// Todo: tweak this threshold | ||
if LIMBS < 8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, but looks like you answered it, your problem is that the answer may depend on the machine. Suppose I gave you an answer, do you know what to do in the code? If I say "this is optimal for 64-bit architecture and that for 32-bit".. If so, let's check on another machine.
src/uint/bingcd/xgcd.rs
Outdated
b = Uint::select(&b, &b.wrapping_sub(&a), b_odd); | ||
matrix.conditional_subtract_top_row_from_bottom(b_odd); | ||
|
||
// Div b by two and double the top row of the matrix when a, b ≠ 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Algorithm 2, you multiply by 2 always, but I suppose you know what you are doing and he is wrong on this edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it multiplies the second row and not the first one (f1,g1) is multiplied by two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is a modification I made to the algorithm. It prevents integer overflows in an upcoming PR. I've updated the docstring of this function to indicate this.
- yes, but this code has a and b swapped, so I should be doubling the first row ;-)
Note: that doubling problem was fixed because I reverted the (a, b) swap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out, this is wrong: this leads to buggy behavior for specific pairs of numbers. I'm refactoring some code to bypass this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #761
@@ -25,6 +25,12 @@ impl<const LIMBS: usize> Uint<LIMBS> { | |||
Uint { limbs } | |||
} | |||
|
|||
/// Swap `a` and `b` if `c` is truthy, otherwise, do nothing. | |||
#[inline] | |||
pub(crate) const fn conditional_swap(a: &mut Self, b: &mut Self, c: ConstChoice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In light of my previous comment, it is worth to consider if we can efficiently select between two vectors or matrices, and similarly swap columns/rows efficiently (which you did implement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tarcieri do you have any ideas on improving the performance of this and other swapping operations?
* This reverts commit 0897439 * This adds further annotation
Uint::conditional_swap(&mut a, &mut b, do_swap); | ||
|
||
// subtract b from a when a is odd | ||
a = a.wrapping_sub(&Uint::select(&Uint::ZERO, &b, a_odd)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tarcieri what do you think of this line? Previously, it was like this:
a = Uint::select(&a, &a.wrapping_sub(&b), a_odd);
The current code is 25-10% faster for Uint
s with few limbs (1, 2, 3, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised there's that much of a difference. Are you sure it's always faster or is it faster depending on a
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty confident it is always faster.
The reason I think this is faster, is that we are now selecting between a constant and a variable, instead of two variables. Given that select
is const
and loves to be inlined, the compiler can now optimize the select
operation.
Recall, Uint::select
calls Limb::select
, which in turn calls
impl ConstChoice {
/// Return `b` if `self` is truthy, otherwise return `a`.
#[inline]
pub(crate) const fn select_word(&self, a: Word, b: Word) -> Word {
a ^ (self.0 & (a ^ b))
}
}
When a
is the constant ZERO, this can be optimized as:
self.0 & b
saving two XOR operations, or 2/3's of this operation.
Returning to the gcd
subroutine, this select
is in the hot loop of this algorithm. In total, the loop executes:
Uint::is_odd
(1 op)Uint::lt
(2 ops/word),ConstChoice::and
(1 op),Uint::wrapping_sub
(4 ops/word),Uint::select
(3 ops/word -> 1 ops/word)Uint::shr
(3 ops/word)
So, there is a reduction from 12 to 10 ops/word, or a 17% improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a is the constant ZERO
Aah, ok
@erik-3milabs I just set This means you can now make breaking changes, such as removing the existing safegcd implementation and changing trait impls like |
This PR introduces an implementation of Optimized Binary GCD. Ref: Pornin, Algorithm 2.
Upsides to this technique:
gcd
algorithm currently implemented incrypto_bigint
(see below) (really, it just has a different complexity bound).UNSAT_LIMBS
Benchmark results
Word = u64