polyval, ghash: use powers of H to process multiple blocks at a time

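We can precompute `h = [H^N, H^(N-1), ..., H]` and then process N blocks at a time. Because the N multiplications within a stride are independent of one another, they can be pipelined instead of each one waiting on the previous result. On a 2020 M1, a stride of 8 runs at about 0.17 cycles per byte, whereas a stride of 1 runs at about 1.4 cycles per byte: roughly an 8x improvement.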
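As a sketch of why the powers of H work: with $Y_0 = 0$ and the one-block-at-a-time update $Y_j = (Y_{j-1} \oplus X_j) \cdot H$, unrolling N steps gives

$$Y_{j+N} = (Y_j \oplus X_{j+1}) \cdot H^N \oplus X_{j+2} \cdot H^{N-1} \oplus \cdots \oplus X_{j+N} \cdot H$$

The same unrolling holds for POLYVAL's dot operation, since it is also associative and distributes over XOR. The Rust below checks this identity numerically. It is a minimal sketch under stated assumptions: `gf_mul` is a plain shift-and-add multiply in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1 with bit i holding the coefficient of x^i, so it is bit-compatible with neither GHASH nor POLYVAL, and the helper names (`gf_mul`, `hash_serial`, `powers_of_h`, `hash_wide`) are hypothetical, not the `polyval` or `ghash` crate API.

```rust
/// Toy GF(2^128) multiply modulo x^128 + x^7 + x^2 + x + 1 (bit i = coefficient of x^i).
/// Assumption: this bit order is NOT the one GHASH or POLYVAL use; it only illustrates the algebra.
fn gf_mul(mut a: u128, mut b: u128) -> u128 {
    let mut r = 0u128;
    while b != 0 {
        if b & 1 == 1 {
            r ^= a;
        }
        b >>= 1;
        // a <- a * x mod p: shift left and fold x^128 back in as x^7 + x^2 + x + 1 (0x87).
        let carry = a >> 127;
        a <<= 1;
        if carry == 1 {
            a ^= 0x87;
        }
    }
    r
}

/// Stride of 1: y = (y + x_i) * H for each block. Each multiply depends on the previous one.
fn hash_serial(h: u128, blocks: &[u128]) -> u128 {
    blocks.iter().fold(0, |y, &x| gf_mul(y ^ x, h))
}

/// Returns [H^n, H^(n-1), ..., H].
fn powers_of_h(h: u128, n: usize) -> Vec<u128> {
    let mut pows = Vec::with_capacity(n);
    let mut p = h;
    for _ in 0..n {
        pows.push(p);
        p = gf_mul(p, h);
    }
    pows.reverse();
    pows
}

/// Stride of N: y' = (y + x_1)*H^N + x_2*H^(N-1) + ... + x_N*H.
/// The N products do not depend on each other, so real hardware can overlap them.
fn hash_wide<const N: usize>(h: u128, blocks: &[u128]) -> u128 {
    let pows = powers_of_h(h, N);
    let mut chunks = blocks.chunks_exact(N);
    let mut y = 0u128;
    for chunk in &mut chunks {
        let mut acc = gf_mul(y ^ chunk[0], pows[0]);
        for (&x, &p) in chunk[1..].iter().zip(&pows[1..]) {
            acc ^= gf_mul(x, p);
        }
        y = acc;
    }
    // A tail shorter than N falls back to the one-block-at-a-time update.
    for &x in chunks.remainder() {
        y = gf_mul(y ^ x, h);
    }
    y
}

fn main() {
    // Arbitrary key and message: 19 blocks = two full strides of 8 plus a 3-block tail.
    let h: u128 = 0x66e9_4bd4_ef8a_2c3b_884c_fa59_ca34_2b2e;
    let blocks: Vec<u128> = (1..=19u128)
        .map(|i| i.wrapping_mul(0x9e37_79b9_7f4a_7c15_f39c_c060_5ced_c834))
        .collect();
    assert_eq!(hash_serial(h, &blocks), hash_wide::<8>(h, &blocks));
    println!("stride 8 matches stride 1");
}
```

A real backend would replace `gf_mul` with carry-less multiply instructions (PMULL on aarch64, PCLMULQDQ on x86) and can defer the modular reduction until the N products have been summed; this sketch reduces every product for simplicity.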

Note that this isn't always desirable. For example, HCTR2 computes POLYVAL over single blocks, and the overhead of constructing and then cleaning up an N-wide POLYVAL hurts performance. So we probably need to offer both a "wide" and a "lite" implementation.
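One hypothetical shape for the "wide"/"lite" split, again using a toy `gf_mul` that is not GHASH- or POLYVAL-compatible: `Wide<N>` spends about N multiplications in `new` building `[H^N, ..., H]` and scrubs that table with volatile writes on drop, while `Lite` stores only H. With a single-block input, `Wide` pays both the setup and the cleanup without ever filling a stride. The type and method names (`Lite`, `Wide`, `new`, `update`) are assumptions for illustration, not an existing API, and the volatile-write scrub is only a best-effort stand-in for something like the `zeroize` crate.

```rust
use std::ptr;

/// Toy GF(2^128) multiply modulo x^128 + x^7 + x^2 + x + 1 (bit i = coefficient of x^i).
/// Assumption: not bit-compatible with GHASH or POLYVAL; used only to illustrate the API shape.
fn gf_mul(mut a: u128, mut b: u128) -> u128 {
    let mut r = 0u128;
    while b != 0 {
        if b & 1 == 1 {
            r ^= a;
        }
        b >>= 1;
        let carry = a >> 127;
        a <<= 1;
        if carry == 1 {
            a ^= 0x87;
        }
    }
    r
}

/// Hypothetical "lite" hasher: keeps only H, one multiply per block, no table to build.
/// (A real one would also scrub h and y on drop; omitted for brevity.)
struct Lite {
    h: u128,
    y: u128,
}

impl Lite {
    fn new(h: u128) -> Self {
        Lite { h, y: 0 }
    }

    fn update(&mut self, blocks: &[u128]) {
        for &x in blocks {
            self.y = gf_mul(self.y ^ x, self.h);
        }
    }
}

/// Hypothetical "wide" hasher: keeps [H^N, ..., H] and consumes N blocks per step.
struct Wide<const N: usize> {
    pows: [u128; N],
    y: u128,
}

impl<const N: usize> Wide<N> {
    fn new(h: u128) -> Self {
        // Setup cost: about N multiplications before any data is hashed.
        let mut pows = [0u128; N];
        let mut p = h;
        for slot in pows.iter_mut().rev() {
            *slot = p;
            p = gf_mul(p, h);
        }
        Wide { pows, y: 0 }
    }

    fn update(&mut self, blocks: &[u128]) {
        let mut chunks = blocks.chunks_exact(N);
        for chunk in &mut chunks {
            // y' = (y + x_1)*H^N + x_2*H^(N-1) + ... + x_N*H
            let mut acc = gf_mul(self.y ^ chunk[0], self.pows[0]);
            for (&x, &p) in chunk[1..].iter().zip(&self.pows[1..]) {
                acc ^= gf_mul(x, p);
            }
            self.y = acc;
        }
        // A tail shorter than N falls back to Horner with H = pows[N - 1].
        let h = self.pows[N - 1];
        for &x in chunks.remainder() {
            self.y = gf_mul(self.y ^ x, h);
        }
    }
}

impl<const N: usize> Drop for Wide<N> {
    fn drop(&mut self) {
        // Cleanup cost: scrub the whole key-derived table, even if only one block was hashed.
        unsafe {
            for p in self.pows.iter_mut() {
                ptr::write_volatile(p as *mut u128, 0);
            }
            ptr::write_volatile(&mut self.y as *mut u128, 0);
        }
    }
}

fn main() {
    let h = 0x0f0e_0d0c_0b0a_0908_0706_0504_0302_0100u128;
    let blocks: Vec<u128> = (1..=20u128).collect();

    let mut lite = Lite::new(h);
    lite.update(&blocks);

    // Streaming in uneven pieces still matches, because the tail uses plain Horner steps.
    let mut wide = Wide::<8>::new(h);
    wide.update(&blocks[..13]);
    wide.update(&blocks[13..]);

    assert_eq!(lite.y, wide.y);
}
```

The design choice here is that both variants share the same running state `y`, so a caller could pick `Lite` for short inputs and `Wide` for bulk data without the results diverging.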

I'm happy to donate [my implementation][1] (x86 and aarch64).

[1]: https://github.com/ericlagergren/polyval-rs/blob/dc8fe263b4ec6a60d96c75f4c6fb9e1b6184366c/src/backend/aarch64.rs

polyval, ghash: use powers of H to process multiple blocks at a time #225