polyval, ghash: use powers of H to process multiple blocks at a time #225

Open
@ericlagergren

Description

We can precompute the powers h = [H^N, H^(N-1), ..., H] and then process N blocks at a time. On a 2020 M1, a stride of 8 runs at about 0.17 cycles per byte, whereas a stride of 1 runs at about 1.4 cycles per byte, an ~8x improvement.
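To make the idea concrete, here is a minimal sketch of the algebra only, not my actual implementation. It assumes a toy bit-serial multiplication in GF(2^128) with the reduction polynomial x^128 + x^7 + x^2 + x + 1 and ignores the bit-ordering and scaling conventions of real GHASH/POLYVAL. The names `gf_mul`, `powers`, `hash_narrow`, `hash_wide`, and `STRIDE` are hypothetical. A real wide implementation would use carry-less multiply instructions and defer the reduction until the products of a stride have been summed; this sketch reduces after every multiplication, so it shows the identity but not the speedup.

```rust
/// Number of blocks consumed per wide step (hypothetical name; 8 matches the
/// stride measured above).
const STRIDE: usize = 8;

/// Toy GF(2^128) multiplication, one bit at a time.
/// Assumption: reduction polynomial x^128 + x^7 + x^2 + x + 1 (0x87), natural
/// bit order. Real GHASH/POLYVAL use different bit conventions.
fn gf_mul(a: u128, b: u128) -> u128 {
    let (mut a, mut b, mut acc) = (a, b, 0u128);
    while b != 0 {
        if b & 1 == 1 {
            acc ^= a; // add a * x^i for every set bit i of b
        }
        let carry = a >> 127;
        a <<= 1; // multiply a by x
        if carry == 1 {
            a ^= 0x87; // reduce: x^128 = x^7 + x^2 + x + 1
        }
        b >>= 1;
    }
    acc
}

/// Precompute [H^STRIDE, H^(STRIDE-1), ..., H].
fn powers(h: u128) -> [u128; STRIDE] {
    let mut pow = [0u128; STRIDE];
    pow[STRIDE - 1] = h;
    for i in (0..STRIDE - 1).rev() {
        pow[i] = gf_mul(pow[i + 1], h);
    }
    pow
}

/// Stride of 1: Horner's rule, y = (y + m) * H for every block.
fn hash_narrow(h: u128, blocks: &[u128]) -> u128 {
    blocks.iter().fold(0, |y, &m| gf_mul(y ^ m, h))
}

/// Stride of STRIDE: fold the running value into the first block of each
/// chunk, then multiply each block by its matching power of H and sum.
fn hash_wide(h: u128, blocks: &[u128]) -> u128 {
    let pow = powers(h);
    let mut y = 0u128;
    let mut chunks = blocks.chunks_exact(STRIDE);
    for chunk in &mut chunks {
        let mut acc = gf_mul(y ^ chunk[0], pow[0]);
        for i in 1..STRIDE {
            acc ^= gf_mul(chunk[i], pow[i]);
        }
        y = acc;
    }
    // Leftover blocks (fewer than STRIDE) fall back to the stride-1 path.
    for &m in chunks.remainder() {
        y = gf_mul(y ^ m, h);
    }
    y
}
```

For any key and any number of blocks, `hash_wide` and `hash_narrow` should return the same value, because expanding Horner's rule over N blocks gives (y + m_1)·H^N + m_2·H^(N-1) + ... + m_N·H. The multiplications within a chunk are independent of each other, which is what lets a wide implementation keep the multiplier busy.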

Note that a wide stride isn't always desirable. For example, HCTR2 computes POLYVAL over single blocks, and the overhead of constructing and cleaning up an N-wide POLYVAL hurts performance. So we probably need to offer both a "wide" and a "lite" implementation.

I'm happy to donate my implementation (x86 and aarch64).
