I was looking through XKCP (eXtended Keccak Code Package, https://github.com/XKCP/XKCP) to see which optimized implementations are available.
I noticed they do have intrinsics-based implementations for AVX2, but these compute Keccak-p[1600] (among others) with 2-, 4-, or 8-way parallelism:
https://github.com/XKCP/XKCP/blob/716f007dd73ef28d357b8162173646be574ad1b7/lib/low/KeccakP-1600-times4/AVX2/KeccakP-1600-times4-AVX2.c
There also appears to be a non-parallel intrinsics implementation for AVX-512:
https://github.com/XKCP/XKCP/blob/716f007dd73ef28d357b8162173646be574ad1b7/lib/low/KeccakP-1600/AVX512/C/KeccakP-1600-AVX512.c
However, the non-parallel implementation for AVX2 is ASM-only:
https://github.com/XKCP/XKCP/blob/716f007dd73ef28d357b8162173646be574ad1b7/lib/low/KeccakP-1600/AVX2/KeccakP-1600-AVX2.s
See also the ARMv8 FEAT_SHA3 extensions: #93.
I'm not sure whether this is because an intrinsics-based implementation doesn't make sense given the need for a precisely designed register schedule, or simply because nobody has done the work to implement it yet.
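For context, here is a minimal portable C sketch of the single-state (non-parallel) Keccak-f[1600] permutation, i.e. Keccak-p[1600] with all 24 rounds, which is what a non-parallel AVX2 version would have to vectorize. It follows the common compact loop structure rather than XKCP's own code; the function name `keccak_f1600` is hypothetical, and lanes are assumed to be indexed as `x + 5*y`. The rho/pi chain of lane moves and the per-row chi step are what make packing one state into 4-lane `ymm` registers awkward. In the times4 code, by contrast, my understanding is that each 256-bit register simply holds the same lane from four independent states.

```c
#include <stdint.h>

/* Rotate a 64-bit lane left; n is always in 1..63 here, so no UB. */
#define ROTL64(v, n) (((v) << (n)) | ((v) >> (64 - (n))))

/* Iota round constants for the 24 rounds of Keccak-f[1600]. */
static const uint64_t RC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808AULL,
    0x8000000080008000ULL, 0x000000000000808BULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008AULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000AULL,
    0x000000008000808BULL, 0x800000000000008BULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800AULL, 0x800000008000000AULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};

/* Rho rotation offsets, in the order lanes are visited by the pi chain. */
static const unsigned ROTC[24] = {
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
    27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44
};

/* Destination lane index for each step of the pi chain (starting at lane 1). */
static const unsigned PILN[24] = {
    10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
    15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1
};

/* Hypothetical helper: apply the 24-round permutation to a 25-lane state in place. */
void keccak_f1600(uint64_t st[25])
{
    uint64_t bc[5], t;

    for (int r = 0; r < 24; r++) {
        /* Theta: XOR each lane with the parities of two neighbouring columns. */
        for (int i = 0; i < 5; i++)
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        for (int i = 0; i < 5; i++) {
            t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
            for (int j = 0; j < 25; j += 5)
                st[j + i] ^= t;
        }

        /* Rho + pi: rotate each lane and move it to its new position. */
        t = st[1];
        for (int i = 0; i < 24; i++) {
            unsigned j = PILN[i];
            bc[0] = st[j];
            st[j] = ROTL64(t, ROTC[i]);
            t = bc[0];
        }

        /* Chi: nonlinear mixing within each row of five lanes. */
        for (int j = 0; j < 25; j += 5) {
            for (int i = 0; i < 5; i++)
                bc[i] = st[j + i];
            for (int i = 0; i < 5; i++)
                st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
        }

        /* Iota: break symmetry with the round constant. */
        st[0] ^= RC[r];
    }
}
```

As a quick sanity check, permuting an all-zero state should yield `0xF1258F7940E1DDE7` as the first lane, matching the published Keccak-f[1600] intermediate values.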