Fix missing vzeroupper in gcm_ghash_vpclmulqdq_avx2() for len=16
[Adapted by Brian Smith to *ring*'s prior modifications.]
gcm_ghash_vpclmulqdq_avx2() executed vzeroupper only when len >= 32,
since it was supposed to use only xmm registers for len == 16. But it
was actually writing two ymm registers unconditionally, for $BSWAP_MASK
and $GFPOLY. The dirty upper ymm halves could therefore cause an
AVX-to-SSE transition penalty (a slow-down) in later code using legacy
SSE instructions, in the (probably rare) case where
gcm_ghash_vpclmulqdq_avx2() was called with len=16 *and* wasn't followed
by a function that does vzeroupper, e.g. aes_gcm_enc_update_vaes_avx2().
(The Windows xmm register restore epilogue of
gcm_ghash_vpclmulqdq_avx2() itself uses legacy SSE instructions, so it
was probably slowed down by this, but that's just 8 instructions.)
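As a rough illustration only: the sketch below is a hand-written
C-intrinsics mirror of the problematic control flow, not the actual
perlasm. The function name, constants, and stubbed-out arithmetic are
all hypothetical; only the shape (unconditional ymm writes, conditional
vzeroupper) reflects the bug.

    /* Hypothetical sketch of the buggy shape; compile with -mavx2.
     * The GHASH arithmetic itself is stubbed out. */
    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    void ghash_update_sketch(uint8_t Xi[16], const uint8_t *in, size_t len) {
        /* Illustrative stand-ins for $BSWAP_MASK and $GFPOLY; these
         * broadcasts write full ymm registers regardless of len. */
        __m256i bswap_mask = _mm256_broadcastsi128_si256(
            _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15));
        __m256i gfpoly = _mm256_broadcastsi128_si256(
            _mm_set_epi64x((int64_t)0xc200000000000000u, 1));

        if (len >= 32) {
            /* ... two-blocks-at-a-time main loop using ymm registers,
             * consuming bswap_mask and gfpoly ... */
            (void)bswap_mask;
            (void)gfpoly;
            _mm256_zeroupper();  /* the only place vzeroupper was executed */
            return;
        }

        /* len == 16: the math here uses only xmm registers, but the ymm
         * upper halves written above are still dirty, so later legacy-SSE
         * code can pay an AVX-to-SSE transition penalty. */
        __m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i *)Xi),
                                  _mm_loadu_si128((const __m128i *)in));
        _mm_storeu_si128((__m128i *)Xi, x);
    }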
Fix this by updating gcm_ghash_vpclmulqdq_avx2() to use only xmm
registers when len=16. This also brings it closer to the similar code in
aes-gcm-avx10-x86_64.pl, which already handles this case correctly.
Also, make both functions execute vzeroupper unconditionally, so that it
can't be missed again. vzeroupper costs only about 1 cycle on the CPUs
this code runs on, so executing it conditionally no longer seems
worthwhile.
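Sketched the same way as above (hypothetical C intrinsics, arithmetic
stubbed out, not the actual perlasm), the fixed shape keeps the len=16
path in xmm registers and runs vzeroupper unconditionally on the way
out:

    /* Hypothetical sketch of the fixed shape; compile with -mavx2. */
    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    void ghash_update_fixed_sketch(uint8_t Xi[16], const uint8_t *in, size_t len) {
        if (len >= 32) {
            /* Only the multi-block path broadcasts constants into ymm
             * registers and runs the two-blocks-at-a-time loop. */
            __m256i bswap_mask = _mm256_broadcastsi128_si256(
                _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15));
            /* ... ymm main loop ... */
            (void)bswap_mask;
        } else {
            /* len == 16: everything on this path stays in xmm registers
             * (the real GHASH arithmetic, which would also load its
             * constants as 128-bit values, is stubbed out here). */
            __m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i *)Xi),
                                      _mm_loadu_si128((const __m128i *)in));
            _mm_storeu_si128((__m128i *)Xi, x);
        }
        /* Unconditional, so no future code path can miss it. */
        _mm256_zeroupper();
    }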
Change-Id: I975cd5b526e5cdae1a567f4085c2484552bf6bea