
Commit e98f47f

ebiggers authored and briansmith committed
Fix missing vzeroupper in gcm_ghash_vpclmulqdq_avx2() for len=16
[Adapted by Brian Smith to *ring*'s prior modifications.]

gcm_ghash_vpclmulqdq_avx2() executes vzeroupper only when len >= 32, as it was supposed to use only xmm registers for len == 16. But it was actually writing to two ymm registers unconditionally, for $BSWAP_MASK and $GFPOLY. Therefore, there could be a slow-down in later code using legacy SSE instructions, in the (probably rare) case where gcm_ghash_vpclmulqdq_avx2() was called with len == 16 *and* wasn't followed by a function that does vzeroupper, e.g. aes_gcm_enc_update_vaes_avx2(). (The Windows xmm register restore epilogue of gcm_ghash_vpclmulqdq_avx2() itself does use legacy SSE instructions, so it was probably slowed down by this, but that's just 8 instructions.)

Fix this by updating gcm_ghash_vpclmulqdq_avx2() to correctly use only xmm registers when len == 16. This makes it more closely match the similar code in aes-gcm-avx10-x86_64.pl, which already handles this correctly.

Also, make both functions execute vzeroupper unconditionally, so that it won't be missed again. It costs only 1 cycle on the CPUs this code runs on, and it no longer seems worth executing conditionally.

Change-Id: I975cd5b526e5cdae1a567f4085c2484552bf6bea
1 parent: 3542fbc · commit: e98f47f
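
To see why the missing vzeroupper matters, here is a minimal sketch (not from this commit; register names and stack offsets are illustrative) of the AVX-to-SSE transition the commit message describes: any write to a ymm register leaves the upper 128 bits of the vector state dirty, and subsequent legacy (non-VEX) SSE instructions can then incur a penalty until vzeroupper clears that state.

    # Hypothetical sketch of the penalty, in AT&T syntax:
    vbroadcasti128  .Lbswap_mask(%rip), %ymm0   # ymm write: upper 128 bits now dirty

    # ... the len == 16 path runs, using only xmm registers, then reaches
    # the epilogue without executing vzeroupper ...

    movdqa          0x20(%rsp), %xmm6           # legacy SSE (non-VEX) restore: can
                                                # pay a transition penalty while the
                                                # upper vector state is dirty

    # The fix: clear the upper halves before any legacy SSE code runs.
    vzeroupper                                  # only 1 cycle on the targeted CPUs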

2 files changed: +12, −5 lines changed

crypto/fipsmodule/aes/asm/aes-gcm-avx10-x86_64.pl (+2, −3)
@@ -786,9 +786,6 @@ sub _ghash_update {
 	jae		.Laad_loop_1x$local_label_suffix
 
 .Laad_large_done$local_label_suffix:
-	# Issue the vzeroupper that is needed after using ymm or zmm registers.
-	# Do it here instead of at the end, to minimize overhead for small AADLEN.
-	vzeroupper
 
 	# GHASH the remaining data 16 bytes at a time, using xmm registers only.
 .Laad_blockbyblock$local_label_suffix:
@@ -809,6 +806,8 @@ sub _ghash_update {
 	# Store the updated GHASH accumulator back to memory.
 	vpshufb		$BSWAP_MASK_XMM, $GHASH_ACC_XMM, $GHASH_ACC_XMM
 	vmovdqu		$GHASH_ACC_XMM, ($GHASH_ACC_PTR)
+
+	vzeroupper	# This is needed after using ymm or zmm registers.
 ___
 	return $code;
 }

crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl (+10, −2)
@@ -461,10 +461,16 @@ sub _ghash_4x {
 	@{[ _save_xmmregs (6 .. 9) ]}
 	.seh_endprologue
 
-	vbroadcasti128	.Lbswap_mask(%rip), $BSWAP_MASK
+	# Load the bswap_mask and gfpoly constants. Since AADLEN is usually small,
+	# usually only 128-bit vectors will be used. So as an optimization, don't
+	# broadcast these constants to both 128-bit lanes quite yet.
+	vmovdqu		.Lbswap_mask(%rip), $BSWAP_MASK_XMM
+	vmovdqu		.Lgfpoly(%rip), $GFPOLY_XMM
+
+	# Load the GHASH accumulator.
 	vmovdqu		($GHASH_ACC_PTR), $GHASH_ACC_XMM
 	vpshufb		$BSWAP_MASK_XMM, $GHASH_ACC_XMM, $GHASH_ACC_XMM
-	vbroadcasti128	.Lgfpoly(%rip), $GFPOLY
+
 
 	# Update GHASH with the remaining 16-byte block if any.
 .Lghash_lastblock:
@@ -479,6 +485,8 @@ sub _ghash_4x {
 	# Store the updated GHASH accumulator back to memory.
 	vpshufb		$BSWAP_MASK_XMM, $GHASH_ACC_XMM, $GHASH_ACC_XMM
 	vmovdqu		$GHASH_ACC_XMM, ($GHASH_ACC_PTR)
+
+	vzeroupper
 ___
 }
 $code .= _end_func;
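
The key to the avx2 fix above is that a VEX-encoded 128-bit load zeroes bits 255:128 of the destination register, so it never dirties the upper vector state, whereas vbroadcasti128 writes a full ymm register. A hypothetical sketch of the contrast (register names are illustrative, not taken from the patch):

    vmovdqu         .Lbswap_mask(%rip), %xmm0   # VEX.128 load: bits 255:128 of
                                                # %ymm0 are zeroed, state stays clean
    vbroadcasti128  .Lbswap_mask(%rip), %ymm1   # full ymm write: upper state dirty

    # If a larger-input path later needs the constant in both 128-bit lanes,
    # it can widen on demand, e.g. (hypothetical):
    vinserti128     $1, %xmm0, %ymm0, %ymm0     # copy the low lane into the high lane

This is why, after the change, the len == 16 path touches only xmm registers and the unconditional vzeroupper at the end is cheap insurance rather than a required bug fix for that path.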
