-
Notifications
You must be signed in to change notification settings - Fork 735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hand-assemble instructions not supported in old binutils #2471
Conversation
Maybe a global regex-replace would be nicer than using assemble - not sure. |
Yes, I think that approach is what we'll need to do, because otherwise we're going to have lots of merge conflicts when merging changes from upstream BoringSSL. Here is an example of how something similar is done in sha512-x86-64.pl:
Basically what happens is that the perlasm stuff happens to generate |
Are you fine with a big list of instructions, or do you think I need to write code that does the encoding? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you fine with a big list of instructions, or do you think I need to write code that does the encoding?
I think the big list of instructions is fine because the worst that will happen is that an unsupported combination of instruction/operands will be output verbatim, which will just break the build with older tools again, and then presumably somebody will make a new PR to fix this.
@@ -958,6 +958,66 @@ sub _aes_gcm_update { | |||
$code .= _aes_gcm_update 0; | |||
$code .= _end_func; | |||
|
|||
print $code; | |||
sub filter_and_print { | |||
my %asmMap = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indention here looks a little off. I think it should look like this:
sub filter_and_print {
my %asmMap = (
'vaesenc %ymm2, %ymm12, %ymm12' => '.byte 0xc4,0x62,0x1d,0xdc,0xe2',
....
)
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2471 +/- ##
==========================================
+ Coverage 96.60% 96.62% +0.01%
==========================================
Files 180 180
Lines 21780 21814 +34
Branches 539 539
==========================================
+ Hits 21040 21077 +37
Misses 623 623
+ Partials 117 114 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Fixed indentation. I don't know why coverage doesn't like me |
I agree, this looks less scary than having an encoder run wild |
The coverage check likes to report that it is failing between the time the first coverage job finishes until all the coverage jobs have been run and codecov.io has updated the results. |
The tests pass, so it's probably right. But, I will try with Intel XED (https://github.com/intelxed/xed) as I've used that in the past and found it helpful for this kind of thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, thank you very much for doing this!
We will very soon land the avx512 version which will have the analogous issue. Are you planning to submit the analogous change for the avx512 version? Otherwise, this workaround would probably be short-lived.
I will definitely do this workaround for avx512 as well. |
^ This isn't quite right but only because I suck at shell scripting and I ran out of time. Unfortunately it generates output something like:
which we'd need to transform into:
|
Yeah. Do you think it's important to do that? There are 4 variants and you can regex-replace them. |
Not sure what you mean. I think it's good to have a script that at least mostly automates dealing with future merges, even if there's a manual copy-paste step. Presumably if we do such a script for this AVX2 implementation, it will help automate the creation of the PR for the avx512 version too, as we'll be able to tweak it very quickly to adapt to avx512. Sorry my shell scripting is terrible; I'm sure you probably have it already solved. |
I mean, do you want to block this PR on me writing that script? |
added that script |
b5082e3
to
db999ff
Compare
} else { | ||
if($trimmed =~ /(vpclmulqdq|vaes).*%ymm/) { | ||
die ("found instruction not supported under old binutils, please update asmMap with the results of running\n" . | ||
'find target -name "*aes-gcm-avx2*.o" -exec python3 crypto/fipsmodule/aes/asm/make-avx-map-for-old-binutils.py \{\} \; | sort | uniq'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest find target -name "*aes-gcm-avx2*.o" -exec python3 crypto/fipsmodule/aes/asm/make-avx-map-for-old-binutils.py \{\} \; | LC_ALL=C sort | uniq
Without LC_ALL=C
my system sorts the lines in a different order (aesenc after aesenclast). With L_ALL=C
I get the same output as what's in the source.
Thanks. This looks good to me, modulo the When we do the AVX-512 version, I think we should move Could you please squash this? |
How would you do that? |
See this in the aes-gcm-avx2-x86_64.pl file:
My understanding is that whatever processing we put in |
Done. Put it in a separate file to make things cleaner. I'll rather not touch x86_64-xlate.pl, it's too ugly. |
my $xlate_binutils; | ||
( $xlate_binutils = "${dir}xlate-old-binutils.pl" and -f $xlate_binutils ) | ||
or ( $xlate_binutils = "${dir}../../../perlasm/xlate-old-binutils.pl" and -f $xlate_binutils ) | ||
or die "can't locate xlate-old-binutils.pl"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your thinking here but this is likely to break some build systems I am aware of but am not allowed to explain to you because it adds a new file dependency. My suggestion is that we change back to the inline approach from the previous version and then develop a more general approach in the avx512 version where we'll have more time. I will do a release this morning if we can get this going. I will comment in the avx512 issue about the more general approach.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k
Done. I'll rather brute force copy it over to AVX512 rather than play with x86_64-xlate.pl |
In any case it's done for AVX-2 |
Thank you very much for contributing this, @arielb1! |
I verified that the output of |
@@ -958,6 +958,73 @@ sub _aes_gcm_update { | |||
$code .= _aes_gcm_update 0; | |||
$code .= _end_func; | |||
|
|||
print $code; | |||
sub filter_and_print { | |||
# This function replaces AVX2 assembly instructions with their assembled forms, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically these are VAES and VPCLMULQDQ instructions, not AVX2.
This turned out to be quite ugly, but it fixes #2463, making ring work again on AL2.
The generated .o files are slightly different since there is no debuginfo, and I don't know of an easy way to add it back when using .byte instructions. Checked with objdump that there are no other differences in the .o files before and after this PR - might be worth having someone else check.