hand-assemble instructions not supported in old binutils #2471

Merged: 5 commits merged on Mar 11, 2025

Conversation

arielb1
Contributor

@arielb1 arielb1 commented Mar 10, 2025

This turned out to be quite ugly, but it fixes #2463, making ring work again on AL2.

The generated .o files are slightly different since there is no debuginfo, and I don't know of an easy way to add it back when using .byte instructions. Checked with objdump that there are no other differences in the .o files before and after this PR - might be worth having someone else check.

@arielb1
Contributor Author

arielb1 commented Mar 10, 2025

Maybe a global regex-replace would be nicer than using assemble - not sure.

@briansmith
Owner

Maybe a global regex-replace would be nicer than using assemble - not sure.

Yes, I think that approach is what we'll need to do, because otherwise we're going to have lots of merge conflicts when merging changes from upstream BoringSSL.

Here is an example of how something similar is done in sha512-x86-64.pl:

sub sha256op38 {
    my $instr = shift;
    my %opcodelet = (
        "sha256rnds2" => 0xcb,
        "sha256msg1"  => 0xcc,
        "sha256msg2"  => 0xcd,
    );

    if (defined($opcodelet{$instr}) && @_[0] =~ /%xmm([0-7]),\s*%xmm([0-7])/) {
        my @opcode = (0x0f, 0x38);
        push @opcode, $opcodelet{$instr};
        push @opcode, 0xc0|($1&7)|(($2&7)<<3);    # ModR/M
        return ".byte\t".join(',', @opcode);
    } else {
        return $instr."\t".@_[0];
    }
}

foreach (split("\n",$code)) {
	s/\`([^\`]*)\`/eval $1/geo;

	s/\b(sha256[^\s]*)\s+(.*)/sha256op38($1,$2)/geo;

	print $_,"\n";
}

Basically what happens is that the perlasm stuff happens to generate $code as a big string, and then this logic steps through it line-by-line and replaces each instruction.
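
For readers less familiar with Perl, here is a rough Python sketch of the same idea. It is not part of ring or BoringSSL; the names `OPCODELET`, `sha256op38`, and `rewrite` simply mirror the Perl above. Note that Perl's `join` stringifies the numbers in decimal, so the emitted `.byte` values are decimal too.

```python
import re

# Third opcode byte for each SHA-256 instruction (after the 0x0f 0x38 escape).
OPCODELET = {"sha256rnds2": 0xCB, "sha256msg1": 0xCC, "sha256msg2": 0xCD}

# Register-to-register form only, restricted to %xmm0-%xmm7 as in the Perl regex.
OPERANDS_RE = re.compile(r"%xmm([0-7]),\s*%xmm([0-7])")


def sha256op38(instr, operands):
    """Return a .byte directive for a supported form, else the text unchanged."""
    m = OPERANDS_RE.search(operands)
    if instr in OPCODELET and m:
        src, dst = int(m.group(1)), int(m.group(2))
        modrm = 0xC0 | (src & 7) | ((dst & 7) << 3)  # ModR/M, register-direct
        opcode = [0x0F, 0x38, OPCODELET[instr], modrm]
        return ".byte\t" + ",".join(str(b) for b in opcode)
    return instr + "\t" + operands


def rewrite(line):
    """Apply the substitution to one line of generated assembly."""
    return re.sub(
        r"\b(sha256[^\s]*)\s+(.*)",
        lambda m: sha256op38(m.group(1), m.group(2)),
        line,
    )
```

Lines that do not mention a `sha256*` instruction, or that use an operand form the regex does not recognize, pass through unchanged.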

@arielb1
Contributor Author

arielb1 commented Mar 10, 2025

Are you fine with a big list of instructions, or do you think I need to write code that does the encoding?

Owner

@briansmith briansmith left a comment

Are you fine with a big list of instructions, or do you think I need to write code that does the encoding?

I think the big list of instructions is fine because the worst that will happen is that an unsupported combination of instruction/operands will be output verbatim, which will just break the build with older tools again, and then presumably somebody will make a new PR to fix this.
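
A minimal Python sketch of the "big list" approach, for illustration only. The real filter in this PR is Perl; `ASM_MAP` and `filter_lines` are hypothetical names, and the whitespace normalization is an assumption. The two table entries are the ones quoted in this conversation.

```python
# Exact instruction text -> pre-assembled bytes. Entries taken from this thread.
ASM_MAP = {
    "vaesenc %ymm2, %ymm12, %ymm12": ".byte 0xc4,0x62,0x1d,0xdc,0xe2",
    "vpclmulqdq $0x10, %ymm5, %ymm3, %ymm2": ".byte 0xc4,0xe3,0x65,0x44,0xd5,0x10",
}


def filter_lines(code):
    """Replace known instructions with .byte directives, line by line."""
    out = []
    for line in code.split("\n"):
        trimmed = " ".join(line.split())  # assumption: collapse whitespace before lookup
        if trimmed in ASM_MAP:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + ASM_MAP[trimmed])
        else:
            out.append(line)  # unknown lines are emitted verbatim
    return "\n".join(out)
```

Because unknown lines are emitted verbatim, a missing table entry shows up as an assembler error on old binutils rather than as silently wrong machine code.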

@@ -958,6 +958,66 @@ sub _aes_gcm_update {
$code .= _aes_gcm_update 0;
$code .= _end_func;

print $code;
sub filter_and_print {
my %asmMap = (
Owner

The indentation here looks a little off. I think it should look like this:

sub filter_and_print {
    my %asmMap = (
        'vaesenc %ymm2, %ymm12, %ymm12' => '.byte 0xc4,0x62,0x1d,0xdc,0xe2',
        ....
    )
    ....

Contributor Author

Fixed

codecov bot commented Mar 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.62%. Comparing base (52b239c) to head (56f7307).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2471      +/-   ##
==========================================
+ Coverage   96.60%   96.62%   +0.01%     
==========================================
  Files         180      180              
  Lines       21780    21814      +34     
  Branches      539      539              
==========================================
+ Hits        21040    21077      +37     
  Misses        623      623              
+ Partials      117      114       -3     

@arielb1
Contributor Author

arielb1 commented Mar 10, 2025

Fixed indentation. I don't know why coverage doesn't like me

@arielb1
Contributor Author

arielb1 commented Mar 10, 2025

I think the big list of instructions is fine because the worst that will happen is that an unsupported combination of instruction/operands will be output verbatim, which will just break the build with older tools again, and then presumably somebody will make a new PR to fix this.

I agree, this looks less scary than having an encoder run wild

@briansmith
Owner

I don't know why coverage doesn't like me

The coverage check tends to report a failure from the time the first coverage job finishes until all the coverage jobs have run and codecov.io has updated the results.

@briansmith
Owner

Checked with objdump that there are no other differences in the .o files before and after this PR - might be worth having someone else check.

The tests pass, so it's probably right. But, I will try with Intel XED (https://github.com/intelxed/xed) as I've used that in the past and found it helpful for this kind of thing.

Owner

@briansmith briansmith left a comment

First, thank you very much for doing this!

We will very soon land the avx512 version which will have the analogous issue. Are you planning to submit the analogous change for the avx512 version? Otherwise, this workaround would probably be short-lived.

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

We will very soon land the avx512 version which will have the analogous issue. Are you planning to submit the analogous change for the avx512 version? Otherwise, this workaround would probably be short-lived.

I will definitely do this workaround for avx512 as well.

@briansmith
Owner

$ find target/x86_64-unknown-linux-gnu/debug -name "*aes-gcm-avx2*.o" -exec objdump -S {} \; | grep -E "(vaesenc|vpclmul).*ymm" | grep c4 | cut -f2,3 | sort | uniq

^ This isn't quite right but only because I suck at shell scripting and I ran out of time. Unfortunately it generates output something like:

c4 e3 65 44 d5 10       vpclmullqhqdq %ymm5,%ymm3,%ymm2

which we'd need to transform into:

'vpclmulqdq $0x10, %ymm5, %ymm3, %ymm2' => '.byte 0xc4,0xe3,0x65,0x44,0xd5,0x10',
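
One way to do that transformation, sketched in Python. This is not the `make-avx-map-for-old-binutils.py` script that was later added to the PR (its contents are not shown in this thread); `map_entry` and `PCLMUL_ALIASES` are hypothetical names. The alias table follows the documented PCLMULQDQ pseudo-op immediates, and the `$0x..` spelling is assumed to match what the perlasm source writes.

```python
import re

# objdump prints vpclmulqdq under one of four alias mnemonics that encode the immediate.
PCLMUL_ALIASES = {
    "vpclmullqlqdq": "$0x00",
    "vpclmulhqlqdq": "$0x01",
    "vpclmullqhqdq": "$0x10",
    "vpclmulhqhqdq": "$0x11",
}

# "<hex bytes>  <mnemonic> <comma-separated operands>"
LINE_RE = re.compile(r"^\s*((?:[0-9a-f]{2} )*[0-9a-f]{2})\s+(\S+)\s+(\S+)\s*$")


def map_entry(objdump_line):
    """Turn one 'bytes + instruction' line into an asmMap entry, or None."""
    m = LINE_RE.match(objdump_line)
    if m is None:
        return None
    raw_bytes, mnemonic, operands = m.groups()
    ops = operands.split(",")
    if mnemonic in PCLMUL_ALIASES:
        # Undo the alias: put the immediate back as the first operand.
        ops.insert(0, PCLMUL_ALIASES[mnemonic])
        mnemonic = "vpclmulqdq"
    asm = mnemonic + " " + ", ".join(ops)
    encoded = ",".join("0x" + b for b in raw_bytes.split())
    return "'%s' => '.byte %s'," % (asm, encoded)
```

Feeding it the objdump line above should give back the map entry shown, so the remaining manual step is pasting the sorted, de-duplicated output into the Perl table.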

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

Yeah. Do you think it's important to do that?

There are 4 variants and you can regex-replace them.

@briansmith
Owner

Yeah. Do you think it's important to do that?

Not sure what you mean. I think it's good to have a script that at least mostly automates dealing with future merges, even if there's a manual copy-paste step. Presumably if we do such a script for this AVX2 implementation, it will help automate the creation of the PR for the avx512 version too, as we'll be able to tweak it very quickly to adapt to avx512.

Sorry my shell scripting is terrible; I'm sure you probably have it already solved.

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

I mean, do you want to block this PR on me writing that script?

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

added that script

@arielb1 arielb1 force-pushed the al2 branch 3 times, most recently from b5082e3 to db999ff Compare March 11, 2025 01:47
} else {
if($trimmed =~ /(vpclmulqdq|vaes).*%ymm/) {
die ("found instruction not supported under old binutils, please update asmMap with the results of running\n" .
'find target -name "*aes-gcm-avx2*.o" -exec python3 crypto/fipsmodule/aes/asm/make-avx-map-for-old-binutils.py \{\} \; | sort | uniq');
Owner

I suggest find target -name "*aes-gcm-avx2*.o" -exec python3 crypto/fipsmodule/aes/asm/make-avx-map-for-old-binutils.py \{\} \; | LC_ALL=C sort | uniq

Without LC_ALL=C my system sorts the lines in a different order (aesenc after aesenclast). With LC_ALL=C I get the same output as what's in the source.
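
To illustrate why the locale matters: `LC_ALL=C sort` compares raw bytes, so the space after `vaesenc` (0x20) sorts before the `l` of `vaesenclast`, while many other locales give spaces and punctuation little or no weight when collating. A small Python sketch of the byte-order behavior follows; `c_locale_sort_unique` is a hypothetical helper and the two lines are illustrative, not copied from the real map.

```python
def c_locale_sort_unique(lines):
    """Approximate `LC_ALL=C sort | uniq`: de-duplicate, then order by raw bytes."""
    return sorted(set(lines), key=lambda s: s.encode("utf-8"))


lines = [
    "vaesenclast %ymm2, %ymm12, %ymm12",
    "vaesenc %ymm2, %ymm12, %ymm12",
    "vaesenclast %ymm2, %ymm12, %ymm12",  # duplicate, removed by the set
]

ordered = c_locale_sort_unique(lines)
```

Here `ordered` lists the `vaesenc` line first, matching the order described for the source file.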

@briansmith
Owner

Thanks. This looks good to me, modulo the LC_ALL=C bit. I ran your script, pasted in the output (with LC_ALL=C), and verified that the result was a no-op.

When we do the AVX-512 version, I think we should move filter_and_print to x86_64-xlate.pl so that it can be shared between all modules that will use these instructions without having to copy/paste it. You can do this now or we can do it during the avx512 work.

Could you please squash this?

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

When we do the AVX-512 version, I think we should move filter_and_print to x86_64-xlate.pl so that it can be shared between all modules that will use these instructions without having to copy/paste it

How would you do that?

@briansmith
Owner

When we do the AVX-512 version, I think we should move filter_and_print to x86_64-xlate.pl so that it can be shared between all modules that will use these instructions without having to copy/paste it

How would you do that?

See this in the aes-gcm-avx2-x86_64.pl file:

$0 =~ m/(.*[\/\\])[^\/\\]+$/;
my $dir = $1;
my $xlate;
( $xlate = "${dir}x86_64-xlate.pl" and -f $xlate )
  or ( $xlate = "${dir}../../../perlasm/x86_64-xlate.pl" and -f $xlate )
  or die "can't locate x86_64-xlate.pl";

open OUT, "| \"$^X\" \"$xlate\" $flavour \"$output\"";
*STDOUT = *OUT;

My understanding is that whatever processing we put in x86_64-xlate.pl will get applied to all x86_64 files. You can see there is a function in x86_64-xlate.pl called process_line that seems to do the kind of line-by-line rewriting that we are doing here, for a variety of reasons.
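
As a rough analogy for the Perl snippet above, here is a Python sketch of the two steps it performs: finding the translator script in one of two candidate locations, then sending the generated text through a child process. `locate_script` and `pipe_through` are hypothetical names, and this only mirrors the mechanism; it is not how ring's build is actually driven.

```python
import os
import subprocess
import sys


def locate_script(script_dir, name):
    """Return the first existing candidate path, mirroring the Perl or-chain."""
    candidates = [
        os.path.join(script_dir, name),
        os.path.join(script_dir, "..", "..", "..", "perlasm", name),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("can't locate " + name)


def pipe_through(command, text):
    """Feed text to a child process on stdin and return what it writes to stdout."""
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in for a translator: a child Python process that upper-cases its input.
UPPERCASE_FILTER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
```

Since every generator's output goes through the same child process, any rewriting done there applies to all of them, which is the property being discussed.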

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

Done. Put it in a separate file to make things cleaner.

I'd rather not touch x86_64-xlate.pl; it's too ugly.

my $xlate_binutils;
( $xlate_binutils = "${dir}xlate-old-binutils.pl" and -f $xlate_binutils )
or ( $xlate_binutils = "${dir}../../../perlasm/xlate-old-binutils.pl" and -f $xlate_binutils )
or die "can't locate xlate-old-binutils.pl";
Owner

I like your thinking here, but because it adds a new file dependency, this is likely to break some build systems that I am aware of but am not allowed to explain to you. My suggestion is that we change back to the inline approach from the previous version and then develop a more general approach in the avx512 version, where we'll have more time. I will do a release this morning if we can get this going. I will comment in the avx512 issue about the more general approach.

Thanks!

Contributor Author

k

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

Done. I'd rather brute-force copy it over to AVX512 than play with x86_64-xlate.pl

@arielb1
Contributor Author

arielb1 commented Mar 11, 2025

In any case it's done for AVX-2

@briansmith briansmith linked an issue Mar 11, 2025 that may be closed by this pull request
@briansmith briansmith merged commit 85d5c0a into briansmith:main Mar 11, 2025
161 checks passed
@briansmith briansmith added this to the 0.17.14 milestone Mar 11, 2025
@briansmith
Owner

Thank you very much for contributing this, @arielb1!

@briansmith
Owner

I verified that the output of objdump -d on the object files, before and after this change, for windows and linux (but not macOS or other OS) is identical other than the symbol prefixing of "0_17_13" -> "0_17_14".
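
A check like this could be scripted roughly as follows. This is a sketch, not the procedure actually used: `differing_lines` is a hypothetical helper, it assumes the two `objdump -d` outputs have already been captured as strings, and the symbol name in the sample is made up for illustration.

```python
def differing_lines(before, after, old_prefix, new_prefix):
    """Compare two disassembly dumps after rewriting the old version prefix.

    Returns a list of (line_number, before_line, after_line) for every mismatch.
    """
    b = before.replace(old_prefix, new_prefix).splitlines()
    a = after.splitlines()
    diffs = [
        (i + 1, x, y) for i, (x, y) in enumerate(zip(b, a)) if x != y
    ]
    if len(b) != len(a):
        # Report a length mismatch at the first line only one side has.
        diffs.append((min(len(a), len(b)) + 1, "<length differs>", "<length differs>"))
    return diffs


# Hypothetical sample: only the version embedded in the symbol name changes.
before = "0000 <example_0_17_13_sym>:\n   0:\tc4 62 1d dc e2\n"
after = "0000 <example_0_17_14_sym>:\n   0:\tc4 62 1d dc e2\n"
result = differing_lines(before, after, "0_17_13", "0_17_14")
```

An empty `result` means the dumps agree once the version prefix is accounted for.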

@@ -958,6 +958,73 @@ sub _aes_gcm_update {
$code .= _aes_gcm_update 0;
$code .= _end_func;

print $code;
sub filter_and_print {
# This function replaces AVX2 assembly instructions with their assembled forms,
Contributor

Technically these are VAES and VPCLMULQDQ instructions, not AVX2.
