hand-assemble instructions not supported in old binutils #2471

arielb1 · 2025-03-10T20:08:10Z

This turned out to be quite ugly, but it fixes #2463, making ring work again on AL2.

The generated .o files are slightly different since there is no debuginfo, and I don't know of an easy way to add it back when using .byte instructions. Checked with objdump that there are no other differences in the .o files before and after this PR - might be worth having someone else check.

arielb1 · 2025-03-10T20:08:55Z

Maybe a global regex-replace would be nicer than using assemble - not sure.

briansmith · 2025-03-10T20:20:34Z

> Maybe a global regex-replace would be nicer than using assemble - not sure.

Yes, I think that approach is what we'll need to do, because otherwise we're going to have lots of merge conflicts when merging changes from upstream BoringSSL.

Here is an example of how something similar is done in sha512-x86-64.pl:

sub sha256op38 {
    my $instr = shift;
    my %opcodelet = (
        "sha256rnds2" => 0xcb,
        "sha256msg1"  => 0xcc,
        "sha256msg2"  => 0xcd
    );

    if (defined($opcodelet{$instr}) && @_[0] =~ /%xmm([0-7]),\s*%xmm([0-7])/) {
        my @opcode=(0x0f,0x38);
        push @opcode,$opcodelet{$instr};
        push @opcode,0xc0|($1&7)|(($2&7)<<3);    # ModR/M
        return ".byte\t".join(',',@opcode);
    } else {
        return $instr."\t".@_[0];
    }
}

foreach (split("\n",$code)) {
    s/\`([^\`]*)\`/eval $1/geo;

    s/\b(sha256[^\s]*)\s+(.*)/sha256op38($1,$2)/geo;

    print $_,"\n";
}

Basically, the perlasm script builds $code as one big string, and then this logic steps through it line by line and replaces each matching instruction.
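For readers less familiar with Perl, here is a rough Python port of the snippet above. It is only a sketch of the same line-by-line rewrite, not code from ring or BoringSSL; the names `OPCODELET`, `XMM_PAIR`, and `rewrite` are made up for illustration. Like Perl's `join` on numbers, it prints the bytes in decimal.

```python
import re

# Third opcode byte for each SHA-256 instruction (values from the Perl snippet).
OPCODELET = {"sha256rnds2": 0xCB, "sha256msg1": 0xCC, "sha256msg2": 0xCD}

# Same operand pattern as the Perl code: two xmm registers in the range 0-7.
XMM_PAIR = re.compile(r"%xmm([0-7]),\s*%xmm([0-7])")


def sha256op38(instr, operands):
    """Return a .byte directive for a known instruction, else the original text."""
    m = XMM_PAIR.search(operands)
    if instr in OPCODELET and m:
        first, second = int(m.group(1)), int(m.group(2))
        opcode = [0x0F, 0x38, OPCODELET[instr]]
        opcode.append(0xC0 | (first & 7) | ((second & 7) << 3))  # ModR/M
        return ".byte\t" + ",".join(str(b) for b in opcode)
    # Unsupported instruction/operand combinations are emitted unencoded.
    return instr + "\t" + operands


def rewrite(code):
    """Step through the generated assembly line by line, as the Perl loop does."""
    out = []
    for line in code.split("\n"):
        out.append(
            re.sub(
                r"\b(sha256\S*)\s+(.*)",
                lambda m: sha256op38(m.group(1), m.group(2)),
                line,
            )
        )
    return "\n".join(out)
```

Lines that do not contain a `sha256*` instruction pass through untouched, which is why this kind of filter can be bolted onto the end of a perlasm script without changing the rest of it.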

arielb1 · 2025-03-10T20:34:51Z

Are you fine with a big list of instructions, or do you think I need to write code that does the encoding?

briansmith

> Are you fine with a big list of instructions, or do you think I need to write code that does the encoding?

I think the big list of instructions is fine because the worst that will happen is that an unsupported combination of instruction/operands will be output verbatim, which will just break the build with older tools again, and then presumably somebody will make a new PR to fix this.
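A minimal sketch of the "big list" approach, written in Python rather than Perl purely for illustration. The two table entries are the ones quoted elsewhere in this thread; the function name `filter_lines` and the exact whitespace handling are assumptions, not the code in the PR.

```python
# Hypothetical table: exact instruction text -> hand-assembled bytes.
# Both entries are taken from examples quoted in this PR discussion.
ASM_MAP = {
    "vaesenc %ymm2, %ymm12, %ymm12": ".byte 0xc4,0x62,0x1d,0xdc,0xe2",
    "vpclmulqdq $0x10, %ymm5, %ymm3, %ymm2": ".byte 0xc4,0xe3,0x65,0x44,0xd5,0x10",
}


def filter_lines(code):
    """Replace listed instructions with .byte directives, line by line."""
    out = []
    for line in code.split("\n"):
        trimmed = line.strip()
        if trimmed in ASM_MAP:
            # Keep the original leading whitespace so the output stays readable.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + ASM_MAP[trimmed])
        else:
            # Anything not in the table is emitted verbatim; an old assembler
            # would then reject it, which is the failure mode described above.
            out.append(line)
    return "\n".join(out)
```

The trade-off is that the table has to be regenerated whenever upstream changes register allocation, but there is no encoder logic that could silently produce wrong bytes.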

briansmith · 2025-03-10T21:02:36Z

crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl

@@ -958,6 +958,66 @@ sub _aes_gcm_update {
 $code .= _aes_gcm_update 0;
 $code .= _end_func;

-print $code;
+sub filter_and_print {
+        my %asmMap = (


The indentation here looks a little off. I think it should look like this:

sub filter_and_print {
    my %asmMap = (
        'vaesenc %ymm2, %ymm12, %ymm12' => '.byte 0xc4,0x62,0x1d,0xdc,0xe2',
        ....
    )
    ....

codecov · 2025-03-10T21:11:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.62%. Comparing base (52b239c) to head (56f7307).
Report is 8 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2471      +/-   ##
==========================================
+ Coverage   96.60%   96.62%   +0.01%     
==========================================
  Files         180      180              
  Lines       21780    21814      +34     
  Branches      539      539              
==========================================
+ Hits        21040    21077      +37     
  Misses        623      623              
+ Partials      117      114       -3

arielb1 · 2025-03-10T21:23:56Z

Fixed indentation. I don't know why coverage doesn't like me

arielb1 · 2025-03-10T21:25:30Z

> I think the big list of instructions is fine because the worst that will happen is that an unsupported combination of instruction/operands will be output verbatim, which will just break the build with older tools again, and then presumably somebody will make a new PR to fix this.

I agree, this looks less scary than having an encoder run wild

briansmith · 2025-03-10T22:38:09Z

> I don't know why coverage doesn't like me

The coverage check likes to report that it is failing from the time the first coverage job finishes until all the coverage jobs have been run and codecov.io has updated the results.

briansmith · 2025-03-11T00:09:16Z

> Checked with objdump that there are no other differences in the .o files before and after this PR - might be worth having someone else check.

The tests pass, so it's probably right. But, I will try with Intel XED (https://github.com/intelxed/xed) as I've used that in the past and found it helpful for this kind of thing.

briansmith

First, thank you very much for doing this!

We will very soon land the avx512 version which will have the analogous issue. Are you planning to submit the analogous change for the avx512 version? Otherwise, this workaround would probably be short-lived.

arielb1 · 2025-03-11T01:04:19Z

> We will very soon land the avx512 version which will have the analogous issue. Are you planning to submit the analogous change for the avx512 version? Otherwise, this workaround would probably be short-lived.

I will definitely do this workaround for avx512 as well.

briansmith · 2025-03-11T01:09:14Z

$ find target/x86_64-unknown-linux-gnu/debug -name "*aes-gcm-avx2*.o" -exec objdump -S {} \; | grep -E "(vaesenc|vpclmul).*ymm" | grep c4 | cut -f2,3 | sort | uniq

^ This isn't quite right but only because I suck at shell scripting and I ran out of time. Unfortunately it generates output something like:

c4 e3 65 44 d5 10       vpclmullqhqdq %ymm5,%ymm3,%ymm2

which we'd need to transform into:

'vpclmulqdq $0x10, %ymm5, %ymm3, %ymm2' => '.byte 0xc4,0xe3,0x65,0x44,0xd5,0x10',
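The transformation described above could be sketched roughly like this in Python. This is not the `make-avx-map-for-old-binutils.py` script that was later added to the PR; it is an independent illustration, and `PCLMUL_IMM`, `LINE_RE`, and `to_map_entry` are hypothetical names. It assumes input lines shaped like the objdump excerpt above (hex bytes, then mnemonic, then comma-separated operands with no spaces).

```python
import re

# objdump prints vpclmulqdq with the immediate folded into a pseudo-op name.
# These are the four pseudo-ops and the immediates they stand for.
PCLMUL_IMM = {
    "vpclmullqlqdq": "0x00",
    "vpclmulhqlqdq": "0x01",
    "vpclmullqhqdq": "0x10",
    "vpclmulhqhqdq": "0x11",
}

# Hex bytes, then mnemonic, then operands.
LINE_RE = re.compile(r"^\s*((?:[0-9a-f]{2}\s)+)\s*(\S+)\s+(\S+)\s*$")


def to_map_entry(line):
    """Turn one 'bytes mnemonic operands' line into an asmMap-style entry."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    raw, mnemonic, operands = m.groups()
    # Split on commas that are not inside a (base,index,scale) memory operand.
    ops = re.split(r",(?![^()]*\))", operands)
    if mnemonic in PCLMUL_IMM:
        # Undo the pseudo-op: restore the generic mnemonic and the immediate.
        ops.insert(0, "$" + PCLMUL_IMM[mnemonic])
        mnemonic = "vpclmulqdq"
    asm = mnemonic + " " + ", ".join(ops)
    encoded = ",".join("0x" + byte for byte in raw.split())
    return "'" + asm + "' => '.byte " + encoded + "',"
```

Feeding it the objdump line shown above yields the map entry shown above; lines that do not look like disassembly return `None` so they can be skipped.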

arielb1 · 2025-03-11T01:09:51Z

Yeah. Do you think it's important to do that?

There are 4 variants and you can regex-replace them.

briansmith · 2025-03-11T01:11:59Z

> Yeah. Do you think it's important to do that?

Not sure what you mean. I think it's good to have a script that at least mostly automates dealing with future merges, even if there's a manual copy-paste step. Presumably if we do such a script for this AVX2 implementation, it will help automate the creation of the PR for the avx512 version too, as we'll be able to tweak it very quickly to adapt to avx512.

Sorry my shell scripting is terrible; I'm sure you probably have it already solved.

arielb1 · 2025-03-11T01:12:40Z

I mean, do you want to block this PR on me writing that script?

arielb1 · 2025-03-11T01:35:38Z

added that script

briansmith · 2025-03-11T16:59:27Z

crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl

+        } else {
+            if($trimmed =~ /(vpclmulqdq|vaes).*%ymm/) {
+                die ("found instruction not supported under old binutils, please update asmMap with the results of running\n" .
+                     'find target -name "*aes-gcm-avx2*.o" -exec python3 crypto/fipsmodule/aes/asm/make-avx-map-for-old-binutils.py \{\} \; | sort | uniq');


I suggest:

find target -name "*aes-gcm-avx2*.o" -exec python3 crypto/fipsmodule/aes/asm/make-avx-map-for-old-binutils.py \{\} \; | LC_ALL=C sort | uniq

Without LC_ALL=C my system sorts the lines in a different order (aesenc after aesenclast). With LC_ALL=C I get the same output as what's in the source.
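To illustrate why the locale matters, here is a small Python sketch. Plain `sorted()` on ASCII strings compares code points, which gives the same order as `LC_ALL=C sort`. The second ordering is only a crude imitation of a locale collation that ignores spaces and punctuation; real locale rules are more involved, and the two instruction strings are illustrative rather than copied from the map.

```python
import re

lines = [
    "vaesenclast %ymm2, %ymm12, %ymm12",
    "vaesenc %ymm2, %ymm12, %ymm12",
]

# Code point order: the space after "vaesenc" (0x20) sorts before the "l"
# in "vaesenclast", so vaesenc comes first. This matches LC_ALL=C.
c_order = sorted(lines)


def locale_like_key(s):
    """Rough stand-in for a collation that skips spaces and punctuation."""
    return re.sub(r"[^0-9A-Za-z]", "", s)


# With spaces and '%' ignored, "vaesencymm..." is compared against
# "vaesenclast...", and "l" sorts before "y", so vaesenclast comes first.
locale_like_order = sorted(lines, key=locale_like_key)
```

Pinning the locale in the documented command keeps the regenerated table byte-for-byte comparable with what is checked in.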

briansmith · 2025-03-11T17:02:38Z

Thanks. This looks good to me, modulo the LC_ALL=C bit. I ran your script, pasted in the output (with LC_ALL=C), and verified that the result was a no-op.

When we do the AVX-512 version, I think we should move filter_and_print to x86_64-xlate.pl so that it can be shared between all modules that will use these instructions without having to copy/paste it. You can do this now or we can do it during the avx512 work.

Could you please squash this?

arielb1 · 2025-03-11T17:14:55Z

> When we do the AVX-512 version, I think we should move filter_and_print to x86_64-xlate.pl so that it can be shared between all modules that will use these instructions without having to copy/paste it.

How would you do that?

briansmith · 2025-03-11T17:24:28Z

> > When we do the AVX-512 version, I think we should move filter_and_print to x86_64-xlate.pl so that it can be shared between all modules that will use these instructions without having to copy/paste it.
>
> How would you do that?

See this in the aes-gcm-avx2-x86_64.pl file:

$0 =~ m/(.*[\/\\])[^\/\\]+$/;
my $dir = $1;
my $xlate;
( $xlate = "${dir}x86_64-xlate.pl" and -f $xlate )
  or ( $xlate = "${dir}../../../perlasm/x86_64-xlate.pl" and -f $xlate )
  or die "can't locate x86_64-xlate.pl";

open OUT, "| \"$^X\" \"$xlate\" $flavour \"$output\"";
*STDOUT = *OUT;

My understanding is that whatever processing we put in x86_64-xlate.pl will get applied to all x86_64 files. You can see there is a function in x86_64-xlate.pl called process_line that seems to do the kind of line-by-line rewriting that we are doing here, for a variety of reasons.

arielb1 · 2025-03-11T17:39:34Z

Done. Put it in a separate file to make things cleaner.

I'd rather not touch x86_64-xlate.pl; it's too ugly.

briansmith · 2025-03-11T17:47:18Z

crypto/fipsmodule/aes/asm/aes-gcm-avx2-x86_64.pl

+my $xlate_binutils;
+( $xlate_binutils = "${dir}xlate-old-binutils.pl" and -f $xlate_binutils )
+  or ( $xlate_binutils = "${dir}../../../perlasm/xlate-old-binutils.pl" and -f $xlate_binutils )
+  or die "can't locate xlate-old-binutils.pl";


I like your thinking here, but because this adds a new file dependency, it is likely to break some build systems that I am aware of but am not allowed to explain to you. My suggestion is that we change back to the inline approach from the previous version and then develop a more general approach in the avx512 version, where we'll have more time. I will do a release this morning if we can get this going. I will comment in the avx512 issue about the more general approach.

Thanks!

arielb1 · 2025-03-11T17:54:28Z

Done. I'd rather brute-force copy it over to AVX512 than play with x86_64-xlate.pl.

arielb1 · 2025-03-11T18:06:25Z

In any case it's done for AVX-2

briansmith · 2025-03-11T18:54:47Z

Thank you very much for contributing this, @arielb1!

briansmith · 2025-03-11T22:08:19Z

I verified that the output of objdump -d on the object files, before and after this change, for windows and linux (but not macOS or other OS) is identical other than the symbol prefixing of "0_17_13" -> "0_17_14".

Ariel Ben-Yehuda added 2 commits March 10, 2025 19:12

wrap all special instructions

ce7af1d

add assemble table

cc56f6a

hand-assemble instructions not present in old binutils

650e842

briansmith reviewed Mar 10, 2025

fix indentation & comment

16d5ac4

briansmith mentioned this pull request Mar 10, 2025

ring-0.17.13 build failure on amazonlinux:2 amd64 #2462

Closed

briansmith requested changes Mar 11, 2025

arielb1 force-pushed the al2 branch 3 times, most recently from b5082e3 to db999ff Compare March 11, 2025 01:47

briansmith reviewed Mar 11, 2025

arielb1 force-pushed the al2 branch from db999ff to 6ee8585 Compare March 11, 2025 17:39

arielb1 force-pushed the al2 branch from 6ee8585 to 9d704e0 Compare March 11, 2025 17:41

briansmith reviewed Mar 11, 2025

briansmith mentioned this pull request Mar 11, 2025

Work around incompatibility of aes-gcm-avx512 with ancient GNU as (GNU binutils) #2469

Open

add script for regenerating asmMap

56f7307

arielb1 force-pushed the al2 branch from 9d704e0 to 56f7307 Compare March 11, 2025 17:54

briansmith linked an issue Mar 11, 2025 that may be closed by this pull request

ring-0.17.13 build failure on amazonlinux:2 amd64 #2462

Closed

briansmith approved these changes Mar 11, 2025

briansmith linked an issue Mar 11, 2025 that may be closed by this pull request

ring 0.17.13 does not "cross" compile (on illumos, other x86_64 targets) unlike ring 0.17.12 #2461

Closed

briansmith merged commit 85d5c0a into briansmith:main Mar 11, 2025
161 checks passed

briansmith added this to the 0.17.14 milestone Mar 11, 2025

ebiggers reviewed Mar 12, 2025
