Skip to content

Optimize crc64_rocksoft for aarch64 #327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tipabu
Copy link

@tipabu tipabu commented Apr 25, 2025

Closes #326

Copy link
Contributor

@pablodelara pablodelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tipabu. What do you mean with this phrase? "I'm not how worried to be about them." It looks like this needs some rewording, thanks.

@tipabu tipabu force-pushed the rocksoft-aarch64 branch from e21f296 to b417007 Compare April 28, 2025 14:21
@tipabu
Copy link
Author

tipabu commented Apr 28, 2025

Fixed up -- sorry about that! Was just missing a "sure":

I'm not sure how worried to be about them.

@tipabu
Copy link
Author

tipabu commented Apr 28, 2025

FWIW, running locally (M4 MacBook Air) with master I see results like

crc64_rocksoft_norm_perf:
Start timed tests
crc64_rocksoft_norm_warm: runtime =    3062418 usecs, bandwidth 1516 MB in 3.0624 sec = 495.03 MB/s
finish 0xdd347d865410fc77
crc64_rocksoft_refl_perf:
Start timed tests
crc64_rocksoft_refl_warm: runtime =    3062684 usecs, bandwidth 1666 MB in 3.0627 sec = 544.28 MB/s
finish 0x8a0da75d679587c2

vs

crc64_rocksoft_norm_perf:
Start timed tests
crc64_rocksoft_norm_warm: runtime =    3062542 usecs, bandwidth 104695 MB in 3.0625 sec = 34185.81 MB/s
finish 0xdd347d865410fc77
crc64_rocksoft_refl_perf:
Start timed tests
crc64_rocksoft_refl_warm: runtime =    3017980 usecs, bandwidth 107434 MB in 3.0180 sec = 35598.00 MB/s
finish 0x8a0da75d679587c2

with this. Also did some extra equivalency testing against the base implementation with https://paste.opendev.org/show/bG3ZmKo1m1D2MZqb7B8Q/

@pablodelara
Copy link
Contributor

Thanks @tipabu. Could you also update Release notes with this change?

@tipabu
Copy link
Author

tipabu commented Apr 30, 2025

Should it go under 2. FIXED ISSUES or 3. CHANGE LOG & FEATURES ADDED? I'm thinking something like

* CRC64 Rocksoft implementation on aarch64 optimized similar to other CRC64 implementations.

@jeffareid
Copy link

Is someone going to add the rock soft polynomial to the X86 code base?

@tipabu
Copy link
Author

tipabu commented May 1, 2025

It's already there and optimized (see crc/crc64_rocksoft_*.asm) -- in fact, the x86_64 assembly is where I went for the constants for p4, p1, p0, and br.

@jeffareid
Copy link

It's already there

Somehow I missed that. Is there a program to generate the rk constants used in the .asm files. similar to what I did when I ported an older version to build with Visual Studio, such as crc64fg.cpp or crc64rg.cpp ?

@pablodelara
Copy link
Contributor

Should it go under 2. FIXED ISSUES or 3. CHANGE LOG & FEATURES ADDED? I'm thinking something like

* CRC64 Rocksoft implementation on aarch64 optimized similar to other CRC64 implementations.

Under 3. CHANGE LOG..., in the v2.32 section. Thanks!

@pablodelara
Copy link
Contributor

Should it go under 2. FIXED ISSUES or 3. CHANGE LOG & FEATURES ADDED? I'm thinking something like

* CRC64 Rocksoft implementation on aarch64 optimized similar to other CRC64 implementations.

Under 3. CHANGE LOG..., in the v2.32 section. Thanks!

Don't forget to rebase :)

tipabu added 2 commits May 1, 2025 10:44
This makes it easier to compare the constants used for crc/crc64_*_by8.asm,
crc/crc64_*_by16_10.asm, and crc/aarch64/crc64_*_pmull.h

Note that this revealed some unexpected (to me) discrepancies:

  ecma_refl: br_high != rk8 (92d8af2baf0e1e85 vs 92d8af2baf0e1e84)
  iso_refl: br_high != rk8 (b000000000000001 vs b000000000000000)
  jones_refl: br_high != rk8 (2b5926535897936b vs 2b5926535897936a)

I'm not sure how worried to be about them.

Signed-off-by: Tim Burke <[email protected]>
@tipabu tipabu force-pushed the rocksoft-aarch64 branch from b417007 to 6803d9b Compare May 1, 2025 17:47
@tipabu
Copy link
Author

tipabu commented May 1, 2025

@jeffareid Your same programs work; just need to #define rk08 0xad93d23594c93659ull // crc64 rocksoft -- though there's a difference of 1 in the rk8 / br_high values. This is similar to the discrepancy noted in 77d76b6... With some random testing, though, I couldn't find an input where that led to a difference in checksum.

@pablodelara Done.

@jeffareid
Copy link

jeffareid commented May 1, 2025

@tipabu

there's a difference of 1 in the rk8 / br_high values. This is similar to the discrepancy noted in 77d76b6... With some random testing, though, I couldn't find an input where that led to a difference in checksum.

That difference of 1 only occurs with refl CRC, and there is no advantage to not have the right most bit of rk8 set to 1. For reflected CRC, rk8 = the right most 64 bits of a 65 bit reversed polynomial. The missing 65th bit represents x^0, while the right most bit in rk8 represents x^64. Since the CRC is 64 bits, that right most x^64 bit can be ignored (set to 0), but there is no advantage to not having it set to 1.

If interested, I can explain this in more detail.

@tipabu
Copy link
Author

tipabu commented May 6, 2025

Thanks for the detailed explanation, @jeffareid! I think it finally clicked for me with

Since only the bits corresponding to x^0 to x^63 of the result are used for CRC, the calculation for bits corresponding to x^64 to x^127 is not fully implemented, which is why that right most bit (x^64) of rk8 can be 0 or 1, as it's not used.

@jeffareid
Copy link

jeffareid commented May 6, 2025

@tipabu

Thanks for the detailed explanation, @jeffareid! I think it finally clicked for me with

Since only the bits corresponding to x^0 to x^63 of the result are used for CRC, the calculation for bits corresponding to x^64 to x^127 is not fully implemented, which is why that right most bit (x^64) of rk8 can be 0 or 1, as it's not used.

If fully implemented, then the bits for x^64 to x^127 will all end up equal to zero, but that calculation is not needed since the returned CRC is only for x^0 to x^63. Still there's no advantage of leaving out the x^64 bit in rk8, so I don't know why that was done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crc64_rocksoft missing optimizations for aarch64
3 participants