Open
Labels: question (Further information is requested)
Description
Hi,
I've added BLAKE3 to https://github.com/cryptopp-modern/cryptopp-modern with SSE4.1 (4-way) and AVX2 (8-way) parallel chunk hashing. All test vectors pass. Thanks for the algorithm!
Benchmarks (Intel Core Ultra 7 155H, Windows 11, 64-bit Release, 16 KiB buffer):
| Compiler | Flags | AVX2 speed |
|---|---|---|
| MinGW GCC 14.2 | -O3 -msse4.1 -mavx2 | 2591 MiB/s |
| MSVC 2022 | /O2 /arch:AVX2 | 1829 MiB/s |
Same code, but the MSVC build is ~29% slower than the GCC build. I also tried /Ob3, but it made things worse (1766 MiB/s, ~32% slower than GCC).
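For clarity on how the percentages are derived: they are relative to the GCC throughput. A minimal sketch of that arithmetic follows; the helper name `percent_slower` is made up for illustration and is not part of cryptopp-modern or BLAKE3.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (illustration only): relative slowdown of `candidate`
// versus `baseline`, in percent. Both arguments are throughputs in the same
// unit (MiB/s in the table above).
double percent_slower(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (baseline - candidate) / baseline * 100.0;
}
```

With the numbers from the table, `percent_slower(2591.0, 1829.0)` rounds to 29 and `percent_slower(2591.0, 1766.0)` rounds to 32. Read the other way around, the GCC build is roughly 42% faster than the MSVC /O2 build (2591 / 1829 ≈ 1.42).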
Have you seen similar compiler differences with your implementations?
Any known MSVC pitfalls or recommended flags/idioms for BLAKE3-style SIMD code?
Thanks.