
BLAKE3 SIMD performance on MSVC vs GCC #534

@Coralesoft

Hi,
I've added BLAKE3 to https://github.com/cryptopp-modern/cryptopp-modern
with SSE4.1 (4-way) and AVX2 (8-way) parallel chunk hashing. All test vectors pass, thanks for the algorithm!
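Some background on what the 4-way/8-way paths do, since it shapes the generated code. In the official BLAKE3 C implementation, the multi-chunk paths hash independent chunks side by side. They keep the state "transposed", so each SIMD register holds the same state word from 4 (SSE4.1) or 8 (AVX2) chunks, and every step of the G mixing function becomes one vector operation across all lanes. The portable sketch below shows that layout with plain arrays. `kLanes`, `Lanes`, `g_scalar` and `g_lanes` are hypothetical names for illustration, not cryptopp-modern's API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count: 4 matches an SSE4.1 path, 8 an AVX2 path.
constexpr std::size_t kLanes = 4;
using Lanes = std::array<std::uint32_t, kLanes>;

// Rotate right; only called with n in {16, 12, 8, 7}, so (32 - n) never reaches 32.
inline std::uint32_t rotr32(std::uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

// BLAKE3's G mixing function on one chunk's four state words.
inline void g_scalar(std::uint32_t& a, std::uint32_t& b, std::uint32_t& c,
                     std::uint32_t& d, std::uint32_t mx, std::uint32_t my) {
    a = a + b + mx; d = rotr32(d ^ a, 16);
    c = c + d;      b = rotr32(b ^ c, 12);
    a = a + b + my; d = rotr32(d ^ a, 8);
    c = c + d;      b = rotr32(b ^ c, 7);
}

// The same G applied to kLanes independent chunks at once. Each loop is one
// step across all lanes, which is what a single SIMD instruction (or a short
// shift/or or byte-shuffle sequence, for rotations) does in the vector paths.
inline void g_lanes(Lanes& a, Lanes& b, Lanes& c, Lanes& d,
                    const Lanes& mx, const Lanes& my) {
    for (std::size_t i = 0; i < kLanes; ++i) a[i] = a[i] + b[i] + mx[i];
    for (std::size_t i = 0; i < kLanes; ++i) d[i] = rotr32(d[i] ^ a[i], 16);
    for (std::size_t i = 0; i < kLanes; ++i) c[i] = c[i] + d[i];
    for (std::size_t i = 0; i < kLanes; ++i) b[i] = rotr32(b[i] ^ c[i], 12);
    for (std::size_t i = 0; i < kLanes; ++i) a[i] = a[i] + b[i] + my[i];
    for (std::size_t i = 0; i < kLanes; ++i) d[i] = rotr32(d[i] ^ a[i], 8);
    for (std::size_t i = 0; i < kLanes; ++i) c[i] = c[i] + d[i];
    for (std::size_t i = 0; i < kLanes; ++i) b[i] = rotr32(b[i] ^ c[i], 7);
}
```

The lanes never interact, so `g_lanes` produces exactly the per-chunk `g_scalar` result. The 16- and 8-bit rotations are whole-byte moves, which is why SIMD code often implements them with a byte shuffle rather than two shifts and an OR.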

Benchmarks (Intel Core Ultra 7 155H, Windows 11, 64-bit Release, 16 KiB buffer):

| Compiler | Flags | AVX2 speed |
|---|---|---|
| MinGW GCC 14.2 | -O3 -msse4.1 -mavx2 | 2591 MiB/s |
| MSVC 2022 | /O2 /arch:AVX2 | 1829 MiB/s |

Same code, but MSVC is ~29% slower. I also tried adding /Ob3, but it made things worse: 1766 MiB/s, ~32% slower than GCC.
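For reference, the percentages follow from the table as (1 - slow/fast) * 100. The sketch below shows that arithmetic and the MiB/s conversion (1 MiB = 2^20 bytes). The helper names are hypothetical, and it is not necessarily how the benchmark harness computes its figures.

```cpp
#include <cstdint>

// Throughput in MiB/s (1 MiB = 1,048,576 bytes) for `bytes` hashed in `seconds`.
double mib_per_sec(std::uint64_t bytes, double seconds) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}

// How much slower `slow` is than `fast`, in percent: (1 - slow/fast) * 100.
double percent_slower(double slow, double fast) {
    return (1.0 - slow / fast) * 100.0;
}

// How much faster `fast` is than `slow`, in percent: (fast/slow - 1) * 100.
double percent_faster(double fast, double slow) {
    return (fast / slow - 1.0) * 100.0;
}
```

With the numbers above, `percent_slower(1829, 2591)` is about 29.4 and `percent_slower(1766, 2591)` about 31.8, matching the ~29% and ~32% figures. Stated the other way, GCC is about 41.7% faster than MSVC /O2 (`percent_faster(2591, 1829)`). The /Ob3 build is about 3.4% slower than plain /O2 /arch:AVX2 (`percent_slower(1766, 1829)`).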

Have you seen similar compiler differences with your implementations?

Are there any known MSVC pitfalls, or recommended flags/idioms, for BLAKE3-style SIMD code?
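One idiom often checked first in this situation is forced inlining of the small helpers (G, rotations, the round function). This is a sketch, not a confirmed fix for the gap above. MSVC does not accept GCC's `__attribute__((always_inline))`; its forced-inlining keyword is `__forceinline`, which Microsoft documents as a strong hint rather than a guarantee. `B3_FORCE_INLINE` and `rotr32_const` are hypothetical names.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#  include <stdlib.h>  // declares the _rotr intrinsic on MSVC
#  define B3_FORCE_INLINE __forceinline
#else
#  define B3_FORCE_INLINE inline __attribute__((always_inline))
#endif

// Rotate right by a compile-time constant N, force-inlined on both compilers.
template <unsigned N>
B3_FORCE_INLINE std::uint32_t rotr32_const(std::uint32_t x) {
    static_assert(N > 0 && N < 32, "N must be in 1..31; a shift by 32 is undefined");
#if defined(_MSC_VER)
    return _rotr(x, static_cast<int>(N));  // MSVC rotate intrinsic
#else
    return (x >> N) | (x << (32 - N));     // GCC/Clang lower this pattern to a rotate
#endif
}
```

The same macro could also be applied to the vector G and round functions in the SSE4.1/AVX2 paths. Comparing MSVC's assembly listing (for example via /FAs) with GCC's for those functions would show whether inlining is actually where the two builds differ.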

Thanks.

Labels: question (Further information is requested)
