[asm] Enable x86_64 asm for windows builds #4246
minor question: This is notably useful for future reference, when people will look at this PR. |
I consistently get 2-3% better times with the MS compiler when I enable asm. I also build the asm with yasm (but building with mingw64 also works). Although unrelated to this change, curiously, I get roughly 10-15% better decoding times with any other compiler compared to the MS compiler (with or without the asm). I tried clang-cl, mingw64 (ver 14.2.0), and the Intel icx compiler (2024, 2025). In my experience, the MS compiler is usually pretty good with default optimization settings. I tried all kinds of stuff and it's always 10-15% slower at decoding. No idea why; perhaps it hits some bad case somewhere. |
It is also our experience that Visual Studio compiler gets lower performance compared to |
I can confirm a fairly decent decompression speed gain on Windows,
|
In my project I use multiple perf-critical libs, and none of them runs slower with the MS compiler than with gcc/clang. On the contrary, I've seen it perform better in some cases. Being 15-20% slower at zstd decoding stands out. |
I suspect it depends on which target a library is primarily developed for. |
Can you recommend which functions or blocks of code (or logical blocks) I could instrument to see where the delay comes from? These are the readings from the Intel VTune profiler (I test with hardware perf counters): |
One other interesting observation. If I run a longer test (all compression levels, using a few different data samples) and then sum all encoding and decoding times, I get these results:
What a surprise. All compilers except the MS compiler are within 1% of each other when decoding, while the MS compiler is 16.2% slower. When encoding, the MS compiler produces the best results, almost 10% better than clang. |
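For anyone who wants to reproduce this kind of summed measurement, here is a minimal sketch in C. It is not the harness used for the numbers in this thread. The names `sample_t`, `codec_pass_fn`, `count_pass`, `total_seconds`, and `percent_slower` are hypothetical, and the codec is abstracted behind a callback so the sketch compiles without linking zstd.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical description of one input sample. */
typedef struct {
    const void *data;
    size_t size;
} sample_t;

/* Hypothetical callback: runs one encode or decode pass over a sample
 * at the given level. `ctx` carries whatever state the pass needs. */
typedef void (*codec_pass_fn)(const sample_t *sample, int level, void *ctx);

/* Wall-clock seconds via C11 timespec_get (not guaranteed monotonic). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sum the time spent in `pass` over every level in
 * [min_level, max_level] and every sample. */
static double total_seconds(codec_pass_fn pass, void *ctx,
                            const sample_t *samples, size_t n_samples,
                            int min_level, int max_level)
{
    double total = 0.0;
    assert(pass != NULL && min_level <= max_level);
    for (int level = min_level; level <= max_level; ++level) {
        for (size_t i = 0; i < n_samples; ++i) {
            double start = now_seconds();
            pass(&samples[i], level, ctx);
            total += now_seconds() - start;
        }
    }
    return total;
}

/* How much slower `t` is than `baseline`, in percent. */
static double percent_slower(double t, double baseline)
{
    assert(baseline > 0.0);
    return (t / baseline - 1.0) * 100.0;
}

/* Placeholder pass that only counts invocations; `ctx` must point to an int.
 * A real pass would call the codec under test here. */
static void count_pass(const sample_t *sample, int level, void *ctx)
{
    (void)sample;
    (void)level;
    ++*(int *)ctx;
}
```

With zstd, the callback would presumably wrap `ZSTD_compress()` or `ZSTD_decompress()`, and the per-compiler totals would then be compared with `percent_slower(total_for_one_compiler, total_for_baseline)`. Since the clock used here is not monotonic, repeating each pass and keeping the best time is likely a safer choice for short inputs.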
I tried to take a look. It looks like some sort of optimizer issue that ends up using the stack more often with the MS compiler than with clang, specifically inside This is the asm from clang:
and this is the asm from the MS compiler:
|
Indeed, |
No description provided.