refactor: Some AArch64 specific NEON intrinsics was wrapped into macros. #37
base: main
kpchoi
left a comment
Please check the following comment.
    { // Row 0
        int32x4_t sad_vector = vdupq_n_s32(0);
        // Loop unrolling
        #pragma GCC unroll 8
Can this work on non-GCC compilers?
This also works on Clang and does nothing on MSVC, because an unknown pragma is not an error.
P.S. It may produce a warning on MSVC, depending on the warning level.
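One way to avoid the unknown-pragma warning is to hide the hint behind a macro that expands to nothing on compilers that do not understand it. The sketch below is only an illustration: `UNROLL_8` and `sad_row` are hypothetical names that are not part of this PR, the loop is a scalar stand-in for the NEON code, and the Clang branch assumes `#pragma clang loop unroll_count` is the preferred spelling there.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the PR): emit an unroll hint only where the
   compiler is known to accept one. Clang also defines __GNUC__, so it is
   tested first. */
#if defined(__clang__)
#  define UNROLL_8 _Pragma("clang loop unroll_count(8)")
#elif defined(__GNUC__) && (__GNUC__ >= 8)
#  define UNROLL_8 _Pragma("GCC unroll 8")
#else
#  define UNROLL_8 /* MSVC and others: no hint, so no unknown-pragma warning */
#endif

/* Scalar sum of absolute differences over one row of n pixels. */
static uint32_t sad_row(const uint8_t *a, const uint8_t *b, int n)
{
    uint32_t sad = 0;
    UNROLL_8
    for (int i = 0; i < n; ++i) {
        sad += (uint32_t)abs((int)a[i] - (int)b[i]);
    }
    return sad;
}
```

The hint only affects code generation, so the result is the same on every compiler; whether it helps performance still has to be measured.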
That means the unrolling will not take effect on MSVC, which can make the operation slower.
It's hard to tell without measurement. Measure before optimizing.
- A small loop will not thrash the instruction cache, so it may run faster without unrolling.
- MSVC may unroll it without pragmas. Trust the compiler.
- Do you really use MSVC to compile ARM projects?
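A measurement along these lines could settle the question. This is a minimal sketch, assuming a scalar stand-in kernel and CPU time from the standard `clock()`; `sad_plain`, `sad_fn`, and `time_kernel` are hypothetical names, not code from this PR. An unrolled variant with the same signature could be timed the same way and compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel under test: scalar sum of absolute differences. */
static uint32_t sad_plain(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sad = 0;
    for (size_t i = 0; i < n; ++i) {
        sad += (uint32_t)abs((int)a[i] - (int)b[i]);
    }
    return sad;
}

typedef uint32_t (*sad_fn)(const uint8_t *, const uint8_t *, size_t);

/* Returns the CPU seconds spent in `iters` calls of `fn`. Each result is
   added to *sink so that the work is observably used. */
static double time_kernel(sad_fn fn, const uint8_t *a, const uint8_t *b,
                          size_t n, int iters, volatile uint32_t *sink)
{
    clock_t start = clock();
    for (int i = 0; i < iters; ++i) {
        *sink += fn(a, b, n);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timings from such a loop are only meaningful with a large iteration count, optimizations enabled, and the same target and compiler flags as the real build.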
Now it can be compiled for the ARMv7 target, I hope.
The PR 1) proposes using several macros and 2) fixes minor typos, in order to refactor the ARM implementation. Please provide the following details:
c) Currently the master branch produces warnings during the MSVC 2022 build, so please clean those up first. Also, which platforms do you mean by "all platforms"? Please provide details.
…code.