Use +sme for Apple #303
base: master
@@ -87,7 +91,7 @@ cdecl(gf_vect_mad_sve):
     /* vector length agnostic */
 .Lloopsve_vl:
     whilelo p0.b, x_pos, x_len
-    b.none .return_pass
+    b.eq .return_pass
According to https://llvm.org/doxygen/AArch64AsmParser_8cpp_source.html , b.none is the same as b.eq when +sve is specified.
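For reference, the alias relationship mentioned above can be sketched as a small lookup table. This is an illustration only: the table reflects the SVE condition aliases as I understand them from the LLVM AArch64 assembly parser, and the names `sve_cond_aliases` and `base_condition` are hypothetical, not part of isa-l or LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One SVE condition alias and the base AArch64 condition it encodes to. */
struct cond_alias {
	const char *sve_name;
	const char *base_name;
};

/* Aliases that the assembler accepts only when SVE is enabled. */
static const struct cond_alias sve_cond_aliases[] = {
	{"none", "eq"},  {"any", "ne"},   {"nlast", "hs"}, {"last", "lo"},
	{"first", "mi"}, {"nfrst", "pl"}, {"pmore", "hi"}, {"plast", "ls"},
	{"tcont", "ge"}, {"tstop", "lt"},
};

/* Hypothetical helper: map an SVE alias to its base condition name.
 * Names that are not SVE aliases are returned unchanged. */
static const char *base_condition(const char *name)
{
	assert(name != NULL);
	size_t n = sizeof(sve_cond_aliases) / sizeof(sve_cond_aliases[0]);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(name, sve_cond_aliases[i].sve_name) == 0)
			return sve_cond_aliases[i].base_name;
	}
	return name;
}
```

Under this mapping, `b.none` and `b.eq` produce the same encoding, so writing `b.eq` avoids depending on the assembler enabling the SVE-only spelling.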
@liuqinfei could you look into this issue? Thanks again! ;)
In fact, I don't have an Apple computer that supports SVE on hand, so I can't verify this patch. Maybe you can share your verification results on machines with and without SVE. @cielavenir
I don't have one either; I only checked that it compiles. So we need to call for tester(s) with an M4 Mac; otherwise we need to wait for the next GitHub runner (not image) update.
We are looking into releasing 2.31.1 as soon as next week, with just bug fixes. If we have someone that can test this, then we can include it in the next release.
Let's hold this PR for the next release, once more testing is done.
I've got an M4 MacBook Air -- doesn't seem to work for me:
Oddly enough, running via
On master,
@tipabu Thank you for testing. Maybe the fact that the current code is accepted with +sme is an assembler bug...
@tipabu, what about the "master" branch?
@pablodelara, on master (91da2ad
Thanks @tipabu. So it looks like this PR is not needed...
@tipabu Actually, according to https://qiita.com/zacky1972/items/b7b5dd456fe021b30eb2, I need to wrap the function with (compilation is tested in https://github.com/cielavenir/isa-l/actions/runs/14731321078)
@cielavenir Tests now pass! And looking at someone's investigation, we shouldn't need to worry about losing sve checks for Macs; no Apple silicon supports it.
@pablodelara Only insofar as Macs were always getting the neon implementation.
great thank you~
@cielavenir can you clean up the commits (so there is no "Merge branch 'master'"...)? Good opportunity to rebase against latest 'master' branch
Signed-off-by: Taiju Yamada <[email protected]>
@pablodelara rebased.
So one concern: This seems to be slightly slower than master. On this branch:
Whereas on master:
Multiple runs had similar results (±100 MB/s on encode, ±200 MB/s on decode, give or take).
@tipabu On the https://github.com/cielavenir/isa-l/tree/featSME_CI branch, I changed the code to call smstart/smstop only in the dispatched function. Calling smstart in each subroutine called by ec_encode_data_sve could add overhead. If this is faster, I will rebase the featSME branch again.
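The difference between the two placements can be sketched in plain C. This is a simulation under stated assumptions, not the isa-l code: `enter_streaming` and `exit_streaming` are hypothetical stand-ins that only count calls where the real code would execute `smstart`/`smstop`, and `kernel_body` is a placeholder XOR loop rather than the actual GF multiply kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Counts simulated mode switches; real code would execute smstart/smstop. */
static unsigned long mode_switches;

static void enter_streaming(void) { mode_switches++; } /* stands in for smstart */
static void exit_streaming(void) { mode_switches++; }  /* stands in for smstop */

/* Placeholder kernel (plain XOR) so the sketch runs on any machine. */
static void kernel_body(unsigned char *dst, const unsigned char *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Variant A: every subroutine toggles streaming mode itself. */
static void kernel_self_wrapped(unsigned char *dst, const unsigned char *src, size_t len)
{
	enter_streaming();
	kernel_body(dst, src, len);
	exit_streaming();
}

static void encode_per_subroutine(unsigned char *dst, const unsigned char **srcs,
				  int nsrc, size_t len)
{
	for (int i = 0; i < nsrc; i++)
		kernel_self_wrapped(dst, srcs[i], len);
}

/* Variant B: the dispatched function toggles streaming mode once. */
static void encode_hoisted(unsigned char *dst, const unsigned char **srcs,
			   int nsrc, size_t len)
{
	enter_streaming();
	for (int i = 0; i < nsrc; i++)
		kernel_body(dst, srcs[i], len);
	exit_streaming();
}
```

With `nsrc` sources, variant A performs `2 * nsrc` simulated switches and variant B performs 2. Whether that translates into measurable throughput on real hardware is exactly what the benchmarks in this thread are trying to determine.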
@cielavenir I see roughly the same on cielavenir@aad8c5c:
Now I'm not sure whether this is smstart overhead or whether the SME implementation is simply not faster than the NEON implementation.
Today I went to biccamera ( 😂 ) and checked hw.optional. I found FEAT_SME but not FEAT_SVE. [1]
This means that for Apple the +sve code has to be compiled with +sme instead.
This is potentially quite a breaking change, so I'd like it to be tested by those who have an M4 Mac.
Call for tester(s): if you have an M4 Mac, please try running the tests on your machine~~
Footnotes
Why does Apple call something without FEAT_SVE Armv9?
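For anyone who wants to repeat the hw.optional check programmatically, here is a minimal sketch using `sysctlbyname` on macOS. The key names `hw.optional.arm.FEAT_SME` and `hw.optional.arm.FEAT_SVE` are assumptions based on the hw.optional naming mentioned above, and `has_hw_optional` is a hypothetical helper, not an isa-l function. On non-Apple systems the helper simply reports 0.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns 1 if the named sysctl flag exists and is non-zero, else 0.
 * A missing key is treated as "feature not supported".
 * On non-Apple systems this always returns 0. */
static int has_hw_optional(const char *name)
{
	assert(name != NULL);
#ifdef __APPLE__
	int value = 0;
	size_t size = sizeof(value);
	if (sysctlbyname(name, &value, &size, NULL, 0) != 0)
		return 0; /* key not present on this machine */
	return value != 0;
#else
	(void)name;
	return 0;
#endif
}
```

A caller could compare `has_hw_optional("hw.optional.arm.FEAT_SME")` with `has_hw_optional("hw.optional.arm.FEAT_SVE")`. Going by the observation above, an M4 would be expected to report the first but not the second.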