Use +sme for Apple #303

Open · wants to merge 1 commit into base: master
Conversation

@cielavenir (Contributor) commented Nov 9, 2024

Today I went to Biccamera ( 😂 ) and checked hw.optional. I found FEAT_SME but not FEAT_SVE. 1

This means that for Apple, the +sve code has to be compiled with +sme instead.

This is potentially quite a breaking change, so I'd like it to be tested by people who have an M4 Mac.

Call for tester(s): if you have an M4 Mac, please try running the tests on your machine~~
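For anyone else who wants to check their own machine, the feature flags can be queried with sysctl. A minimal sketch, assuming the `hw.optional.arm.FEAT_*` keys that macOS exposes on Apple silicon; on other systems the keys simply won't be reported:

```shell
#!/bin/sh
# Query Apple-silicon feature flags: prints 1 if present, 0 if absent.
# On non-macOS systems (or Intel Macs) the keys don't exist, so we
# suppress the error and report "not reported" instead.
for feat in FEAT_SME FEAT_SVE; do
    val=$(sysctl -n "hw.optional.arm.$feat" 2>/dev/null)
    echo "$feat: ${val:-not reported}"
done
```

On an M4 Mac this is expected to show FEAT_SME as 1 and FEAT_SVE as 0 (or absent), matching the observation above.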

Footnotes

  1. Why would Apple ship an Armv9 chip without FEAT_SVE?

@@ -87,7 +91,7 @@ cdecl(gf_vect_mad_sve):
 	/* vector length agnostic */
 .Lloopsve_vl:
 	whilelo p0.b, x_pos, x_len
-	b.none .return_pass
+	b.eq .return_pass
@cielavenir (Contributor, Author) commented on these lines:

According to https://llvm.org/doxygen/AArch64AsmParser_8cpp_source.html, b.none is the same as b.eq when +sve is specified.
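For context, the two spellings name the same condition. A sketch of why, assuming the standard SVE flag semantics (whilelo sets NZCV from the generated predicate, with Z set when no element is active):

```asm
        // After an SVE `whilelo`, the flags describe the predicate:
        // Z = 1 when no predicate element is active.
        whilelo p0.b, x_pos, x_len
        b.none  .return_pass    // SVE condition alias: branch if no active
                                // elements, i.e. branch when Z is set
        // ...which assembles to the exact same encoding as...
        b.eq    .return_pass    // the plain AArch64 spelling of "Z set"
```

So the change is purely a spelling change to satisfy the assembler when +sve is not enabled; the generated instruction is identical.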

@pablodelara (Contributor)

@liuqinfei could you look into this issue? Thanks again! ;)

@liuqinfei (Contributor)

> @liuqinfei could you look into this issue? Thanks again! ;)

In fact, I don't have an Apple computer that supports SVE on hand, so I can't verify this patch. Maybe you can share verification results from machines with and without SVE. @cielavenir

@cielavenir (Contributor, Author)

I don't have one either; I only checked that it compiles.

So we need to call for tester(s) with an M4 Mac; otherwise we have to wait for the next GitHub runner (not image) update.

@pablodelara (Contributor)

We are looking into releasing 2.31.1 as soon as next week, with just bug fixes. If we have someone that can test this, then we can include it in the next release.

@pablodelara (Contributor)

Let's hold this PR for the next release, once more testing is done.

@tipabu (Contributor) commented Apr 24, 2025

I've got an M4 MacBook Air -- doesn't seem to work for me:

tburke@2025-air isa-l % git clean -fdx && ./autogen.sh && ./configure --prefix ~/.local/ && make test
...
  CPPAS    mem/aarch64/mem_zero_detect_neon.lo
  CPPAS    mem/aarch64/mem_multibinary_arm.lo
  CC       mem/aarch64/mem_aarch64_dispatcher.lo
  CCLD     libisal.la
copying selected object files to avoid basename conflicts...
  CCLD     erasure_code/gf_vect_mul_base_test
erasure_code/gf_vect_mul_base_test
gf_vect_mul_base_test:
Random tests  done: Pass
Completed run: erasure_code/gf_vect_mul_base_test
  CC       erasure_code/gf_vect_dot_prod_base_test.o
  CCLD     erasure_code/gf_vect_dot_prod_base_test
erasure_code/gf_vect_dot_prod_base_test
gf_vect_dot_prod_base: 250x8192 done all: Pass
Completed run: erasure_code/gf_vect_dot_prod_base_test
  CC       erasure_code/gf_vect_dot_prod_test.o
  CCLD     erasure_code/gf_vect_dot_prod_test
erasure_code/gf_vect_dot_prod_test
make: *** [erasure_code/gf_vect_dot_prod_test.run] Illegal instruction: 4

Oddly enough, running via lldb (to try to get a better handle on where things went wrong) doesn't trip the same error:

tburke@2025-air isa-l % ./libtool --mode=execute lldb -o run erasure_code/gf_vect_dot_prod_test
(lldb) target create "/Users/tburke/Code/isa-l/erasure_code/.libs/gf_vect_dot_prod_test"
Current executable set to '/Users/tburke/Code/isa-l/erasure_code/.libs/gf_vect_dot_prod_test' (arm64).
(lldb) run
gf_vect_dot_prod: 16x8192 done all: Pass
Process 7194 launched: '/Users/tburke/Code/isa-l/erasure_code/.libs/gf_vect_dot_prod_test' (arm64)
Process 7194 exited with status = 0 (0x00000000)

On master, make test passes everything.

@cielavenir (Contributor, Author)

@tipabu thank you for testing. Maybe the current code being accepted under +sme is an assembler bug...

@pablodelara (Contributor)

@tipabu, what about "master" branch?

@tipabu (Contributor) commented Apr 28, 2025

@pablodelara, on master (91da2ad add RISCV CI) all tests pass and perf suite runs fine.

@cielavenir cielavenir marked this pull request as draft April 28, 2025 23:59
@pablodelara (Contributor)

Thanks @tipabu. So it looks like this PR is not needed...

@cielavenir (Contributor, Author)

@tipabu actually, according to https://qiita.com/zacky1972/items/b7b5dd456fe021b30eb2, I need to wrap the function with smstart sm and smstop sm. I have implemented that. If you have time, could you try again?

(compilation is tested in https://github.com/cielavenir/isa-l/actions/runs/14731321078)
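The wrapping referred to above looks roughly like this. A sketch only, not the actual patch; the function name and loop body are illustrative, and the key assumption is that Apple silicon accepts the SVE instructions only in SME streaming mode:

```asm
cdecl(gf_vect_dot_prod_sve):
        smstart sm          // enter SME streaming mode, so the SVE
                            // instructions below run as streaming SVE
        // ... existing vector-length-agnostic SVE loop
        //     (whilelo / loads / table lookups / stores) ...
        smstop  sm          // leave streaming mode before returning;
                            // callers expect the non-streaming state
        ret
```

Without the smstart/smstop pair, the SVE instructions trap at runtime even though the assembler accepts them with +sme, which matches the "Illegal instruction" seen earlier.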

@tipabu (Contributor) commented Apr 29, 2025

@cielavenir Tests now pass! And looking at someone's investigation, we shouldn't need to worry about losing SVE checks for Macs; no Apple silicon supports it.

> So it looks like this PR is not needed...

@pablodelara Only insofar as Macs were always getting the NEON implementation.

@cielavenir (Contributor, Author)

great thank you~

@cielavenir cielavenir marked this pull request as ready for review April 29, 2025 23:06
@pablodelara (Contributor)

@cielavenir can you clean up the commits (so there is no "Merge branch 'master'"...)? Good opportunity to rebase against the latest 'master' branch.

Signed-off-by: Taiju Yamada <[email protected]>
@cielavenir (Contributor, Author)

@pablodelara rebased.

@tipabu (Contributor) commented Apr 30, 2025

So one concern: This seems to be slightly slower than master. On this branch:

tburke@2025-air isa-l % make erasure_code/erasure_code_perf.run
erasure_code/erasure_code_perf
Testing with 8 data buffers and 6 parity buffers (num errors = 4, in [ 4 0 5 1 ])
erasure_code_perf: 14x9344 4
erasure_code_encode_warm: runtime =    3062483 usecs, bandwidth 49464 MB in 3.0625 sec = 16151.90 MB/s
erasure_code_decode_warm: runtime =    3001748 usecs, bandwidth 59388 MB in 3.0017 sec = 19784.75 MB/s
done all: Pass
Completed run: erasure_code/erasure_code_perf

Whereas on master:

tburke@2025-air isa-l % make erasure_code/erasure_code_perf.run
erasure_code/erasure_code_perf
Testing with 8 data buffers and 6 parity buffers (num errors = 4, in [ 4 0 5 1 ])
erasure_code_perf: 14x9344 4
erasure_code_encode_warm: runtime =    3039658 usecs, bandwidth 49832 MB in 3.0397 sec = 16394.04 MB/s
erasure_code_decode_warm: runtime =    3027461 usecs, bandwidth 65886 MB in 3.0275 sec = 21763.11 MB/s
done all: Pass
Completed run: erasure_code/erasure_code_perf

Multiple runs had similar results (±100MB/s on encode, ±200MB/s on decode, give or take).

@cielavenir (Contributor, Author)

@tipabu on the https://github.com/cielavenir/isa-l/tree/featSME_CI branch, I changed it to call smstart/smstop only in the dispatched function. Adding smstart to each subroutine called by ec_encode_data_sve could cause an overhead issue.

If this is faster, I will rebase the featSME branch again.
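The idea is to hoist the mode switch one level up, so it is paid once per call into the dispatched entry point instead of once per inner subroutine. Again a sketch; the names are illustrative and the inner routines are assumed to be callable with streaming mode already on:

```asm
cdecl(ec_encode_data_sve):
        smstart sm          // single mode switch for the whole encode call
        // ... loop over parity rows, calling gf_vect_mad_sve and friends,
        //     which no longer toggle streaming mode themselves ...
        smstop  sm
        ret
```

This trades one smstart/smstop pair per outer call against one pair per inner subroutine call, which is where the suspected overhead was.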

@tipabu (Contributor) commented May 1, 2025

@cielavenir I see roughly the same on cielavenir@aad8c5c:

tburke@2025-air isa-l % make erasure_code/erasure_code_perf.run
erasure_code/erasure_code_perf
Testing with 8 data buffers and 6 parity buffers (num errors = 4, in [ 4 0 5 1 ])
erasure_code_perf: 14x9344 4
erasure_code_encode_warm: runtime =    3062620 usecs, bandwidth 49503 MB in 3.0626 sec = 16163.74 MB/s
erasure_code_decode_warm: runtime =    3006801 usecs, bandwidth 59369 MB in 3.0068 sec = 19745.01 MB/s
done all: Pass
Completed run: erasure_code/erasure_code_perf

@cielavenir (Contributor, Author)

Now I'm not sure whether this is smstart overhead or whether the SME implementation is simply not faster than the NEON one...

4 participants