Adding support for SME1 GEMM FP32 kernel #7831
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Could someone please help with the next steps for this PR?
We already have this SME2 kernel: https://github.com/google/XNNPACK/blob/master/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c. If the only difference is the multi-vector load/store instructions, we'd rather avoid having two almost identical kernels coming from two very different sources with different support arrangements. Can you please look into a way to reconcile these two code paths? Maybe send your kernel as a PR to KleidiAI, and then we can use it the same way we pull in the kernel above?
This is just a wrapper? // Wraps the
} I suspect xnn_pf32_gemm_minmax__asm_aarch64_neonsme requires KleidiAI, so this will fail to build with KleidiAI disabled.
Hi @fbarchard, yes, xnn_pf32_gemm_minmax_ukernel_32x32__neonsme is a wrapper that calls xnn_pf32_gemm_minmax__asm_aarch64_neonsme. The latter does not require KleidiAI, because it is available in source form in src/pf32-gemm/gen/pf32-gemm-32x32-minmax-asm-aarch64-neonsme.S. However, we did see a few build failures when KleidiAI was disabled; fixes for these failures have been added.
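For readers following the wrapper discussion, here is a minimal sketch of the pattern being described: a thin C entry point that validates its arguments and forwards to an inner kernel. Everything in it is an assumption made for illustration. The names demo_gemm_minmax_inner and demo_gemm_minmax_ukernel are hypothetical, the parameter list is not XNNPACK's actual micro-kernel signature, and a plain C loop stands in for the assembly routine so the sketch builds on any target and has no dependency on KleidiAI.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for the assembly micro-kernel. In the PR the real
// routine lives in a .S file; a scalar C loop is used here instead.
// Computes C = clamp(A * B, min, max) for row-major A (m x k), B (k x n).
static void demo_gemm_minmax_inner(size_t m, size_t n, size_t k,
                                   const float* a, const float* b, float* c,
                                   float min, float max) {
  for (size_t i = 0; i < m; i++) {
    for (size_t j = 0; j < n; j++) {
      float acc = 0.0f;
      for (size_t p = 0; p < k; p++) {
        acc += a[i * k + p] * b[p * n + j];
      }
      // Apply the min/max clamp that the "minmax" suffix refers to.
      if (acc < min) acc = min;
      if (acc > max) acc = max;
      c[i * n + j] = acc;
    }
  }
}

// Hypothetical wrapper: checks its arguments, then forwards to the inner
// kernel. It depends only on the inner kernel's symbol, which is why the
// wrapper itself needs no external library in order to link.
void demo_gemm_minmax_ukernel(size_t m, size_t n, size_t k,
                              const float* a, const float* b, float* c,
                              float min, float max) {
  assert(a != NULL);
  assert(b != NULL);
  assert(c != NULL);
  assert(min <= max);
  demo_gemm_minmax_inner(m, n, k, a, b, c, min, max);
}
```

With this shape, whether the build succeeds with an optional dependency disabled comes down to whether the inner symbol is compiled and linked unconditionally, which is the question raised in the review comment above.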
Adds support for SME1 for GEMM FP32 Kernel