Adding support for SME1 GEMM FP32 kernel #7831

Open: wants to merge 3 commits into base: master

Conversation

vgundlur
Adds SME1 support for the FP32 GEMM kernel.

google-cla bot commented Feb 18, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@vgundlur
Author

Could someone please help with the next steps for this PR?

@dsharlet
Collaborator

We have this SME2 kernel already: https://github.com/google/XNNPACK/blob/master/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c

If the only difference is multi-vector load/store instructions, we'd rather avoid having two almost identical kernels coming from two very different sources with different support arrangements.

Can you please look into figuring out a way to reconcile these two codepaths? Maybe send your kernel as a PR to KleidiAI, and then we can use it the way we pull in the above kernel?

@fbarchard
Collaborator

Is this just a wrapper, that is, xnn_pf32_gemm_minmax_ukernel_32x32__neonsme calling xnn_pf32_gemm_minmax__asm_aarch64_neonsme?

// Wraps the xnn_pf32_gemm_minmax__asm_aarch64_neonsme
// GEMM microkernel with a name that is compatible with our tooling.
void xnn_pf32_gemm_minmax_ukernel_32x32__neonsme(
    size_t m, size_t n, size_t k, const void* lhs_packed,
    const void* rhs_packed, float* dst, size_t dst_stride_row,
    size_t dst_stride_col,
    union xnn_f32_minmax_params
        minmax_params[XNN_RESTRICT XNN_MIN_ELEMENTS(1)]) {
  xnn_pf32_gemm_minmax__asm_aarch64_neonsme(
      lhs_packed, rhs_packed, dst, k / sizeof(float),
      &minmax_params->scalar.max, &minmax_params->scalar.min, m, n, NULL, 0,
      dst_stride_row);
}

I suspect xnn_pf32_gemm_minmax__asm_aarch64_neonsme requires KleidiAI, so this will fail to build with KleidiAI disabled.
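To make the wrapper pattern quoted above concrete, here is a minimal, self-contained sketch in plain C. Everything prefixed with demo_ is hypothetical and not part of XNNPACK or KleidiAI. The inner function is a naive row-major reference GEMM standing in for the assembly kernel; its parameter names are assumptions, since the real prototype does not appear in this thread, and it ignores the packed data layout entirely. The sketch also assumes, based only on the k / sizeof(float) in the quoted call, that the outer interface passes k in bytes while the inner kernel expects a count of float elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter block; the real code uses union xnn_f32_minmax_params. */
struct demo_minmax_params {
  float min;
  float max;
};

/* Hypothetical stand-in for the assembly kernel. The argument ORDER mirrors the
 * call in the quoted wrapper; the names and the row-major (unpacked) layout are
 * assumptions made purely for illustration. */
static void demo_inner_gemm(const void* lhs, const void* rhs, float* dst,
                            size_t k_elements, const float* max,
                            const float* min, size_t m, size_t n,
                            const void* reserved_ptr, size_t reserved_size,
                            size_t dst_stride_row) {
  (void) reserved_ptr;   /* passed as NULL by the wrapper */
  (void) reserved_size;  /* passed as 0 by the wrapper */
  const float* a = (const float*) lhs;  /* m x k, row-major */
  const float* b = (const float*) rhs;  /* k x n, row-major */
  for (size_t i = 0; i < m; i++) {
    /* dst_stride_row is a byte stride, so advance through a char pointer. */
    float* row = (float*) ((char*) dst + i * dst_stride_row);
    for (size_t j = 0; j < n; j++) {
      float acc = 0.0f;
      for (size_t p = 0; p < k_elements; p++) {
        acc += a[i * k_elements + p] * b[p * n + j];
      }
      if (acc > *max) acc = *max;  /* clamp to the upper bound */
      if (acc < *min) acc = *min;  /* clamp to the lower bound */
      row[j] = acc;
    }
  }
}

/* Wrapper with a tooling-style signature: reorders arguments, converts k from
 * bytes to float elements, and forwards pointers to the min/max bounds. */
static void demo_gemm_wrapper(size_t m, size_t n, size_t k_bytes,
                              const void* lhs, const void* rhs, float* dst,
                              size_t dst_stride_row, size_t dst_stride_col,
                              const struct demo_minmax_params* params) {
  (void) dst_stride_col;  /* not forwarded, as in the quoted wrapper */
  assert(k_bytes % sizeof(float) == 0);
  demo_inner_gemm(lhs, rhs, dst, k_bytes / sizeof(float), &params->max,
                  &params->min, m, n, NULL, 0, dst_stride_row);
}
```

A caller would invoke demo_gemm_wrapper with k expressed in bytes (for example 2 * sizeof(float) for a reduction depth of 2) and a byte row stride for dst. Because the wrapper only adapts the signature, whether the build succeeds depends entirely on where the inner symbol is defined, which is the question raised in this comment.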

@vgundlur vgundlur force-pushed the sme1_gemm_support branch 2 times, most recently from 28667b8 to 1ddfdb3 Compare April 4, 2025 05:25
@vgundlur
Author

vgundlur commented Apr 4, 2025

Hi @fbarchard,

Yes, xnn_pf32_gemm_minmax_ukernel_32x32__neonsme is a wrapper that calls xnn_pf32_gemm_minmax__asm_aarch64_neonsme. That function does not require KleidiAI, because it is available in source form in src/pf32-gemm/gen/pf32-gemm-32x32-minmax-asm-aarch64-neonsme.S.

However, we did see a few build failures when KleidiAI was disabled. Fixes for these failures have been added to this PR.

@vgundlur vgundlur force-pushed the sme1_gemm_support branch from 910d272 to 28b5308 Compare April 4, 2025 06:22
3 participants