-
Notifications
You must be signed in to change notification settings - Fork 429
Adding support for SME1 GEMM FP32 kernel #7831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
could someone please help on the next steps for this PR? |
We have this SME2 kernel already: https://github.com/google/XNNPACK/blob/master/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c If the only difference is multi-vector load/store instructions, we'd rather avoid having two almost identical kernels coming from two very different sources with different support arrangements. Can you please look into figuring out a way to reconcile these two codepaths? Maybe send your kernel as a PR to KleidiAI, and then we can use it the way we pull in the above kernel? |
This is just a wrapper? // Wraps the
} I suspect xnn_pf32_gemm_minmax__asm_aarch64_neonsme requires KleidiAI so this will fail to build with kleidi disabled. |
28667b8
to
1ddfdb3
Compare
Hi @fbarchard , Yes, xnn_pf32_gemm_minmax_ukernel_32x32__neonsme is a wrapper that calls xnn_pf32_gemm_minmax__asm_aarch64_neonsme. Also, xnn_pf32_gemm_minmax__asm_aarch64_neonsme does not require kleidiAI as it is available in source form within src/pf32-gemm/gen/pf32-gemm-32x32-minmax-asm-aarch64-neonsme.S. However, we saw few build failures when KleidiAI was disabled. Fixes for these failures are added. |
910d272
to
28b5308
Compare
@vgundlur can you please address the feedback I raised in this comment?
AFAICT these kernels are near identical, what makes them different? Why should we have both of these kernels? And if we really need both, can we do it in a way that doesn't copy paste a large block of assembly source code? (It might look like XNNPACK has a lot of copy/pasted kernels, but these are generated from the same source code wherever possible.) |
@dsharlet Thank you for commenting on the PR and sorry for the delay in responding your Query. We need both versions of these kernels for platforms supporting SME version 1 and Version 2. Kleidiai supports SME2 and we are pushing for SME1 support. We will try including the Original Kaleidiai implementation into our file and replace the unsupported instructions with a macro that modifies to supported instructions, Hope this will be good, if not Please suggest any other alternate approaches. |
28b5308
to
90de09c
Compare
@dsharlet, arm has pushed sme1 GEMM kernel to Kleidiai and we have pulled the GEMM kernel from it and updated our PR. Please help review the change and share your comments. |
@@ -18,8 +18,8 @@ ENDIF() | |||
# LINT.IfChange | |||
INCLUDE(ExternalProject) | |||
ExternalProject_Add(kleidiai | |||
URL https://github.com/ARM-software/kleidiai/archive/247088200c679f30b1b4a680bd12fee18457a100.zip | |||
URL_HASH SHA256=ad04cc186b12810ecde9d75911c76a0113d3c055773c700377de302eef6c4419 | |||
URL https://github.com/ARM-software/kleidiai/archive/c80d18838af1ecf68f011625db103de497b5a840.zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also update the hashes in the WORKSPACE
and MODULE
files for the bazel
builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/configs/gemm-config.c
Outdated
@@ -323,6 +323,22 @@ static void init_pf32_gemm_config(void) { | |||
pf32_gemm_config.nr = nr; | |||
#endif // XNN_ENABLE_ARM_SME2 | |||
} | |||
if(XNN_ENABLE_ARM_SME && (hardware_config->arch_flags & xnn_arch_arm_sme) && !(hardware_config->arch_flags & xnn_arch_arm_sme2)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you explicitly checking for !(hardware_config->arch_flags & xnn_arch_arm_sme2)
?
Would it not make more sense to make this an else if
statement instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/configs/pack-lh-config.c
Outdated
const struct xnn_hardware_config* hardware_config = xnn_init_hardware_config(); | ||
assert(hardware_config != NULL); | ||
if ((hardware_config->arch_flags & xnn_arch_arm_sme2)) { | ||
if ((hardware_config->arch_flags & xnn_arch_arm_sme2) || (hardware_config->arch_flags & xnn_arch_arm_sme)) { | ||
x32_pack_lh_config.pack_lh_fn = (xnn_pack_lh_ukernel_fn) xnn_x32_pack_lh_ukernel__neonsme2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these packing kernels only require sme
(and not sme2
), then they should be renamed so that they are automatically added to the list of neonsme_microkernels
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we need to test for sme
or sme2
? Doesn't the availability of sme2
imply the availability of sme
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The microkernel xnn_x32_pack_lh_ukernel__neonsme2 internally invokes kai_get_lhs_packed_size_lhs_pack_f32p2vlx1_f32_sme wich is based on SME1 and not SME2. So, as mentioned, the change has been brought in by renaming sme2 to sme.
Regarding the second point, availability of sme2 guarantees the availability of sme. But the reverse is not true. Hence, another sme2 check has been replaced with sme.
src/operators/pack-lh.c
Outdated
@@ -112,7 +112,7 @@ enum xnn_status reshape_pack_lh(xnn_operator_t pack_lh_op, size_t num_groups, | |||
return xnn_status_success; | |||
} | |||
|
|||
const uint32_t mr_packed = batch_size == 1 ? 1 | |||
const uint32_t mr_packed = batch_size == 1 ? (gemm_config->arch == xnn_arch_arm_sme ? gemm_config->mr : 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is because there is no neonsme
kernel for mr=1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run clang-format
on this (and all other) file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} else { | ||
kai_run_lhs_pack_f32p2vlx1_f32_sme(m, k, mr_packed, kr, sr, m_idx_start, | ||
lhs, lhs_stride, lhs_packed); | ||
const struct xnn_hardware_config* hardware_config = xnn_init_hardware_config(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of ugly. Would it be easier to add a 1x32
microkernel as well and not have to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would require another pf32 gemm microkernel for mr = 1 (just like xnn_pf32_gemm_minmax_ukernel_1x32__neonsme2). But arm has not added support for an SME1 variant for the kernel kai_run_matmul_clamp_f32_f32_f32p2vlx1b_1x16vl_sme2_mla (which is invoked in xnn_pf32_gemm_minmax_ukernel_1x32__neonsme2) in kleidiai.
test/vbinary-microkernel-tester.cc
Outdated
@@ -63,7 +63,7 @@ void VBinaryMicrokernelTester::Test(xnn_f16_vbinary_ukernel_fn vbinary, | |||
|
|||
// Verify results. | |||
for (size_t i = 0; i < batch_size(); i++) { | |||
if (std::isnan(y_ref[i])) { | |||
if (/*std::isnan(y_ref[i])*/1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not required, hence removed
test/vbinary-microkernel-tester.cc
Outdated
@@ -110,7 +110,7 @@ void VBinaryMicrokernelTester::Test(xnn_f32_vbinary_ukernel_fn vbinary, | |||
|
|||
// Verify results. | |||
for (size_t i = 0; i < batch_size(); i++) { | |||
if (std::isnan(y_ref[i])) { | |||
if (/*std::isnan(y_ref[i])*/1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not required, hence removed
44e77a4
to
8f2a4c4
Compare
8f2a4c4
to
cc254df
Compare
Adds support for SME1 for GEMM FP32 Kernel