[AArch64] Quantized MatMul performance improvement #2536

bgergely0 · 2024-10-03T08:13:00Z

This change implements a new matmul based on Armv8.6 i8mm instructions. Because the i8mm intrinsics require a nightly compiler, the more performant version is optional and is enabled with the new arm-nightly-feat feature flag.
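
For readers unfamiliar with i8mm, here is a portable scalar sketch of what a single signed matrix multiply-accumulate step (the operation behind the `vmmlaq_s32` intrinsic) computes. It is not the kernel from this PR; the function name `smmla_ref` and the array-based layout are assumptions made for illustration.

```rust
/// Scalar reference for one i8mm multiply-accumulate step.
/// `a` is a 2x8 row-major tile of i8 values.
/// `b` is also stored as 2 rows of 8 values, i.e. the 8x2 right-hand
/// operand in transposed form.
/// `acc` and the result are 2x2 row-major tiles of i32 values.
/// (Hypothetical helper, for illustration only.)
fn smmla_ref(acc: [i32; 4], a: [i8; 16], b: [i8; 16]) -> [i32; 4] {
    let mut out = acc;
    for i in 0..2 {
        for j in 0..2 {
            // Dot product of row i of `a` with row j of `b`, widened to i32.
            let mut sum = 0i32;
            for k in 0..8 {
                sum += a[8 * i + k] as i32 * b[8 * j + k] as i32;
            }
            out[2 * i + j] = out[2 * i + j].wrapping_add(sum);
        }
    }
    out
}
```

Each step consumes sixteen 8-bit values from each operand and updates four 32-bit accumulators with eight multiply-adds apiece, which is where the extra throughput over lane-wise instructions comes from.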

I have also added the Armv8.4 dotprod instructions under this flag.
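
As a point of comparison, the signed dot-product operation behind `vdotq_s32` can be sketched in portable scalar Rust as follows. This is again only a reference model under assumed names (`sdot_ref` is hypothetical), not the code added in this PR.

```rust
/// Scalar reference for one signed dot-product step.
/// Each of the four i32 lanes accumulates the dot product of the
/// corresponding group of four i8 values from `a` and `b`.
/// (Hypothetical helper, for illustration only.)
fn sdot_ref(acc: [i32; 4], a: [i8; 16], b: [i8; 16]) -> [i32; 4] {
    let mut out = acc;
    for lane in 0..4 {
        let mut sum = 0i32;
        for k in 0..4 {
            sum += a[4 * lane + k] as i32 * b[4 * lane + k] as i32;
        }
        out[lane] = out[lane].wrapping_add(sum);
    }
    out
}
```

Here each accumulator lane receives four multiply-adds per step, compared with eight per lane in the i8mm case.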

Performance improvement:

- +10% with only dotprod enabled (measured on LLaMa2)
- +40-60% for i8mm based matmul (measured on quantized Whisper, different cores produced different results)

- implement Neon i8mm based matmul
- using i8mm intrinsics requires a nightly compiler, so the change is added under the feature flag 'arm-nightly-feat'
- also added vdotq under the same feature flag

[AArch64] Quantized MatMul performance improvement on Arm CPUs

3ac3acf

