- Packing functions kai_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s1s0_f32_f32_f32_neon and kai_run_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s0s1_f32_f32_f32_neon have been further optimized.
-- Fixes
-- Fix out of bound read of intermediate values in kai_matmul_clamp_f16_qsi8d32p1vlx4_qai4c32p4vlx4_1vlx4vl_sme2_mopa micro-kernel
-
-## v1.14.0
-
+- Packing functions kai_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s1s0_f32_f32_f32_neon and kai_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s0s1_f32_f32_f32_neon have been further optimized.
+- Packing function kai_lhs_quant_pack_qai8dxp_f16_neon has been further optimized.
 - New Advanced SIMD micro-kernels:
 - Wider 6x32 block size variants of FP16 Matrix Multiplication, including a variant optimized for the Arm® Cortex®-A55 processor.
 - Wider 6x16 block size variants of FP32 Matrix Multiplication, including a variant optimized for the Arm® Cortex®-A55 processor.
-- Optimizations:
-- Packing function kai_lhs_quant_pack_qai8dxp_f16_neon has been further optimized.
-- New SME2 micro-kernels:
-- Depthwise Convolution (3x3) Planar kernel of F32 LHS and Packed F32 RHS with F32 output using MLA.
-- New SME micro-kernels:
-- Depthwise Convolution RHS F32 Packing kernel.
-- Convert SME and SME2 matmul micro-kernels to pure assembly, and add MSVC support. Affects: