v1.12.0

@EmilOhlssonARM released this 30 Mar 05:40
· 216 commits to main since this release
  • New Advanced SIMD micro-kernels:
    • Matrix multiplication (MxN) micro-kernels of QAI8DX LHS and QSI4CX RHS with BF16 output, optimized for FEAT_I8MM.
    • Matrix multiplication (1xN) micro-kernels of QAI8DX LHS and QSI4CX RHS with BF16 output, optimized for FEAT_DotProd.
    • Matrix multiplication (MxN) micro-kernels of QAI8DX LHS and QSI4C32 RHS with BF16 output, optimized for FEAT_I8MM.
    • Matrix multiplication (1xN) micro-kernels of QAI8DX LHS and QSI4C32 RHS with BF16 output, optimized for FEAT_DotProd.
  • New SME micro-kernels:
    • Matrix multiplication (1xN) of F32 LHS and RHS with F32 output, using instructions compatible with FEAT_SME.
    • Matrix multiplication (1xN) of F16 LHS and RHS with F16 output, using instructions compatible with FEAT_SME.
  • Convert SME transposed RHS packing micro-kernels to pure assembly:
    • kai_rhs_pack_nxk_f32p2vlx1biasf32_f32_f32_sme
    • kai_rhs_pack_nxk_x16p2vlx2b_x16_x16_sme
  • Include more micro-kernels in MSVC build:
    • kai_matmul_clamp_f32_f32_f32p8x1biasf32_6x8x4_neon_mla
    • kai_lhs_quant_pack_qsi8d32p_f32_neon
    • kai_rhs_pack_kxn_qsi8cxp_qsi8cx_neon
    • kai_rhs_pack_nxk_qsi4c32ps1s0scalef16_qsu4c32s16s0_neon
    • kai_rhs_pack_nxk_qsi4cxps1s0_qsu4cxs1s0_neon
    • kai_rhs_pack_nxk_qsi8cxp_qsi8cx_neon
  • Fixes:
    • Update kai_kernel_matmul_clamp_f32_qai8dxp1vlx4_qsi8cxp4vlx4_1vlx4vl_sme2_mopa to improve accuracy.
    • Convert common SME/SME2 code into assembly file kai_common_sme_asm.S.
  • Documentation:
    • Add ONNX Runtime MLAS library integration example.