Commit bd2e6ae

Update CHANGELOG to contain latest fixes before release
Signed-off-by: John McLoughlin <john.mcloughlin@arm.com>
Approved-by: Dan Johansson <dan.johansson@arm.com>
1 parent 5347fcd commit bd2e6ae

1 file changed

Lines changed: 12 additions & 16 deletions

CHANGELOG.md

@@ -10,32 +10,28 @@ KleidiAI follows the [Semantic Versioning](https://semver.org/) specification fo
 
 ## Upcoming Release
 
+## v1.14.0
+
 - New SME micro-kernels:
   - Indirect matrix multiplication (MxN) of QAI8 input and output.
   - Indirect matrix multiplication (MxN) of F16 input and output.
   - Indirect matrix multiplication (MxN) of F32 input and output.
   - Matrix multiplication (MxN) of QAI8 LHS and RHS with QAI8 output.
+  - Depthwise Convolution RHS F32 Packing kernel.
+- New SME2 micro-kernels:
+  - Depthwise Convolution (3x3) Planar kernel of F32 LHS and Packed F32 RHS with F32 output using MLA.
+- Convert SME2 matmul micro-kernels to pure assembly, and add MSVC support.
+  - Affects: kai_matmul_clamp_f32_bf16p2vlx2_bf16p2vlx2_2vlx2vl_sme2_mopa
 - Optimizations:
-  - Packing functions kai_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s1s0_f32_f32_f32_neon and kai_run_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s0s1_f32_f32_f32_neon have been further optimized.
-- Fixes
-  - Fix out of bound read of intermediate values in kai_matmul_clamp_f16_qsi8d32p1vlx4_qai4c32p4vlx4_1vlx4vl_sme2_mopa micro-kernel
-
-## v1.14.0
-
+  - Packing functions kai_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s1s0_f32_f32_f32_neon and kai_rhs_pack_nxk_qai4c32ps1s0nrx4_qau4c32s0s1_f32_f32_f32_neon have been further optimized.
+  - Packing function kai_lhs_quant_pack_qai8dxp_f16_neon has been further optimized.
 - New Advanced SIMD micro-kernels:
   - Wider 6x32 block size variants of FP16 Matrix Multiplication, including a variant optimized for the Arm® Cortex®-A55 processor.
   - Wider 6x16 block size variants of FP32 Matrix Multiplication, including a variant optimized for the Arm® Cortex®-A55 processor.
-- Optimizations:
-  - Packing function kai_lhs_quant_pack_qai8dxp_f16_neon has been further optimized.
-- New SME2 micro-kernels:
-  - Depthwise Convolution (3x3) Planar kernel of F32 LHS and Packed F32 RHS with F32 output using MLA.
-- New SME micro-kernels:
-  - Depthwise Convolution RHS F32 Packing kernel.
-- Convert SME and SME2 matmul micro-kernels to pure assembly, and add MSVC support. Affects:
-  - kai_matmul_clamp_f32_bf16p2vlx2_bf16p2vlx2_2vlx2vl_sme2_mopa
 - Fixes:
-  - Fix out-of-bounds write in `kai_matmul_clamp_f16_f16_f16p2vlx2b_1x8vl_sme_mla`
-  - Fix out-of-bounds read in `kai_matmul_clamp_qai8_qai8_qsi8cxp2vlx4sb_1x16vl_sme2_dot`
+  - Fix out-of-bound read of intermediate values in kai_matmul_clamp_f16_qsi8d32p1vlx4_qai4c32p4vlx4_1vlx4vl_sme2_mopa micro-kernel
+  - Fix out-of-bounds write in kai_matmul_clamp_f16_f16_f16p2vlx2b_1x8vl_sme_mla
+  - Fix out-of-bounds read in kai_matmul_clamp_qai8_qai8_qsi8cxp2vlx4sb_1x16vl_sme2_dot
 
 ## v1.13.0
 