v1.27

Latest

github-actions released this 06 Feb 04:03

· 35 commits to master since this release

6979325

🆕 New Features

Mixing group size for BM1684X dynamic quantization.
Add topo-sort pass before layer group.
CUDA inference support.
Add Int lowering.
Mixing per token/channel dynamic quantization.
Dynamic support for multi-core.
Dump and load modules hash.
LoRA support for multicore dynamic.
Add CumSum Op interface for TPULang.
TPULang interface of RotPosEmb.
FP8 CUDA inference.
Model deploy adds the correctness argument.
LlmConverter supports MLP for BM1690.
Architectures that support memory tags will use the IO tag.
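
For readers unfamiliar with the per-token/per-channel terminology in the dynamic quantization items above, here is a minimal pure-Python sketch of the general idea: activations get one scale per token (row), weights get one scale per output channel (column), and the integer matmul result is rescaled by the product of the two. This is not the TPU-MLIR implementation; the function names `quantize_rows`, `transpose`, and `dynamic_quant_matmul` are hypothetical, and symmetric int8 with a clamp to ±127 is an assumption made for illustration.

```python
def quantize_rows(matrix, qmax=127):
    """Symmetric quantization with one scale per row (assumed int8 range)."""
    quantized, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        # Avoid division by zero for an all-zero row.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.append(
            [max(-qmax, min(qmax, round(v / scale))) for v in row]
        )
    return quantized, scales


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def dynamic_quant_matmul(activations, weights):
    """Compute activations (tokens x hidden) @ weights (hidden x out).

    Activations are quantized per token at run time; weights are
    quantized per output channel. Both choices are illustrative.
    """
    a_q, a_scales = quantize_rows(activations)          # per-token scales
    w_q_t, w_scales = quantize_rows(transpose(weights))  # per-channel scales
    out = []
    for a_row, a_s in zip(a_q, a_scales):
        out_row = []
        for w_col, w_s in zip(w_q_t, w_scales):
            acc = sum(x * y for x, y in zip(a_row, w_col))  # integer accumulate
            out_row.append(acc * a_s * w_s)                 # rescale to float
        out.append(out_row)
    return out


if __name__ == "__main__":
    acts = [[1.0, -2.0], [0.5, 0.25]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(dynamic_quant_matmul(acts, identity))
```

With an identity weight matrix the output should approximate the input activations, with the difference being the quantization error.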

🐛 Bug Fixes

Backend compile issue.
Wrong parameter for mix-precision (SOPHONSILK-576).
Global constant binary operation slice bug.
LayerGroup multi-branch bug.
MLIR-643 issue.
MLIR-715 issue.
Update backend.
time_fixed_subnet for CV184X.
Qwen3VL ViT dynamic use multi-core.
DeconvOp dynamic parse parameter bug.
Workaround layer group error in LLM.
Make ResNet FP8 compile pass.
BModel checker command type compatibility problem.
DDR interleave profile error.
TPU profile parser error in 1684X.
Temporarily disable gather multi-core.
Gather index mapping requires check.
ONNX optimize CastOp eliminate bug.
Conv3D backend problem.
Revert "Architecture supporting memory tag will use IO tag".
ONNX optimization Cast bug.
Remove Cast bug.

🔨 Chores

Update libbmrt.
Adjust code for clarity.
Optimize PPL code writing.

📄 Documentation

Add visual tool guidance.
