🆕 New Features
- Mixed group sizes for BM1684X dynamic quantization.
- Add topo-sort pass before layer group.
- CUDA inference support.
- Add Int lowering.
- Mixed per-token/per-channel dynamic quantization.
- Dynamic support for multi-core.
- Dump and load module hashes.
- LoRA support for multi-core dynamic.
- Add CumSum Op interface for TPULang.
- TPULang interface of RotPosEmb.
- FP8 CUDA inference.
- Model deploy adds the correctness argument.
- LlmConverter supports MLP for BM1690.
- Architectures that support memory tags now use the IO tag.
🐛 Bug Fixes
- Backend compile issue.
- Wrong parameter for mix-precision (SOPHONSILK-576).
- Global constant binary operation slice bug.
- LayerGroup multi-branch bug.
- MLIR-643 issue.
- MLIR-715 issue.
- Update backend.
- time_fixed_subnet for CV184X.
- Qwen3VL ViT dynamic uses multi-core.
- DeconvOp dynamic parameter parsing bug.
- Workaround layer group error in LLM.
- Make ResNet FP8 compile pass.
- BModel checker command type compatibility problem.
- DDR interleave profile error.
- TPU profile parser error in 1684X.
- Temporarily disable gather multi-core.
- Gather index mapping requires check.
- ONNX optimizer CastOp elimination bug.
- Conv3D backend problem.
- Revert the change that made architectures supporting memory tags use the IO tag.
- ONNX optimization Cast bug.
- Remove Cast bug.
🔨 Chores
- Update libbmrt.
- Adjust code for clarity.
- Optimize PPL code writing.
📄 Documentation
- Add visual tool guidance.