v1.27

@github-actions released this 06 Feb 04:03 · 35 commits to master since this release

🆕 New Features

  • Mixing group size for BM1684X dynamic quantization.
  • Add topo-sort pass before layer group.
  • CUDA inference support.
  • Add Int lowering.
  • Mixing per token/channel dynamic quantization.
  • Dynamic support for multi-core.
  • Dump and load modules hash.
  • LoRA support for multicore dynamic.
  • Add CumSum Op interface for TPULang.
  • TPULang interface of RotPosEmb.
  • FP8 CUDA inference.
  • Model deploy adds the correctness argument.
  • LlmConverter supports MLP for BM1690.
  • Architectures that support memory tags use the IO tag.

🐛 Bug Fixes

  • Backend compile issue.
  • Wrong parameter for mix-precision (SOPHONSILK-576).
  • Global constant binary operation slice bug.
  • LayerGroup multi-branch bug.
  • MLIR-643 issue.
  • MLIR-715 issue.
  • Update backend.
  • time_fixed_subnet for CV184X.
  • Qwen3VL ViT dynamic use multi-core.
  • DeconvOp dynamic parse parameter bug.
  • Workaround layer group error in LLM.
  • Make ResNet FP8 compile pass.
  • BModel checker command type compatibility problem.
  • DDR interleave profile error.
  • TPU profile parser error in 1684X.
  • Temporarily disable gather multi-core.
  • Gather index mapping requires check.
  • ONNX optimization CastOp elimination bug.
  • Conv3D backend problem.
  • Revert the change that made architectures supporting memory tags use the IO tag.
  • ONNX optimization Cast bug.
  • Remove Cast bug.

🔨 Chores

  • Update libbmrt.
  • Adjust code for clarity.
  • Optimize PPL code writing.

📄 Documentation

  • Add visual tool guidance.