v0.8.0 - 2025-12-20
This release introduces a breaking change: inference workloads now always run as a StatefulSet. The controller removes the Deployment resources created by existing workspaces and creates new StatefulSet resources in their place. The migration requires no manual steps, but expect a short period of inference server downtime while the Pods are recreated.
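To make the migration order concrete, here is a minimal sketch of the behavior described above, modeled as a pure planning function. It is not the controller's actual code: the `Workload` type, the `plan_migration` function, and the assumption that the new StatefulSet is named after the workspace are all hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical stand-in for a Kubernetes workload owned by a workspace."""
    kind: str  # "Deployment" or "StatefulSet"
    name: str


def plan_migration(existing: list[Workload], workspace: str) -> list[tuple[str, Workload]]:
    """Return the ordered actions that leave only a StatefulSet behind."""
    actions: list[tuple[str, Workload]] = []

    # Remove any Deployment left over from before the upgrade.
    for workload in existing:
        if workload.kind == "Deployment":
            actions.append(("delete", workload))

    # Create the StatefulSet unless one already exists.
    # Naming it after the workspace is an assumption made for this sketch.
    if not any(w.kind == "StatefulSet" for w in existing):
        actions.append(("create", Workload("StatefulSet", workspace)))

    return actions


if __name__ == "__main__":
    before = [Workload("Deployment", "my-workspace")]
    for action, workload in plan_migration(before, "my-workspace"):
        print(action, workload.kind, workload.name)
```

In this model, the gap between the delete and the create is where the short downtime comes from. A workspace that already runs as a StatefulSet yields an empty plan.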
Changelog
Breaking Changes 💥
Features 🌈
- b966484 feat: update gpu-provisioner version to v0.3.8 for kaito (#1698)
- 91819b9 feat: preset-generator support generic model format and attn arch (#1690)
Bug Fixes 🐞
- 1366f9a fix: set imagePullPolicy to Always (#1702)
- 8945b5b fix: workload type in ragengine e2e test (#1697)
- dffd5f3 fix: invalid indentation in artifacthub links (#1683)
- e5d77e5 fix: cancel latest release when it's prerelease (#1680)
- e813c46 fix: release tag validation rule (#1677)
Code Refactoring 💎
Documentation 📘
- 318bf01 docs: fix namespace doc issue in keda-kaito-scaler (#1699)
- 87c9c32 docs: use kaito-workspace in keda install (#1694)
- eefd2b8 docs: add keda-autoscaler-inference scaling example in doc (#1682)
- bbe61d7 docs: refine naming in docs and examples (#1681)
- c78d68b docs: add keda-autoscaler-inference doc (#1679)
Maintenance 🔧
- 67deec5 chore: bump golang to 1.24.11 (#1695)
- 89aba34 chore: use pv cleaner from localcsi manager (#1687)
- 7911b00 chore: fix huggingface_hub version in preset_generator (#1693)
- 0fabc5c chore: bump ray to 0.25.1 (#1684)
- 3d33b89 chore: bump js-yaml from 3.14.1 to 3.14.2 in /website (#1647)
- 601ad7b chore: bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /website (#1657)
- e1efaa8 chore: e2e tests for pv support in RAG engine service (#1671)