Commit e20f0b1

[ReleaseNote] Add release note for v0.17.0rc1 (#7240)

### What this PR does / why we need it?

This pull request adds the release notes for `v0.17.0rc1`. It also updates version numbers across various documentation files, including `README.md`, `README.zh.md`, `docs/source/community/versioning_policy.md`, and `docs/source/conf.py`, to reflect the new release.

- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@4034c3d

1 parent 7e85f2f, commit e20f0b1

8 files changed, with 69 additions and 14 deletions.

File tree

.github/workflows/schedule_image_build_and_push.yaml

Lines changed: 1 addition & 0 deletions

```diff
@@ -30,6 +30,7 @@ on:
 type: choice
 options:
 - main
+- v0.17.0rc1
 - v0.16.0rc1
 - v0.15.0rc1
 - v0.14.0rc1
```

.github/workflows/schedule_update_estimated_time.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -28,7 +28,7 @@ jobs:
 name: e2e-test
 strategy:
 matrix:
-vllm_version: [v0.16.0]
+vllm_version: [v0.17.0]
 type: [full, light]
 uses: ./.github/workflows/_e2e_test.yaml
 with:
```

README.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -63,7 +63,7 @@ Please use the following recommended versions to get started quickly:
 
 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-| v0.16.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
+| v0.17.0rc1 | Latest release candidate | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html) for more details |
 | v0.13.0 | Latest stable version | See [QuickStart](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [Installation](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html) for more details |
 
 ## Contributing
@@ -86,7 +86,7 @@ Below are the maintained branches:
 
 | Branch | Status | Note |
 |------------|--------------|--------------------------------------|
-| main | Maintained | CI commitment for vLLM main branch and vLLM v0.16.0 tag |
+| main | Maintained | CI commitment for vLLM main branch and vLLM v0.17.0 tag |
 | v0.7.1-dev | Unmaintained | Only doc fixes are allowed |
 | v0.7.3-dev | Maintained | CI commitment for vLLM 0.7.3 version, only bug fixes are allowed, and no new release tags anymore. |
 | v0.9.1-dev | Maintained | CI commitment for vLLM 0.9.1 version |
```

README.zh.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -57,7 +57,7 @@ vLLM 昇腾插件 (`vllm-ascend`) 是一个由社区维护的让vLLM在Ascend NP
 
 | Version | Release type | Doc |
 |------------|--------------|--------------------------------------|
-|v0.16.0rc1| 最新RC版本 |请查看[快速开始](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html)[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)了解更多|
+|v0.17.0rc1| 最新RC版本 |请查看[快速开始](https://docs.vllm.ai/projects/ascend/en/latest/quick_start.html)[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)了解更多|
 |v0.13.0| 最新正式/稳定版本 |[快速开始](https://docs.vllm.ai/projects/ascend/en/v0.13.0/quick_start.html) and [安装指南](https://docs.vllm.ai/projects/ascend/en/v0.13.0/installation.html)了解更多|
 
 ## 贡献
@@ -80,7 +80,7 @@ vllm-ascend有主干分支和开发分支。
 
 | 分支 | 状态 | 备注 |
 |------------|------------|---------------------|
-| main | Maintained | 基于vLLM main分支和vLLM最新版本(v0.16.0)CI看护 |
+| main | Maintained | 基于vLLM main分支和vLLM最新版本(v0.17.0)CI看护 |
 | v0.7.1-dev | Unmaintained | 只允许文档修复 |
 | v0.7.3-dev | Maintained | 基于vLLM v0.7.3版本CI看护, 只允许Bug修复,不会再发布新版本 |
 | v0.9.1-dev | Maintained | 基于vLLM v0.9.1版本CI看护 |
```

docs/source/community/versioning_policy.md

Lines changed: 4 additions & 2 deletions

```diff
@@ -23,7 +23,8 @@ The table below is the release compatibility matrix for vLLM Ascend release.
 
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | Triton Ascend |
 |-------------|-------------------|-----------------|-------------|---------------------------------|---------------|
-| v0.16.0rc1 | v0.16.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
+| v0.17.0rc1 | v0.17.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
+| v0.16.0rc1 | v0.16.0 | >= 3.10, < 3.12 | 8.5.1 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.15.0rc1 | v0.15.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.14.0rc1 | v0.14.1 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 | 3.2.0 |
 | v0.13.0 | v0.13.0 | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.8.0.post2 | 3.2.0 |
@@ -58,14 +59,15 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL
 
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |-------------|--------------|------------------|-------------|--------------------|
-| main | 4034c3d32e30d01639459edd3ab486f56993876d, v0.16.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |
+| main | 4034c3d32e30d01639459edd3ab486f56993876d, v0.17.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |
 
 ## Release cadence
 
 ### Release window
 
 | Date | Event |
 |------------|-------------------------------------------|
+| 2026.03.15 | Release candidates, v0.17.0rc1 |
 | 2026.03.10 | Release candidates, v0.16.0rc1 |
 | 2026.02.27 | Release candidates, v0.15.0rc1 |
 | 2026.02.06 | v0.13.0 Final release, v0.13.0 |
```
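
The compatibility matrix pins each vLLM Ascend release to a vLLM tag, a CANN version, PyTorch/torch_npu versions, and a Python range of `>= 3.10, < 3.12`. The sketch below is not part of vLLM Ascend. It copies a few rows from the updated table into a hypothetical `COMPAT_MATRIX` dict and uses a hypothetical `python_supported` helper to check whether a `(major, minor)` interpreter version falls inside that half-open range.

```python
import sys
from typing import NamedTuple


class CompatRow(NamedTuple):
    """One row of the release compatibility matrix (values copied from the table)."""

    vllm: str
    python_min: tuple[int, int]  # inclusive lower bound (">= 3.10")
    python_max: tuple[int, int]  # exclusive upper bound ("< 3.12")
    cann: str
    torch: str
    torch_npu: str


# Hypothetical in-memory copy of some rows of the updated matrix.
COMPAT_MATRIX: dict[str, CompatRow] = {
    "v0.17.0rc1": CompatRow("v0.17.0", (3, 10), (3, 12), "8.5.1", "2.9.0", "2.9.0"),
    "v0.16.0rc1": CompatRow("v0.16.0", (3, 10), (3, 12), "8.5.1", "2.9.0", "2.9.0"),
    "v0.15.0rc1": CompatRow("v0.15.0", (3, 10), (3, 12), "8.5.0", "2.9.0", "2.9.0"),
    "v0.14.0rc1": CompatRow("v0.14.1", (3, 10), (3, 12), "8.5.0", "2.9.0", "2.9.0"),
    "v0.13.0": CompatRow("v0.13.0", (3, 10), (3, 12), "8.5.0", "2.9.0", "2.8.0.post2"),
}


def python_supported(release: str, version: tuple[int, int]) -> bool:
    """Return True if a (major, minor) Python version satisfies '>= min, < max'."""
    row = COMPAT_MATRIX[release]
    # Tuples compare element by element, so (3, 11) < (3, 12) as expected.
    return row.python_min <= version < row.python_max


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    print(f"Python {current[0]}.{current[1]} supported by v0.17.0rc1:",
          python_supported("v0.17.0rc1", current))
```

For example, `python_supported("v0.17.0rc1", (3, 12))` returns `False` because the upper bound `< 3.12` is exclusive. The v0.14.0rc1 row also shows that a vLLM Ascend release candidate does not always track the vLLM tag with the same number.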

docs/source/conf.py

Lines changed: 4 additions & 4 deletions

```diff
@@ -65,15 +65,15 @@
 # the branch of vllm, used in vllm clone
 # - main branch: 'main'
 # - vX.Y.Z branch: 'vX.Y.Z'
-"vllm_version": "v0.16.0",
+"vllm_version": "v0.17.0",
 # the branch of vllm-ascend, used in vllm-ascend clone and image tag
 # - main branch: 'main'
 # - vX.Y.Z branch: latest vllm-ascend release tag
-"vllm_ascend_version": "v0.16.0rc1",
+"vllm_ascend_version": "v0.17.0rc1",
 # the newest release version of vllm-ascend and matched vLLM, used in pip install.
 # This value should be updated when cut down release.
-"pip_vllm_ascend_version": "0.16.0rc1",
-"pip_vllm_version": "0.16.0",
+"pip_vllm_ascend_version": "0.17.0rc1",
+"pip_vllm_version": "0.17.0",
 # CANN image tag
 "cann_image_tag": "8.5.1-910b-ubuntu22.04-py3.11",
 # vllm version in ci
```
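
In `conf.py`, the tag-style fields (`vllm_version`, `vllm_ascend_version`) and the pip-style fields (`pip_vllm_version`, `pip_vllm_ascend_version`) have to be bumped together on every release, as this commit does. The following is a minimal sketch of a consistency check that could catch a half-finished bump. The `check_release_versions` helper is hypothetical, and it assumes each pip value equals its tag with the leading `v` removed. That holds for every value in this diff but is not a documented rule.

```python
def check_release_versions(values: dict[str, str]) -> list[str]:
    """Report mismatches between tag-style and pip-style version fields.

    Hypothetical helper: assumes each pip version equals the matching tag with
    its leading "v" removed. A tag of "main" has no pip counterpart and is skipped.
    """
    pairs = [
        ("vllm_version", "pip_vllm_version"),
        ("vllm_ascend_version", "pip_vllm_ascend_version"),
    ]
    problems: list[str] = []
    for tag_key, pip_key in pairs:
        tag = values[tag_key]
        if tag == "main":
            continue
        expected = tag.removeprefix("v")  # str.removeprefix needs Python 3.9+
        if values[pip_key] != expected:
            problems.append(
                f"{pip_key}={values[pip_key]!r}, expected {expected!r} "
                f"to match {tag_key}={tag!r}"
            )
    return problems


if __name__ == "__main__":
    # Values taken from the "+" side of the conf.py diff.
    release_values = {
        "vllm_version": "v0.17.0",
        "vllm_ascend_version": "v0.17.0rc1",
        "pip_vllm_ascend_version": "0.17.0rc1",
        "pip_vllm_version": "0.17.0",
    }
    print(check_release_versions(release_values))  # empty list: fields agree
```

Suppose `pip_vllm_version` were left at `"0.16.0"` while `vllm_version` moved to `"v0.17.0"`. The helper would return exactly one mismatch message naming `pip_vllm_version`.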

docs/source/faqs.md

Lines changed: 1 addition & 2 deletions

```diff
@@ -2,8 +2,7 @@
 
 ## Version Specific FAQs
 
-- [[v0.16.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6969)
-- [[v0.15.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6838)
+- [[v0.17.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/7173)
 - [[v0.13.0] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/6583)
 
 ## General FAQs
```

docs/source/user_guide/release_notes.md

Lines changed: 54 additions & 1 deletion

```diff
@@ -1,5 +1,58 @@
 # Release Notes
 
+## v0.17.0rc1 - 2026.03.15
+
+This is the first release candidate of v0.17.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
+
+### Highlights
+
+- Ascend950 chip is now supported. [#7151](https://github.com/vllm-project/vllm-ascend/pull/7151)
+- ACLGraph (graph mode) is now supported for Model Runner V2. [#7110](https://github.com/vllm-project/vllm-ascend/pull/7110)
+- Unified parallelized speculative decoding is supported, enabling parallel draft inference schemes simultaneously. [#6766](https://github.com/vllm-project/vllm-ascend/pull/6766)
+
+### Features
+
+- Auto-detect quantization format from model files, and remote model IDs (e.g., `org/model-name`) are also supported. `--quantization ascend` is not required now. [#7111](https://github.com/vllm-project/vllm-ascend/pull/7111)
+- Qwen3.5 is supported from this version on.
+- FlashLB algorithm for EPLB: supports per-step heat collection and multi-stage load balancing for better expert parallelism efficiency. [#6477](https://github.com/vllm-project/vllm-ascend/pull/6477)
+- LoRA with tensor parallel and `--fully-sharded-loras` is now fixed and working. [#6650](https://github.com/vllm-project/vllm-ascend/pull/6650)
+- LMCacheAscendConnector is added as a new KV cache pooling solution for Ascend. [#6882](https://github.com/vllm-project/vllm-ascend/pull/6882)
+- W8A8C8 quantization is now supported for DeepSeek-V3.2 and GLM5 in PD-mix scenario. [#7029](https://github.com/vllm-project/vllm-ascend/pull/7029)
+- [Experimental] Minimax-m2.5 model is now supported on Ascend NPU. [#7105](https://github.com/vllm-project/vllm-ascend/pull/7105)
+- [Experimental] Mooncake Layerwise Connector now supports hybrid attention manager with multiple KV cache groups. [#7022](https://github.com/vllm-project/vllm-ascend/pull/7022)
+- [Experimental] Prefix cache is now supported in hybrid model. [#7103](https://github.com/vllm-project/vllm-ascend/pull/7103)
+
+### Performance
+
+- Pipeline Parallel now supports async scheduling, improving throughput for PP deployments. [#7136](https://github.com/vllm-project/vllm-ascend/pull/7136)
+- Improved TTFT when using Mooncake connector by reducing log overhead. [#6125](https://github.com/vllm-project/vllm-ascend/pull/6125)
+- KV Pool lookup is optimized for short sequences (token length < block_size). [#7146](https://github.com/vllm-project/vllm-ascend/pull/7146)
+- Fix penalty ops in Model Runner V2, achieving ~10% performance improvement. [#7013](https://github.com/vllm-project/vllm-ascend/pull/7013)
+
+### Documentation
+
+- Added EPD (Encode-Prefill-Decode) documentation and load-balance proxy example. [#6221](https://github.com/vllm-project/vllm-ascend/pull/6221)
+- Added Ascend PyTorch Profiler usage guide. [#7117](https://github.com/vllm-project/vllm-ascend/pull/7117)
+- Fixed DSV3.1 PD configuration documentation. [#7187](https://github.com/vllm-project/vllm-ascend/pull/7187)
+
+### Others
+
+- Fix drafter crash in full graph mode for speculative decoding. [#7158](https://github.com/vllm-project/vllm-ascend/pull/7158) [#7148](https://github.com/vllm-project/vllm-ascend/pull/7148)
+- Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights. [#7139](https://github.com/vllm-project/vllm-ascend/pull/7139)
+- Fix ngram graph replay accuracy error on 310P. [#7134](https://github.com/vllm-project/vllm-ascend/pull/7134)
+- Fix FIA pad logic in graph mode after upstream vLLM change. [#7144](https://github.com/vllm-project/vllm-ascend/pull/7144)
+- Fix a precision issue caused by wrong KV cache reshape on Qwen3.5. [#7209](https://github.com/vllm-project/vllm-ascend/pull/7209)
+- Fix extra processes spawned on rank0 device. [#7107](https://github.com/vllm-project/vllm-ascend/pull/7107)
+- Graph capture failures now properly raise exceptions for easier debugging. [#5644](https://github.com/vllm-project/vllm-ascend/pull/5644)
+- Fix Qwen3.5 model by replacing torch_npu.npu_recurrent_gated_delta_rule by fused_recurrent_gated_delta_rule. [#7109](https://github.com/vllm-project/vllm-ascend/pull/7109)
+- Fix the bug when running Qwen3-Reranker-0.6B with LoRA. [#7156](https://github.com/vllm-project/vllm-ascend/pull/7156)
+
+### Known Issue
+
+- GLM5 requires transformers==5.2.0, and this will resolved by [vllm-project/vllm#30566](https://github.com/vllm-project/vllm/pull/30566), will not included in v0.17.0.
+- There is a precision issue with Qwen3-Next due to the changed tp weight split method. Will fix it in next release.
+- The minimum number of tokens of prefix cache hit in hybrid model is 2k now
+
 ## v0.16.0rc1 - 2026.03.09
 
 This is the first release candidate of v0.16.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
@@ -42,7 +95,7 @@ This is the first release candidate of v0.16.0 for vLLM Ascend. Please follow th
 ### Deprecation & Breaking Changes
 
 - `enable_flash_comm_v1` config option has been renamed back to `enable_sp`. [#6883](https://github.com/vllm-project/vllm-ascend/pull/6883)
-- The auto-detect quantization format from model files is reverted, in v0.16.0rc1, we still need to add `---quantization ascend` to serve a model quantinized by modelslim. It will be added back in the next version after the bug with the remote model id is fixed. [#6873](https://github.com/vllm-project/vllm-ascend/pull/6873)
+- The auto-detect quantization format from model files is reverted, in v0.16.0rc1, we still need to add `--quantization ascend` to serve a model quantinized by modelslim. It will be added back in the next version after the bug with the remote model id is fixed. [#6873](https://github.com/vllm-project/vllm-ascend/pull/6873)
 
 ### Documentation
 
```