[Feature] Add DeepSeek V3.2 W8A8 INT8 model support by Lidang-Jiang · Pull Request #339 · baidu/vLLM-Kunlun

Lidang-Jiang · 2026-04-24T12:53:40Z

Summary

Support serving DeepSeek-V3.2-W8A8-INT8-Dynamic on the vLLM 0.19 Kunlun path.
Add compatibility/fallback paths for dense FlashMLA, MLA prefill/decode, and compressed-tensors W8A8 dynamic INT8 matmul/quantization.
Add regression coverage for the adapted MLA and quantization paths.

This PR is based on the v0.19 upgrade branch from #315; the new commit adds the DeepSeek V3.2 W8A8 INT8 Dynamic serving adaptation.

Full sanitized logs: https://gist.github.com/Lidang-Jiang/14df1d68a4f9cd17f18a63950cd007ad

Before

Baseline commit: 864d569

Service startup failed before the adaptation:

(APIServer pid=165706)   Value error, This model does not support `--runner generate`. [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]

Client verification:

mode=before
code=864d569
service_log=/workspace/deepseek_v32_before_864d569.log
curl /v1/models:
curl: (7) Failed to connect to 127.0.0.1 port 8566: Connection refused

curl /v1/chat/completions:
curl: (7) Failed to connect to 127.0.0.1 port 8566: Connection refused

After

Service startup:

(Worker_TP0 pid=166770) INFO 04-24 20:47:36 [default_loader.py:384] Loading weights took 106.27 seconds
(Worker_TP0 pid=166770) INFO 04-24 20:47:37 [gpu_model_runner.py:4820] Model loading took 81.07 GiB memory and 107.374742 seconds
(EngineCore pid=166479) INFO 04-24 20:48:46 [kv_cache_utils.py:1319] GPU KV cache size: 44,352 tokens
(EngineCore pid=166479) INFO 04-24 20:48:46 [kv_cache_utils.py:1324] Maximum concurrency for 32,768 tokens per request: 1.35x
(APIServer pid=166227) INFO:     Application startup complete.

Client verification:

mode=after
commit=3dffb27
service_log=/workspace/deepseek_v32_w8a8_p8566.log
curl /health:
200
curl /v1/models:
{"object":"list","data":[{"id":"DeepSeek-V3.2-W8A8-INT8-Dynamic","object":"model","owned_by":"vllm","root":"/models/DeepSeek-V3.2-W8A8-INT8-Dynamic","max_model_len":32768}]}

curl /v1/chat/completions:
HTTP 200, assistant content: 我是DeepSeek，由深度求索公司创造的AI助手！

Unit tests:

============================= test session starts ==============================
platform linux -- Python 3.10.15, pytest-8.3.3, pluggy-1.5.0
rootdir: /workspace/vLLM-Kunlun
configfile: pyproject.toml
plugins: typeguard-4.4.1, repeat-0.9.3, anyio-4.6.2.post1, timeout-2.3.1, metadata-3.1.1, html-3.2.0, cov-7.0.0
collected 59 items

tests/ut/test.py ....................................................... [ 93%]
....                                                                     [100%]

============================== 59 passed in 6.91s ==============================

Test plan

/workspace/python310_torch25_cuda/bin/python -m pytest tests/ut/test.py
Start service with DeepSeek-V3.2-W8A8-INT8-Dynamic
curl http://127.0.0.1:8566/health
curl http://127.0.0.1:8566/v1/models
curl http://127.0.0.1:8566/v1/chat/completions

- align package metadata, docs, and CI with vllm 0.19.0 - add 0.19.x compatibility shims and request-path fixes - add unit coverage for the new compatibility paths Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>

Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>

Enable the DeepSeek V3.2 W8A8 INT8 Dynamic model to start and serve chat on the vLLM 0.19 path. Add compressed-tensors W8A8 fallbacks, dense FlashMLA routing, MLA prefill/decode fallbacks, and regression tests for the adapted paths. Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>

Lidang-Jiang added 2 commits April 10, 2026 19:40

[Feature] Upgrade vLLM-Kunlun from 0.15.1 to 0.19.0

18c55d1

- align package metadata, docs, and CI with vllm 0.19.0 - add 0.19.x compatibility shims and request-path fixes - add unit coverage for the new compatibility paths Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>

fix: address v1 worker review feedback

864d569

Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>

Lidang-Jiang force-pushed the feat/deepseek-v32-w8a8-int8-dynamic branch from d781707 to 387394d Compare April 24, 2026 12:59

Lidang-Jiang changed the title ~~[Bugfix] Support DeepSeek V3.2 W8A8 INT8 service~~ [Feature] Add DeepSeek V3.2 W8A8 INT8 model support Apr 24, 2026

Lidang-Jiang force-pushed the feat/deepseek-v32-w8a8-int8-dynamic branch from 387394d to 3dffb27 Compare April 24, 2026 13:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Feature] Add DeepSeek V3.2 W8A8 INT8 model support#339

[Feature] Add DeepSeek V3.2 W8A8 INT8 model support#339
Lidang-Jiang wants to merge 3 commits into
baidu:v0.19.0-devfrom
Lidang-Jiang:feat/deepseek-v32-w8a8-int8-dynamic

Lidang-Jiang commented Apr 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Lidang-Jiang commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Lidang-Jiang commented Apr 24, 2026 •

edited

Loading