Skip to content

[Feature] Add DeepSeek V3.2 W8A8 INT8 model support#339

Open
Lidang-Jiang wants to merge 3 commits into
baidu:v0.19.0-devfrom
Lidang-Jiang:feat/deepseek-v32-w8a8-int8-dynamic
Open

[Feature] Add DeepSeek V3.2 W8A8 INT8 model support#339
Lidang-Jiang wants to merge 3 commits into
baidu:v0.19.0-devfrom
Lidang-Jiang:feat/deepseek-v32-w8a8-int8-dynamic

Conversation

@Lidang-Jiang

@Lidang-Jiang Lidang-Jiang commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Support serving DeepSeek-V3.2-W8A8-INT8-Dynamic on the vLLM 0.19 Kunlun path.
  • Add compatibility/fallback paths for dense FlashMLA, MLA prefill/decode, and compressed-tensors W8A8 dynamic INT8 matmul/quantization.
  • Add regression coverage for the adapted MLA and quantization paths.

This PR is based on the v0.19 upgrade branch from #315; the new commit adds the DeepSeek V3.2 W8A8 INT8 Dynamic serving adaptation.

Full sanitized logs: https://gist.github.com/Lidang-Jiang/14df1d68a4f9cd17f18a63950cd007ad

Before

Baseline commit: 864d569

Service startup failed before the adaptation:

(APIServer pid=165706)   Value error, This model does not support `--runner generate`. [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]

Client verification:

mode=before
code=864d569
service_log=/workspace/deepseek_v32_before_864d569.log
curl /v1/models:
curl: (7) Failed to connect to 127.0.0.1 port 8566: Connection refused

curl /v1/chat/completions:
curl: (7) Failed to connect to 127.0.0.1 port 8566: Connection refused
After

Service startup:

(Worker_TP0 pid=166770) INFO 04-24 20:47:36 [default_loader.py:384] Loading weights took 106.27 seconds
(Worker_TP0 pid=166770) INFO 04-24 20:47:37 [gpu_model_runner.py:4820] Model loading took 81.07 GiB memory and 107.374742 seconds
(EngineCore pid=166479) INFO 04-24 20:48:46 [kv_cache_utils.py:1319] GPU KV cache size: 44,352 tokens
(EngineCore pid=166479) INFO 04-24 20:48:46 [kv_cache_utils.py:1324] Maximum concurrency for 32,768 tokens per request: 1.35x
(APIServer pid=166227) INFO:     Application startup complete.

Client verification:

mode=after
commit=3dffb27
service_log=/workspace/deepseek_v32_w8a8_p8566.log
curl /health:
200
curl /v1/models:
{"object":"list","data":[{"id":"DeepSeek-V3.2-W8A8-INT8-Dynamic","object":"model","owned_by":"vllm","root":"/models/DeepSeek-V3.2-W8A8-INT8-Dynamic","max_model_len":32768}]}

curl /v1/chat/completions:
HTTP 200, assistant content: 我是DeepSeek,由深度求索公司创造的AI助手!

Unit tests:

============================= test session starts ==============================
platform linux -- Python 3.10.15, pytest-8.3.3, pluggy-1.5.0
rootdir: /workspace/vLLM-Kunlun
configfile: pyproject.toml
plugins: typeguard-4.4.1, repeat-0.9.3, anyio-4.6.2.post1, timeout-2.3.1, metadata-3.1.1, html-3.2.0, cov-7.0.0
collected 59 items

tests/ut/test.py ....................................................... [ 93%]
....                                                                     [100%]

============================== 59 passed in 6.91s ==============================

Test plan

  • /workspace/python310_torch25_cuda/bin/python -m pytest tests/ut/test.py
  • Start service with DeepSeek-V3.2-W8A8-INT8-Dynamic
  • curl http://127.0.0.1:8566/health
  • curl http://127.0.0.1:8566/v1/models
  • curl http://127.0.0.1:8566/v1/chat/completions

- align package metadata, docs, and CI with vllm 0.19.0
- add 0.19.x compatibility shims and request-path fixes
- add unit coverage for the new compatibility paths

Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>
Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>
@Lidang-Jiang Lidang-Jiang force-pushed the feat/deepseek-v32-w8a8-int8-dynamic branch from d781707 to 387394d Compare April 24, 2026 12:59
Enable the DeepSeek V3.2 W8A8 INT8 Dynamic model to start and serve chat on the vLLM 0.19 path.

Add compressed-tensors W8A8 fallbacks, dense FlashMLA routing, MLA prefill/decode fallbacks, and regression tests for the adapted paths.

Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>
@Lidang-Jiang Lidang-Jiang changed the title [Bugfix] Support DeepSeek V3.2 W8A8 INT8 service [Feature] Add DeepSeek V3.2 W8A8 INT8 model support Apr 24, 2026
@Lidang-Jiang Lidang-Jiang force-pushed the feat/deepseek-v32-w8a8-int8-dynamic branch from 387394d to 3dffb27 Compare April 24, 2026 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant