Add vLLM DeepSeek-R1 128K/8K recipes for GB300 NVL72 #165

Draft: qiching wants to merge 1 commit into NVIDIA:main from qiching:gb300-dsr1-128k8k-recipes

@qiching (Collaborator) commented May 19, 2026

Draft recipes for DeepSeek-R1 long-context disaggregated serving on GB300 NVL72 with vLLM. Covers:

  • Canonical DEP8 disagg config (4 prefill / 2 decode nodes)
  • Wide-EP scaling: DEP16 / DEP32 with flashinfer_nvlink_one_sided A2A
  • MTP variant (speculative decoding)
  • Rate-matching ablations (prefill-only / decode-only / full-disagg)
  • Prefill experiments (DP4+EP4 vs PP4, chunked prefill 4K/12K/32K)
  • LongBench v2 accuracy recipe

Related upstream work:

  • vLLM PR #39841 (FP8 cast-order fix for chunked prefill)
  • FlashInfer PR #3129 (FP8 E4M3/E5M2 in concat_mla_k)
  • vLLM issues #41687 (DeepEP), #41685 (long-context PP), #41682 (PP scheduler), #40674 (NixlConnector + PP)
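
As a rough aid for reading the DEP8 / DEP16 / DEP32 sizes above, the sketch below maps a parallel size to a node count. It assumes that "DEPn" means n GPU ranks and that each GB300 NVL72 node (compute tray) exposes 4 GPUs out of 72 per rack. The names `GPUS_PER_NODE`, `RACK_GPUS`, `nodes_for` and `fits_in_rack` are hypothetical helpers for illustration, not part of the recipes or of vLLM.

```python
import math

# Assumption: one GB300 NVL72 node (compute tray) exposes 4 GPUs,
# and one rack holds 72 GPUs in total.
GPUS_PER_NODE = 4
RACK_GPUS = 72


def nodes_for(dep_size: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Return how many nodes one instance with `dep_size` GPU ranks spans."""
    if dep_size <= 0 or gpus_per_node <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(dep_size / gpus_per_node)


def fits_in_rack(prefill_nodes: int, decode_nodes: int,
                 gpus_per_node: int = GPUS_PER_NODE,
                 rack_gpus: int = RACK_GPUS) -> bool:
    """Check that a prefill/decode node split fits inside one rack."""
    return (prefill_nodes + decode_nodes) * gpus_per_node <= rack_gpus


if __name__ == "__main__":
    for dep in (8, 16, 32):
        print(f"DEP{dep}: {nodes_for(dep)} node(s) per instance")
    # The 4 prefill / 2 decode node split from the list above.
    print("4P/2D fits in one rack:", fits_in_rack(4, 2))
```

Under these assumptions, the 4 prefill / 2 decode node split uses 24 of the 72 GPUs in a rack. How those nodes are grouped into DEP8 instances is defined by the recipes themselves, not by this sketch.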

Draft recipes for DeepSeek-R1 / V3 (671B MoE) long-context disaggregated
serving on GB300 NVL72 with vLLM. Covers:

  - Canonical DEP8 disagg config (4 prefill / 2 decode nodes)
  - Wide-EP scaling: DEP16 / DEP32 with flashinfer_nvlink_one_sided A2A
  - MTP variant (speculative decoding)
  - Rate-matching ablations (prefill-only / decode-only / full-disagg)
  - Prefill experiments (DP4+EP4 vs PP4, chunked prefill 4K/12K/32K)
  - LongBench v2 accuracy recipe

Related upstream work:
  - vLLM PR #39841 (FP8 cast-order fix for chunked prefill)
  - FlashInfer PR #3129 (FP8 E4M3/E5M2 in concat_mla_k)
  - vLLM issues #41687 (DeepEP), #41685 (long-context PP),
    #41682 (PP scheduler), #40674 (NixlConnector + PP)
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@84a0c10). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #165   +/-   ##
=======================================
  Coverage        ?   65.08%           
=======================================
  Files           ?       67           
  Lines           ?     8217           
  Branches        ?        0           
=======================================
  Hits            ?     5348           
  Misses          ?     2869           
  Partials        ?        0           

