recipes(qwen3.5): refresh fp8 mtp-off wideep configs by zhengd-nv · Pull Request #149 · NVIDIA/srt-slurm

zhengd-nv · 2026-05-12T01:47:55Z

Summary

Refresh of the Qwen3.5 FP8 mtp-off wideep recipes introduced by #128, driven by decode-side performance
sweeps on GB200.

Changes per recipe

setup_script: switched from rebuild-deepep.sh to setup-router-and-deepep.sh. Even though sglang-router is not actually used by these recipes, this setup script additionally seeds the decode SGLANG_DG_CACHE_DIR (/tmp/deepgemm-cache, node-local) from the shared /configs cache; the underlying deepep-rebuild step is the same.
identity: declares the sglang repo, container image, and framework version actually exercised (lmsysorg/sglang:0.5.10.post1), so jobs that apply these recipes can verify they're running the intended version.
name: prefixed with wideep- so the runtime job name matches the file name (e.g. qwen3.5-wideep-1p1d-dep8-dpep-ccsweep).
Drop deprecated decode option prefill-round-robin-balance: true.
DEP16+: lower decode mem-fraction-static from 0.80 to 0.75. DeepEP buffers scale linearly with ep_size; at 0.80 CUDA graph capture OOMs inside cuda_graph_runner.capture_one_batch_size for DEP16/DEP32.
Add SGLANG_HEALTH_STARTING_OK / SGLANG_ENABLE_HEALTH_ENDPOINT_GENERATION to both prefill and decode env; align decode SGLANG_DG_CACHE_DIR and SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK.
Extend 1P ccsweep ranges so both the prefill-unsaturated and prefill-saturated regimes are sampled (DEP16 → cc=4096, DEP32 → cc=8192).
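
The mem-fraction-static rule in the list above can be written as a small lookup. This is only a sketch: `decode_mem_fraction_static` is a hypothetical helper, not part of srtctl or sglang, and the 0.80 value for sizes below DEP16 is an assumption read off the "from 0.80" wording rather than something confirmed for every recipe.

```python
def decode_mem_fraction_static(ep_size: int) -> float:
    """Pick the decode mem-fraction-static for a given expert-parallel size.

    Hypothetical helper for illustration only.
    """
    if ep_size <= 0:
        raise ValueError("ep_size must be positive")
    # Per the PR text: DEP16 and larger run at 0.75 because DeepEP buffers
    # grow with ep_size and 0.80 OOMs during CUDA graph capture.
    # Assumption: smaller sizes (e.g. DEP8) stay at the previous 0.80.
    return 0.75 if ep_size >= 16 else 0.80


for ep in (8, 16, 32):
    print(f"DEP{ep}: mem-fraction-static={decode_mem_fraction_static(ep)}")
```

A check like this could sit in a recipe lint step so that a new DEP16+ recipe does not reintroduce 0.80 by accident.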

New recipe

wideep-3p1d-dep16-dpep-cc4096.yaml: mirrors wideep-3p1d-dep32-dpep-cc4096.yaml but for DEP16. At cc=4096 (per-DP=256), 1P and 2P prefill remain saturated; only 3P unlocks the decode at full per-DP capacity, and this point produces the highest realized out/gpu of all tested configurations.
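
The per-DP figure quoted above is total concurrency divided across the decode data-parallel ranks. The sketch below reproduces that arithmetic for the sweep endpoints named in this PR; it assumes the decode DP size equals the DEP number (DEP16 means 16 ranks), which is consistent with cc=4096 giving per-DP=256 but is not spelled out in the description. `per_dp_concurrency` is a hypothetical name.

```python
def per_dp_concurrency(cc: int, dp_size: int) -> int:
    """Requests in flight per decode DP rank, assuming an even split."""
    if dp_size <= 0:
        raise ValueError("dp_size must be positive")
    if cc % dp_size != 0:
        raise ValueError(f"cc={cc} does not divide evenly across {dp_size} ranks")
    return cc // dp_size


# (DEP size, cc) pairs taken from the PR description.
points = [(8, 2048), (16, 4096), (32, 8192), (32, 4096)]
for dep, cc in points:
    print(f"DEP{dep} cc={cc}: per-DP={per_dp_concurrency(cc, dep)}")
```

Under that assumption the three sweep endpoints (DEP8 at 2048, DEP16 at 4096, DEP32 at 8192) all land on 256 requests per DP rank, while DEP32 at cc=4096 sits at 128.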

Caveat

6p1d-dep32-cc8192 (and any other multi-prefill recipe that fans out past ~5 workers) tends to hit zmq.error.ZMQError: Address already in use during sglang engine init when run on plain main. #134 (port jitter / odd-port allocation) resolves this; on plain main, multiple resubmissions may be required for the 6P+ configurations to start cleanly.
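
To illustrate the general idea behind port jitter, here is a minimal sketch that retries a bind on a randomly offset port when the first choice is already taken. It is not the implementation in #134 and does not use sglang or pyzmq; it uses only the standard-library `socket` module, and `bind_with_jitter`, `spread`, and `max_attempts` are hypothetical names chosen for this example.

```python
import errno
import random
import socket


def bind_with_jitter(base_port: int, spread: int = 200, max_attempts: int = 20,
                     host: str = "127.0.0.1") -> socket.socket:
    """Bind a TCP socket at base_port, or at a jittered port above it if busy.

    Illustration only: retries on EADDRINUSE with a random offset in
    (base_port, base_port + spread], capped at the highest valid port.
    """
    upper = min(base_port + spread, 65535)
    port = base_port
    for _ in range(max_attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock  # caller owns the socket and must close it
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # unrelated failure: do not mask it
            if upper <= base_port:
                raise RuntimeError("no room above base_port to jitter into") from exc
            # Pick a new candidate so concurrent workers spread out
            # instead of all retrying the same port.
            port = random.randint(base_port + 1, upper)
    raise RuntimeError(f"could not bind near port {base_port} after {max_attempts} attempts")
```

The returned socket reports the port it actually got via `getsockname()[1]`, which a launcher would then pass on to the worker instead of the originally requested port.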

Test plan

srtctl dry-run -f <recipe> for all 8 recipes (node counts + identity fields verified)
srtctl apply -f wideep-1p1d-dep8-dpep-ccsweep.yaml: completes the full cc sweep (8 → 2048); output throughput is within 1% of the prior locally-generated config
srtctl apply -f wideep-6p1d-dep32-dpep-cc8192.yaml (on a branch carrying the sglang port-jitter change, #134): initializes cleanly (no ZMQError) and completes the cc=8192 point

codecov-commenter · 2026-05-12T01:50:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@3183b4c). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #149   +/-   ##
=======================================
  Coverage        ?   65.07%           
=======================================
  Files           ?       67           
  Lines           ?     8214           
  Branches        ?        0           
=======================================
  Hits            ?     5345           
  Misses          ?     2869           
  Partials        ?        0


Minimal refresh of the wideep recipes from NVIDIA#128 (d571e42) driven by recent decode-side sweeps on GB200. Each existing recipe gets exactly the same four-line set of functional changes:

- name: drop the `-router` suffix and add a `wideep-` prefix so the runtime job name matches the file name.
- setup_script: rebuild-deepep.sh -> setup-router-and-deepep.sh. The new script additionally seeds the decode-side SGLANG_DG_CACHE_DIR (/tmp/deepgemm-cache, node-local) from the shared /configs cache; the deepep-rebuild step itself is unchanged. sglang-router is not actually exercised by these recipes.
- identity: declare the sglang repo + container image + framework version actually exercised (lmsysorg/sglang:0.5.10.post1), so jobs that apply these recipes can verify they are running the intended version.
- Drop deprecated decode option `prefill-round-robin-balance: true`.

wideep-1p1d-dep8-dpep-ccsweep additionally aligns its decode SGLANG_DG_CACHE_DIR with the other recipes (/configs -> /tmp), since the new setup_script seeds the node-local path.

New file:

- wideep-3p1d-dep16-dpep-cc4096.yaml: mirrors wideep-3p1d-dep32-dpep-cc4096.yaml but targets DEP16. At cc=4096 (per-DP=256), 1P and 2P prefill remain saturated; only 3P unlocks decode at full per-DP capacity, and this point produces the highest realized out/gpu of all tested configurations.

Known caveat:

- 6p1d-dep32-cc8192 (and any other recipe that fans out past ~5 prefill workers) is prone to `zmq.error.ZMQError: Address already in use` during sglang engine init. NVIDIA#134 (port jitter / odd-port allocation) resolves this; without it multiple resubmissions may be required for the 6P+ configurations to start cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DEP16 DeepEP buffers scale with ep_size the same way they do for DEP32; 0.80 OOMs during cuda-graph capture on real cc=4096 runs. Other DEP16+ recipes (2p1d-dep16, 3p1d-dep16, 1p1d-dep32, ...) are already at 0.75; this brings the 1p1d-dep16 ccsweep recipe into the same class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

zhengd-nv requested review from hjjq, kedarpotdar-nv, kyleliang-nv and qiching as code owners May 12, 2026 01:47

zhengd-nv force-pushed the qwen3.5-wideep-recipes-refresh branch from 6279c0c to efb282b Compare May 15, 2026 08:43

zhengd-nv wants to merge 2 commits into NVIDIA:main from zhengd-nv:qwen3.5-wideep-recipes-refresh
