Add DSV4 GB300 1k1k STP disagg configs #1530
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26176031180
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26176611225
dbef1ac to 4f923af
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26302711476
name: "disagg-2p1d-dep12-conc18432"
slurm:
  time_limit: 03:00:00
Unquoted Slurm time limit
Medium Severity
slurm.time_limit is set as an unquoted 03:00:00, while every other new dynamo recipe in this change uses a quoted "03:00:00". YAML 1.1 parsers often coerce colon-separated values to sexagesimal integers (e.g. 10800), not an HH:MM:SS string, which can yield the wrong Slurm wall-clock limit or type errors downstream.
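A minimal sketch of the change this comment asks for, reusing the recipe name and the slurm key from the snippet above. Quoting the value makes the loader return a string regardless of its implicit typing rules for colon-separated scalars. All other recipe fields are omitted here.

```yaml
name: "disagg-2p1d-dep12-conc18432"
slurm:
  # Quoted so the loader always yields the string "03:00:00"
  # instead of applying any implicit numeric typing.
  time_limit: "03:00:00"
```

This also matches the quoted form the comment says the other new dynamo recipes in the change already use.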
Reviewed by Cursor Bugbot for commit 1aaa3cc.
Port 9 non-MTP disagg configs from NVIDIA/srt-slurm#161:
- 1p1d dep8/dep16, 1p4d, 1p6d, 2p1d dep12/dep16/dep48
- low-latency dep4/tp4 with zip overrides
1aaa3cc to e208645
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26974607644
…, switch to megamoe
Migrate the 7 STP disagg recipes to the megamoe MoE backend (deepep -> megamoe, drop deepep-config) and strip obsolete SGLANG_OPT_*/SGLANG_DEEPEP env vars now defaulted upstream, mirroring the b300 migration (#1506). Clean the 5 dynamo recipes: fix container to dsv4-grace-blackwell, remove personal extra_mount and hardcoded nodelist pins so they run on CI.
e208645 to db553d8
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit db553d8.
prefill_workers: 1
decode_nodes: 4
decode_workers: 4
Missing GB300 sbatch directives
Medium Severity
The 1p4d and 1p6d recipes wired into nvidia-master.yaml omit sbatch_directives (cpus-per-task: 144, mem: 0) that other GB300 multinode DSV4 recipes include. On gb300-cw, Slurm may default to one CPU per task and tight memory, risking slow or failed runs.
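One possible shape for the missing block, as a sketch only. The two values come from the comment above; placing sbatch_directives at the top level of the recipe file is an assumption, so the layout should be copied from an existing GB300 multinode DSV4 recipe in the repo and not from this fragment.

```yaml
# Assumed placement: a top-level key in the recipe file (hypothetical;
# mirror an existing GB300 multinode DSV4 recipe for the real layout).
sbatch_directives:
  cpus-per-task: 144  # request 144 CPUs per task
  mem: 0              # in Slurm, --mem=0 requests all memory on each node
```

The same block would need to be added to both the 1p4d and 1p6d recipes named in the comment.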
Additional Locations (1)
Reviewed by Cursor Bugbot for commit db553d8.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26974737242


Summary
Port 9 non-MTP disagg configs from NVIDIA/srt-slurm#161:
- 1p1d dep8/dep16, 1p4d, 1p6d, 2p1d dep12/dep16/dep48
- low-latency dep4/tp4 with zip overrides
Reference: NVIDIA/srt-slurm#161
Note
Low Risk
Adds benchmark/CI YAML and Slurm recipes only; no runtime application or auth changes.
Overview
Adds DeepSeek-V4-Pro GB300 1k/1k STP (non-MTP) disaggregated benchmark coverage ported from NVIDIA/srt-slurm#161: new Slurm recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/deepseek-v4/1k1k/ and a matching CI entry dsv4-fp4-gb300-dynamo-sglang-1k1k-stp in nvidia-master.yaml.

High-concurrency Dynamo paths (1p1d dep8/dep16, 2p1d dep12/dep16/dep48) use Dynamo KV routing, megamoe MoE, Mooncake disagg, and custom benchmark_serving.py sweeps at very high concurrency (e.g. 8k–18k). Multi-decode STP paths (1p4d, 1p6d) use an SGLang frontend, DEP4 prefill + TP4 decode workers, and sa-bench at 8x64/32x64. Low-latency base recipes (disagg-low-latency-dep4.yaml, disagg-low-latency-tp4.yaml) add zip overrides over decode scale-out; they are included in the repo but not wired in the new master block in this diff.

Stack updates reflected in recipes/changelog: SGLang image nightly-dev-cu13-20260602-98a1b58c, deepep → megamoe on relevant roles, trimmed obsolete SGLANG_OPT_* / deepep env, and normalized Dynamo model container to dsv4-grace-blackwell. perf-changelog.yaml documents the new config key and those recipe tweaks.

Reviewed by Cursor Bugbot for commit db553d8. Bugbot is set up for automated code reviews on this repo.