Add DSV4-Pro GB300 8k1k non-MTP disagg recipes#174

Open
yhyang201 wants to merge 1 commit into NVIDIA:main from yhyang201:dsv4-gb300-8k1k-disagg-non-mtp

Add 6 disaggregated serving recipes for DeepSeek-V4-Pro on GB300 FP4 with an 8k/1k workload,
ported from InferenceX (PR #1528).

Configs:

  • 1p1d-tp4-tp4: 2 nodes, c=1, flashinfer_mxfp4 (low latency)
  • 1p1d-dep4-dep16: 5 nodes, c=1024, megamoe
  • 4p1d-dep4-dep16: 8 nodes, c=1024, megamoe
  • 8p1d-dep4-dep16: 12 nodes, c=4096, megamoe
  • 10p1d-dep4-dep16: 14 nodes, c=8192, megamoe
  • 12p1d-dep4-dep12: 15 nodes, c=21504, megamoe

All configs use the dynamo frontend and mooncake KV transfer.
Image: sglang nightly-dev-cu13-20260520-425dffbd.
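
For reviewers who want to sanity-check the node counts against the config names, here is a minimal sketch. It rests on two assumptions that this PR does not state: that a name like `4p1d-dep4-dep16` means 4 prefill workers of 4 GPUs each plus 1 decode worker of 16 GPUs, and that each GB300 node has 4 GPUs. The helper `expected_nodes` and the `CONFIGS` table are illustrative only and are not part of the recipes.

```python
import re

# Assumption (not stated in this PR): 4 GPUs per GB300 node.
GPUS_PER_NODE = 4

# (name, nodes, concurrency), copied from the config list in this PR.
CONFIGS = [
    ("1p1d-tp4-tp4", 2, 1),
    ("1p1d-dep4-dep16", 5, 1024),
    ("4p1d-dep4-dep16", 8, 1024),
    ("8p1d-dep4-dep16", 12, 4096),
    ("10p1d-dep4-dep16", 14, 8192),
    ("12p1d-dep4-dep12", 15, 21504),
]

# Assumed naming scheme: <P>p<D>d-<tp|dep><prefill GPUs>-<tp|dep><decode GPUs>
NAME_RE = re.compile(r"^(\d+)p(\d+)d-(?:tp|dep)(\d+)-(?:tp|dep)(\d+)$")


def expected_nodes(name, gpus_per_node=GPUS_PER_NODE):
    """Return the node count implied by a config name under the assumptions above."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized config name: {name}")
    prefill_workers, decode_workers, prefill_gpus, decode_gpus = map(int, match.groups())
    total_gpus = prefill_workers * prefill_gpus + decode_workers * decode_gpus
    # Ceiling division, in case a GPU total does not fill a whole node.
    return -(-total_gpus // gpus_per_node)


def mismatches():
    """List the configs whose stated node count differs from the derived one."""
    return [(name, nodes, expected_nodes(name))
            for name, nodes, _ in CONFIGS
            if expected_nodes(name) != nodes]


if __name__ == "__main__":
    for name, nodes, concurrency in CONFIGS:
        print(f"{name}: stated={nodes} derived={expected_nodes(name)} c={concurrency}")
```

Under these assumptions, every stated node count in the list matches the derived one, so `mismatches()` returns an empty list.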
