
recipes: add DeepSeek-V4 GB200 decode/prefill bench yamls #124

Open
esmeetu wants to merge 3 commits into NVIDIA:main from esmeetu:deepseek-v4-gb200-recipes

esmeetu commented Apr 30, 2026

Summary

  • Adds decode-bench-gb200-dep8.yaml (decode-only DEP=8, megamoe, concurrency sweep 1024/2048/3072).
  • Adds decode-bench-gb200-tep8.yaml (decode-only TEP=8, megamoe, concurrency sweep 128/256/512).
  • Adds prefill-bench-gb200-dep8.yaml (prefill-only DEP=8, flashinfer_nvlink_one_sided).

All three live under recipes/vllm/deepseek-v4-pro/GB200/8k1k/, alongside the existing disagg recipes.
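
For readers comparing the two decode sweeps, here is a rough sketch of the per-rank load they imply. It assumes (this is not stated in the PR) that DEP=8 means 8 data-parallel ranks with requests split evenly, and that TEP=8 is a single replica that sees the whole batch. The `DP_RANKS` table and `per_rank_concurrency` helper are hypothetical names used only for illustration.

```python
# Concurrency sweeps exactly as listed in the PR summary.
SWEEPS = {
    "decode-bench-gb200-dep8.yaml": [1024, 2048, 3072],
    "decode-bench-gb200-tep8.yaml": [128, 256, 512],
}

# ASSUMPTION (not from the PR): DEP=8 -> 8 data-parallel ranks sharing the
# requests evenly; TEP=8 -> one replica, so every request lands in one batch.
DP_RANKS = {
    "decode-bench-gb200-dep8.yaml": 8,
    "decode-bench-gb200-tep8.yaml": 1,
}


def per_rank_concurrency(recipe: str) -> list[int]:
    """Return the in-flight requests per data-parallel rank for each sweep point."""
    ranks = DP_RANKS[recipe]
    return [total // ranks for total in SWEEPS[recipe]]


for recipe in SWEEPS:
    print(recipe, per_rank_concurrency(recipe))
```

Under that even-split assumption the DEP=8 sweep works out to 128/256/384 requests per rank, so both decode recipes would probe a similar per-replica batch range.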

Test plan

  • srtctl dry-run -f recipes/vllm/deepseek-v4-pro/GB200/8k1k/decode-bench-gb200-dep8.yaml
  • srtctl dry-run -f recipes/vllm/deepseek-v4-pro/GB200/8k1k/decode-bench-gb200-tep8.yaml
  • srtctl dry-run -f recipes/vllm/deepseek-v4-pro/GB200/8k1k/prefill-bench-gb200-dep8.yaml
  • Run end-to-end on a 2-node GB200 slice for each recipe.
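
The three dry-run steps above could be scripted roughly as follows. This is a minimal sketch: the `srtctl dry-run -f <path>` form is taken from the test plan, while the assumption that `srtctl` returns a non-zero exit code on an invalid recipe, and the helper names `dry_run_commands` and `run_all`, are mine.

```python
import shlex
import subprocess

RECIPE_DIR = "recipes/vllm/deepseek-v4-pro/GB200/8k1k"
RECIPES = [
    "decode-bench-gb200-dep8.yaml",
    "decode-bench-gb200-tep8.yaml",
    "prefill-bench-gb200-dep8.yaml",
]


def dry_run_commands() -> list[list[str]]:
    """Build one `srtctl dry-run -f <recipe>` argv per recipe in this PR."""
    return [["srtctl", "dry-run", "-f", f"{RECIPE_DIR}/{name}"] for name in RECIPES]


def run_all() -> list[str]:
    """Run every dry-run and return the recipes whose command failed.

    ASSUMPTION: srtctl signals an invalid recipe with a non-zero exit code.
    """
    failed = []
    for name, cmd in zip(RECIPES, dry_run_commands()):
        if subprocess.run(cmd).returncode != 0:
            failed.append(name)
    return failed


# Print the commands so they can be reviewed before anything is executed.
for cmd in dry_run_commands():
    print(shlex.join(cmd))
```

Call `run_all()` from the repository root on a machine where `srtctl` is installed; an empty list means all three recipes passed the dry-run.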

Adds three new benchmark recipes under recipes/vllm/deepseek-v4-pro/GB200/8k1k/:
- decode-bench-gb200-dep8.yaml: decode-only DEP=8 with DecodeBenchConnector
- decode-bench-gb200-tep8.yaml: decode-only TEP=8 sweep
- prefill-bench-gb200-dep8.yaml: prefill-only DEP=8

ywang96 commented Apr 30, 2026

FYI @alec-flowers: this isn't meant to be merged; these are data points for the prefill and decode setups at certain concurrencies.

esmeetu and others added 2 commits April 30, 2026 14:34
…lization

- max-num-seqs: 400 -> 384
- max-cudagraph-capture-size: 400 -> 384
- gpu-memory-utilization: 0.98 -> 0.96
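
The commit does not say why these values changed. As a sanity check on the new settings, here is a small sketch that diffs the two sets and flags obviously inconsistent combinations. It assumes the keys map onto the vLLM options of the same name, and that batches larger than the CUDA graph capture size would run without a captured graph, which is why the two limits are kept equal. `BEFORE`, `AFTER`, `changed`, and `check` are illustrative names, not part of the recipes.

```python
# Values taken from the commit message above.
BEFORE = {
    "max-num-seqs": 400,
    "max-cudagraph-capture-size": 400,
    "gpu-memory-utilization": 0.98,
}
AFTER = {
    "max-num-seqs": 384,
    "max-cudagraph-capture-size": 384,
    "gpu-memory-utilization": 0.96,
}


def changed(before: dict, after: dict) -> dict:
    """Map each modified key to its (old, new) pair."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


def check(settings: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    # ASSUMPTION: decode batches above the capture size miss the CUDA graph path.
    if settings["max-cudagraph-capture-size"] < settings["max-num-seqs"]:
        problems.append("capture size is smaller than max-num-seqs")
    # The utilization setting is a fraction of GPU memory, so it must be in (0, 1].
    if not 0.0 < settings["gpu-memory-utilization"] <= 1.0:
        problems.append("gpu-memory-utilization must be in (0, 1]")
    return problems


for key, (old, new) in changed(BEFORE, AFTER).items():
    print(f"{key}: {old} -> {new}")
print("problems:", check(AFTER))
```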
qiching (Collaborator) left a comment

LGTM


3 participants