Update GB300 FP4 GLM5 low-latency sweep by weireweire · Pull Request #175 · NVIDIA/srt-slurm

weireweire · 2026-05-25T03:32:49Z

Updates the GB300 FP4 GLM5 8k1k low-latency sweep to add a fifth decode point and align max-running-requests, CUDA graph batch sizes, and SA-Bench concurrencies.

Validation:

Parsed recipes/gb300-fp4/glm5.yaml with the Ruby YAML parser.

codecov-commenter · 2026-05-25T03:34:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@3523798). Learn more about missing BASE report.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #175   +/-   ##
=======================================
  Coverage        ?   65.10%           
=======================================
  Files           ?       67           
  Lines           ?     8217           
  Branches        ?        0           
=======================================
  Hits            ?     5350           
  Misses          ?     2867           
  Partials        ?        0


…pology (#1583)

* glm5-fp4-gb300-dynamo-sglang: extend 8k1k low-lat sweep with 1p17d topology

  Mirrors NVIDIA/srt-slurm#175: adds a 5th 8k1k_stp_lowlat_4 recipe with decode_nodes/workers=17, and lowers per-zip-index decode max-running-requests / cuda-graph-max-bs from a flat 4096 to 128/64/32/16/1 across lowlat_0..4. Benchmark concurrencies follow suit: 128/64/32/16/12. nvidia-master.yaml conc-list updated to match for each of the five 1p{3,5,9,15,17}d entries.

* perf-changelog: set PR link to #1583

---------

Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
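
The per-point values quoted in the commit message above can be tabulated to check how the three knobs line up. The sketch below is a hedged illustration in Python, not code from either repository: `SweepPoint`, `build_sweep`, and `over_limit` are hypothetical names, and pairing the decode worker counts 3/5/9/15/17 with lowlat_0..4 in that order is an assumption (the source only confirms 17 for lowlat_4). The numbers are copied as quoted, including the last point, where the decode limit reads 1 and the benchmark concurrency reads 12.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepPoint:
    """One low-latency sweep point (hypothetical container, not a repo type)."""
    name: str
    decode_workers: int
    max_running_requests: int
    cuda_graph_max_bs: int
    concurrency: int


# Values as quoted in the commit message; the index-to-topology pairing is assumed.
DECODE_WORKERS = [3, 5, 9, 15, 17]      # 1p{3,5,9,15,17}d
DECODE_LIMITS = [128, 64, 32, 16, 1]    # max-running-requests / cuda-graph-max-bs
CONCURRENCIES = [128, 64, 32, 16, 12]   # benchmark concurrencies


def build_sweep():
    """Build the five sweep points, using one limit for both decode settings."""
    rows = zip(DECODE_WORKERS, DECODE_LIMITS, CONCURRENCIES)
    return [
        SweepPoint(f"8k1k_stp_lowlat_{i}", workers, limit, limit, conc)
        for i, (workers, limit, conc) in enumerate(rows)
    ]


def over_limit(points):
    """Return names of points whose concurrency exceeds max-running-requests."""
    return [p.name for p in points if p.concurrency > p.max_running_requests]


if __name__ == "__main__":
    for point in build_sweep():
        print(point)
    print("concurrency above decode limit:", over_limit(build_sweep()))
```

With the numbers as quoted, only `8k1k_stp_lowlat_4` is reported. Whether that reflects the intended configuration or a typo in the commit message is not stated in the source.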

weireweire marked this pull request as ready for review May 25, 2026 03:35

weireweire requested review from hjjq, kedarpotdar-nv, kyleliang-nv and qiching as code owners May 25, 2026 03:35

Update GB300 FP4 GLM5 low-latency sweep

41d73ff

weireweire force-pushed the gb300-fp4-glm5-lowlat-concurrency branch from 80952de to 41d73ff Compare May 25, 2026 03:36

Ankur-singh mentioned this pull request May 28, 2026

glm5-fp4-gb300-dynamo-sglang: extend 8k1k low-lat sweep with 1p17d topology SemiAnalysisAI/InferenceX#1583

Merged

weireweire wants to merge 1 commit into NVIDIA:main from weireweire:gb300-fp4-glm5-lowlat-concurrency
