feat: SGLang decode slow_down for PD disagg nsys profiling (with skip-warmup workflow) by zhengd-nv · Pull Request #60 · NVIDIA/srt-slurm

zhengd-nv · 2026-04-23T10:09:37Z

PR description

Summary

Wires SGLang's /slow_down endpoint on decode worker leaders into the job YAML so that, in PD disaggregated runs, the first decode forwards can be stretched in time while prefill catches up and decode batching builds. The goal is to align the nsys decode step window with a saturated decode phase. This workflow applies only to the SGLang frontend (sglang-router).

slow_down is designed to be used together with SA-Bench warmup disabled (num_warmup_mult: 0). Skipping the built-in benchmark warmup keeps decode step indices predictable. The usual "warmup" role is covered instead by a number of real decode forwards that run after slow_down auto-clears and before the nsys capture window; those steps bring decode (e.g. CUDA graphs, batching) to a steady state before profiling.

Mapping `profiling.decode.start_step` (example recipe)

For the example workload, decode nsys capture is started at a step chosen so the window begins after:

- Bootstrap: modeled here as osl steps (1024 for osl: 1024).
- slow_down window: a small number of forwards while /slow_down is active (e.g. 4 steps in the example, tied to your slow_down_* timing).
- Post-slow_down warmup: additional forwards after slow_down clears (e.g. 72 steps) so decode is "hot" before nsys.

So in the example:

decode.start_step (1100) = bootstrap_steps (1024, = osl) + slow_down_steps (4) + warmup_steps (72).

Tweak the three terms if you change osl, concurrency, or slow_down_*, and set profiling.decode.start_step / stop_step accordingly.
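
The step budget above can be written out as a small calculation. This is a minimal sketch, not code from this PR: `DecodeStepBudget` and its field names are hypothetical and only mirror the three terms of the formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeStepBudget:
    """Decode forwards that precede the nsys capture window (hypothetical helper)."""

    bootstrap_steps: int  # modeled as osl in the example recipe
    slow_down_steps: int  # forwards executed while /slow_down is active
    warmup_steps: int     # real decode forwards after slow_down clears

    def start_step(self) -> int:
        # Value to use for profiling.decode.start_step.
        return self.bootstrap_steps + self.slow_down_steps + self.warmup_steps


# Numbers from the example recipe: osl = 1024, 4 slowed forwards, 72 warmup forwards.
budget = DecodeStepBudget(bootstrap_steps=1024, slow_down_steps=4, warmup_steps=72)
print(budget.start_step())  # 1100
```

Changing osl, concurrency, or the slow_down_* timing changes one or more of the three fields, and the start step should be recomputed from them.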

User-facing changes

- YAML: when both benchmark.slow_down_sleep_time and slow_down_wait_time are set and the frontend is SGLang, srtctl passes the decode leader URLs to SA-Bench; see benchmark_stage / bench.sh / benchmark_serving.py.
- bench.sh: warmup is skipped when NUM_WARMUP_MULT is 0.
- Example: recipes/.../1p1d-dep4-nsys-profile-slowdown.yaml documents the skip-warmup + slow_down + step budget for nsys in the file header and next to decode.start_step / num_warmup_mult.
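
The conditions described in the list above can be sketched as two predicates. This is an illustration under stated assumptions, not the logic in benchmark_stage.py or bench.sh: the function names, the `"sglang"` frontend string, and the placement of `num_warmup_mult` in the same mapping are all hypothetical.

```python
from typing import Any, Mapping


def slow_down_enabled(benchmark: Mapping[str, Any], frontend: str) -> bool:
    """slow_down applies only when both timing keys are set and the frontend is SGLang."""
    both_set = (
        benchmark.get("slow_down_sleep_time") is not None
        and benchmark.get("slow_down_wait_time") is not None
    )
    # "sglang" is an assumed identifier for the sglang-router frontend.
    return both_set and frontend == "sglang"


def skip_warmup(benchmark: Mapping[str, Any]) -> bool:
    """The built-in benchmark warmup is skipped when num_warmup_mult is 0."""
    return benchmark.get("num_warmup_mult") == 0


# Hypothetical benchmark section; the timing values are placeholders.
benchmark_cfg = {
    "slow_down_sleep_time": 1.0,
    "slow_down_wait_time": 30.0,
    "num_warmup_mult": 0,
}
print(slow_down_enabled(benchmark_cfg, "sglang"), skip_warmup(benchmark_cfg))
```

Setting only one of the two timing keys leaves slow_down off in this sketch, which matches the "both set" wording above.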

Made-with: Cursor

codecov-commenter · 2026-04-23T10:11:00Z

Codecov Report

❌ Patch coverage is 15.38462% with 22 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@698590e). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/srtctl/cli/mixins/benchmark_stage.py | 8.33% | 22 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #60   +/-   ##
=======================================
  Coverage        ?   70.38%           
=======================================
  Files           ?       60           
  Lines           ?     6571           
  Branches        ?        0           
=======================================
  Hits            ?     4625           
  Misses          ?     1946           
  Partials        ?        0



zhengd-nv added 4 commits April 23, 2026 02:23

- 8888132 slow down support
- 60e830d skip warmup
- f742f55 add example recipe for nsys profile with slowdown (Made-with: Cursor)
- 3daa326 update example description

zhengd-nv requested review from alec-flowers, csahithi, hjjq, ishandhanani, kedarpotdar-nv, kyleliang-nv, nlevin-ui and qiching as code owners April 23, 2026 10:09

zhengd-nv added 6 commits April 23, 2026 03:49

- aeaef24 update example
- 0f5c216 update config
- 69be6ad Merge origin/main into slow-down (resolve sa-bench conflicts) (Made-with: Cursor)
- bbb42f1 Merge remote-tracking branch 'origin/main' into slow-down (Conflicts: src/srtctl/benchmarks/scripts/sa-bench/bench.sh, src/srtctl/cli/mixins/benchmark_stage.py)
- 6520e2d ruff format
- 35af0dc Merge remote-tracking branch 'origin/main' into slow-down

zhengd-nv wants to merge 10 commits into NVIDIA:main from zhengd-nv:slow-down
