Commit ade5488
fix(launch_gb300-cw): pin srt-slurm fork with parallel sa-bench

The current sa-bench in NVIDIA/srt-slurm@9d75f82 generates random
prompts single-threaded, which dominates 7p1d/conc=8192 bench startup
(~50 min just for the 81920-prompt main pass before the first HTTP
request reaches dynamo). Pin to fzyzcjy/srt-slurm fork branch
`feat/random-num-workers` (commit 8094cfb), which is 9d75f82 + the
SemiAnalysisAI/InferenceX `utils/bench_serving/` benchmark_serving.py
ported into sa-bench. With `--random-num-workers 48` (now the default
in bench.sh) prompt generation drops to ~1 min on a 144-core GB300
host, putting the bench-startup cost on the same order as
infra+model-load instead of dominating it.
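
The speedup described above can be illustrated with a minimal sketch of chunked, multi-process prompt generation. This is not the fork's code: `split_counts`, `generate_prompts`, and the representation of a prompt as a list of random token ids are hypothetical names and assumptions chosen for illustration, and the real `benchmark_serving.py` may divide the work differently. For scale, the reported drop from ~50 min to ~1 min with 48 workers is roughly what near-linear scaling would give (50 / 48 ≈ 1.04 min).

```python
import random
from concurrent.futures import ProcessPoolExecutor


def split_counts(num_prompts, num_workers):
    """Return per-worker prompt counts that sum to num_prompts."""
    # Never use more workers than prompts, and always at least one.
    num_workers = max(1, min(num_workers, num_prompts))
    base, extra = divmod(num_prompts, num_workers)
    # The first `extra` workers each take one additional prompt.
    return [base + (1 if i < extra else 0) for i in range(num_workers)]


def _generate_chunk(task):
    """Generate one chunk of random prompts as lists of token ids."""
    seed, count, prompt_len, vocab_size = task
    # A per-chunk RNG keeps output reproducible for a fixed
    # (base_seed, num_workers) pair, regardless of process scheduling.
    rng = random.Random(seed)
    return [
        [rng.randrange(vocab_size) for _ in range(prompt_len)]
        for _ in range(count)
    ]


def generate_prompts(num_prompts, prompt_len, vocab_size,
                     num_workers=1, base_seed=0):
    """Generate num_prompts random prompts using num_workers processes."""
    counts = split_counts(num_prompts, num_workers)
    tasks = [
        (base_seed + i, count, prompt_len, vocab_size)
        for i, count in enumerate(counts)
    ]
    if len(tasks) == 1:
        # Serial path: equivalent to the single-threaded behaviour.
        chunks = [_generate_chunk(tasks[0])]
    else:
        with ProcessPoolExecutor(max_workers=len(tasks)) as pool:
            # pool.map preserves task order, so the result is deterministic.
            chunks = list(pool.map(_generate_chunk, tasks))
    return [prompt for chunk in chunks for prompt in chunk]
```

A call such as `generate_prompts(81920, prompt_len, vocab_size, num_workers=48)` would mirror the configuration in this commit; when using multiple processes, the call should sit under an `if __name__ == "__main__":` guard so that worker start-up does not re-run it.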

The fork is paired with the upstream PR
NVIDIA/srt-slurm#114; once that merges, this
pin should revert to the bumped NVIDIA/srt-slurm SHA.

1 parent 16113f8 commit ade5488
1 file changed
Lines changed: 13 additions & 2 deletions
| Hunk | Original lines changed | New lines changed | Change |
|---|---|---|---|
| `@@ -33,7 +33,18 @@` | 36 | 36-47 | 1 line removed, 12 lines added |
| `@@ -90,7 +101,7 @@` | 93 | 104 | 1 line removed, 1 line added |