Commit ade5488

fix(launch_gb300-cw): pin srt-slurm fork with parallel sa-bench
The sa-bench in NVIDIA/srt-slurm@9d75f82 generates random prompts single-threaded, which dominates bench startup for 7p1d/conc=8192: the 81920-prompt main pass alone takes ~50 min before the first HTTP request reaches dynamo.

Pin to the fzyzcjy/srt-slurm fork, branch `feat/random-num-workers` (commit 8094cfb), which is 9d75f82 plus the SemiAnalysisAI/InferenceX `utils/bench_serving/` benchmark_serving.py ported into sa-bench. With `--random-num-workers 48` (now the default in bench.sh), prompt generation drops to ~1 min on a 144-core GB300 host, which puts bench startup on the same order as infra setup plus model load instead of dominating the run.

The fork is paired with the upstream PR NVIDIA/srt-slurm#114; once that merges, this pin should revert to the bumped NVIDIA/srt-slurm SHA.
1 parent 16113f8 commit ade5488

1 file changed

Lines changed: 13 additions & 2 deletions

runners/launch_gb300-cw.sh

@@ -33,7 +33,18 @@ export NVIDIA_VISIBLE_DEVICES=all
 export NVIDIA_DRIVER_CAPABILITIES=compute,utility
 
 NGINX_IMAGE="nginx:1.27.4"
-SRT_SLURM_RECIPES_COMMIT="9d75f82acec163594658a440f39dd7f1bd35bd16"
+# Pin to fzyzcjy/srt-slurm fork branch `feat/random-num-workers`
+# (= NVIDIA/srt-slurm@9d75f82 + sa-bench parallel random prompt
+# generation). The single-threaded random prompt generator in the
+# upstream sa-bench dominates bench startup on the 7p1d/conc=8192
+# sweep (~50 min for the main pass alone before the first HTTP
+# request leaves the client). The fork bumps that to ~1 min via
+# multiprocessing.Pool with `--random-num-workers 48`.
+#
+# TODO: revert to a NVIDIA/srt-slurm pin once the upstream PR
+# (https://github.com/NVIDIA/srt-slurm/pull/114) merges.
+SRT_SLURM_RECIPES_REPO="https://github.com/fzyzcjy/srt-slurm.git"
+SRT_SLURM_RECIPES_COMMIT="8094cfb1db7cad76fbf9ecb41c0c7e662dad301e"
 
 # Squash files live alongside models on /mnt/vast (shared across nodes).
 # `squash_dupe` instead of `squash` to use '_'-separated names: srtctl /
@@ -90,7 +101,7 @@ if [ -d "$SRT_REPO_DIR" ]; then
     rm -rf "$SRT_REPO_DIR"
 fi
 
-git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
+git clone "$SRT_SLURM_RECIPES_REPO" "$SRT_REPO_DIR"
 cd "$SRT_REPO_DIR"
 git checkout "$SRT_SLURM_RECIPES_COMMIT"
 