feat: add vLLM GB200 GSM8K repro configs by alec-flowers · Pull Request #106 · NVIDIA/srt-slurm

alec-flowers · 2026-04-28T06:00:22Z

Summary

Adds a reproducible vLLM DeepSeek-V4-Pro GB200 GSM8K eval path and the two exact 1P1D configs used for the SA smoke comparison.

- Adds a self-contained benchmark.type: lm-eval runner that evaluates OpenAI-compatible chat endpoints with EleutherAI lm-eval.
- Runs the eval in the same runtime container as the server, matching the InferenceX/srt-slurm multi-node flow.
- Vendors the GSM8K task YAML, score threshold, and score validator used by the repro configs.
- Adds two exact vLLM GB200 1P1D eval recipes:
  - DEP8 prefill + TP8 decode: disagg-gb200-1p1d-dep8-tp8-gsm8k-smoke.yaml
  - DEP8 prefill + DEP8 decode: disagg-gb200-1p1d-dep8-dep8-gsm8k-smoke.yaml
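
For readers unfamiliar with how a score validator over lm-eval output might look, here is a minimal sketch. It is not the validator vendored in this PR. It assumes the usual lm-eval results layout (a top-level "results" mapping keyed by task name, with metric keys such as exact_match,strict-match). The function name, the task key "gsm8k", and the 0.85 threshold are hypothetical; the PR description does not state the actual threshold.

```python
from typing import Any, Mapping

# Metric keys as they appear in the SA validation results of this PR.
METRICS = ("exact_match,strict-match", "exact_match,flexible-extract")


def validate_scores(results: Mapping[str, Any], task: str, threshold: float) -> list[str]:
    """Return a list of failure messages; an empty list means the run passed.

    Hypothetical helper: assumes results["results"][task] maps metric -> score.
    """
    task_scores = results["results"][task]
    failures = []
    for metric in METRICS:
        score = task_scores.get(metric)
        if score is None:
            failures.append(f"{metric}: missing")
        elif score < threshold:
            failures.append(f"{metric}: {score:.4f} < {threshold:.4f}")
    return failures


# Example input shaped like an lm-eval results file (scores from job 15558).
example = {
    "results": {
        "gsm8k": {
            "exact_match,strict-match": 0.8635,
            "exact_match,flexible-extract": 0.8597,
        }
    }
}

# 0.85 is an illustrative threshold, not the one vendored in the PR.
failures = validate_scores(example, "gsm8k", threshold=0.85)
print("PASS" if not failures else failures)
```

Returning a list of messages rather than a boolean lets a job log show every failing metric at once.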

SA Validation

Successful runs on the SA GB200 cluster:

TP8 decode repro: job 15558
- exact_match,strict-match: 0.8635
- exact_match,flexible-extract: 0.8597
DEP8 decode comparison: job 15559
- exact_match,strict-match: 0.9636087945413192
- exact_match,flexible-extract: 0.9628506444275967

Both runs use the full GSM8K test split (1319 examples), 5 shots from the task YAML, EVAL_CONC=128, max_length=9472, and max_tokens=5376.
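
The arithmetic behind these settings can be checked with a short sketch. It assumes max_length is the total context budget and max_tokens is the generation cap, so their difference is the room left for the 5-shot prompt; that reading is an assumption, not something the PR states. The helper correct_count is hypothetical and simply converts a reported accuracy back into a count of correct answers.

```python
import math

NUM_EXAMPLES = 1319   # full GSM8K test split
EVAL_CONC = 128       # concurrent requests
MAX_LENGTH = 9472     # assumed: total context budget in tokens
MAX_TOKENS = 5376     # assumed: cap on generated tokens

# Tokens left for the few-shot prompt under the assumption above.
prompt_budget = MAX_LENGTH - MAX_TOKENS

# Lower bound on request "waves" if every wave is fully occupied.
min_waves = math.ceil(NUM_EXAMPLES / EVAL_CONC)


def correct_count(score: float, n: int = NUM_EXAMPLES) -> int:
    """Convert an accuracy in [0, 1] to the nearest whole number of correct answers."""
    return round(score * n)


print(prompt_budget, min_waves)
print(correct_count(0.9636087945413192), correct_count(0.9628506444275967))
```

The unrounded scores reported for job 15559 map to whole counts out of 1319, a quick sanity check that the metric was computed over the full split.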

Test Plan

PYTHONPATH=src /home/aflowers/Documents/agent/srt-slurm-gsm8k-worktree/.venv/bin/python -m pytest tests/ -q
- 639 passed, 2 skipped, 6 deselected
/home/aflowers/Documents/agent/srt-slurm-gsm8k-worktree/.venv/bin/ruff check src/srtctl tests/test_benchmarks.py
Parsed both new recipes with load_config; both resolve as benchmark=lm-eval, served model deepseek-ai/DeepSeek-V4-Pro, EVAL_CONC=128.
Ran the repo copyright-check logic locally; all checked files have NVIDIA SPDX headers.

Note: a fresh uv run pytest ... in the new worktree could not resolve jinja2 from the configured package index, so validation used the existing populated srt-slurm dev venv with PYTHONPATH=src pinned to this worktree.

Preflight Note

This PR also keeps srtctl apply from running blocking preflight implicitly. Some GB200 clusters keep model paths on per-compute-node local storage, so login-node preflight can fail even though the job would run correctly once scheduled. Explicit validation remains available via srtctl preflight -f ....
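
The behavior described above can be illustrated with a small sketch. This is not srtctl's code: the function names, the run_preflight parameter, and the example path are all hypothetical, and the only check modeled is whether the model path is visible from the node that runs the command.

```python
from pathlib import Path


def preflight(model_path: str) -> list[str]:
    """Hypothetical check: report problems visible from the current node."""
    problems = []
    if not Path(model_path).exists():
        problems.append(f"model path not visible from this node: {model_path}")
    return problems


def apply(model_path: str, run_preflight: bool = False) -> str:
    """Hypothetical submit step: preflight is opt-in and never runs implicitly."""
    if run_preflight:
        problems = preflight(model_path)
        if problems:
            raise RuntimeError("; ".join(problems))
    # A real tool would submit the job to the scheduler here.
    return "submitted"


# A path that exists only on compute-node local storage (illustrative).
path = "/nonexistent-compute-local/models/example"

print(apply(path))        # submission proceeds without a login-node check
print(preflight(path))    # explicit validation still reports the problem
```

Making the check opt-in avoids false failures on login nodes while keeping the validation available on demand.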

alec-flowers added 2 commits April 27, 2026 22:59

feat: add vLLM GB200 GSM8K repro configs

1cad477

fix: disable blocking apply preflight

0014864

alec-flowers wants to merge 2 commits into main from aflowers/vllm-gb200-gsm8k-repro
