Scripts for vllm-model-bash benchmarking efforts
Each benchmark run:
- Launches a vLLM server based on the YAML config
- Runs concurrency sweeps and collects latency/throughput metrics
- Optionally profiles GPU activity via Nsight Systems (nsys) and/or the PyTorch Profiler
- Produces organized outputs under a specified directory
Ideal for performance characterization, MLPerf inference testing, and multi-level GPU profiling at scale.
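The latency/throughput metrics collected during a concurrency sweep could be aggregated roughly as follows. This is an illustrative sketch only; the function and field names are assumptions, not the repo's actual API.

```python
# Sketch: summarizing per-request latencies from one sweep point into
# latency percentiles and request throughput. Names are hypothetical.
import statistics

def summarize(latencies_s, wall_time_s):
    """Return p50/p95 latency and request throughput for one sweep point."""
    ordered = sorted(latencies_s)
    return {
        "p50_latency_s": statistics.median(ordered),
        # Nearest-rank style p95 over the sorted sample
        "p95_latency_s": ordered[int(0.95 * (len(ordered) - 1))],
        "throughput_rps": len(ordered) / wall_time_s,
    }

# Example: 4 requests completed over a 2-second window
print(summarize([0.8, 1.0, 1.2, 1.5], wall_time_s=2.0))
```

A real run would record many more samples per concurrency level; the same reduction applies per level.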
pip install -r requirements.txt
Python script (recommended for scenario-based configs):
# Run all scenarios
python vllm_bench.py configs/models/gpt-oss-20b.yaml
# Run specific scenario(s)
python vllm_bench.py configs/models/gpt-oss-20b.yaml --scenario baseline
python vllm_bench.py configs/models/gpt-oss-20b.yaml --scenarios baseline,async_scheduling
Bash script (legacy support):
# Works with both old and new config formats
bash vllm_bench.sh config.yaml
bash vllm_bench.sh configs/models/gpt-oss-20b.yaml --scenario baseline
Install these packages:
sudo apt-get install jq curl -y
pip install yq
For GPU profiling capabilities:
- Nsight Systems: System-wide performance analysis, CUDA graph tracing
- Nsight Compute: Detailed kernel-level analysis
- PyTorch Profiler: Python/PyTorch-level CPU and GPU profiling with memory tracking
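When Nsight Systems profiling is enabled, the server launch command is typically wrapped in an `nsys profile` invocation. A minimal sketch of that wrapping, assuming a simple boolean toggle (the `enable_nsys` parameter and function name are hypothetical; the nsys flags shown are standard):

```python
# Sketch: prefixing the vLLM server command with an Nsight Systems trace
# when profiling is requested. The enable_nsys toggle is an assumption
# about this repo's config, not its actual schema.

def build_launch_cmd(server_cmd, enable_nsys=False, report="vllm_profile"):
    """Prefix the server command with `nsys profile` when requested."""
    if not enable_nsys:
        return list(server_cmd)
    return [
        "nsys", "profile",
        "--trace=cuda,nvtx,osrt",  # CUDA kernels, NVTX ranges, OS runtime
        "-o", report,              # report file name
        *server_cmd,
    ]

cmd = build_launch_cmd(
    ["python", "-m", "vllm.entrypoints.openai.api_server"],
    enable_nsys=True,
)
print(" ".join(cmd))
```

Nsight Compute (`ncu`) and the PyTorch Profiler hook in differently: `ncu` wraps the process much like `nsys`, while the PyTorch Profiler is enabled inside the Python process itself.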