This workflow provides a way to run the benchmark experiments and analysis logic of `setup/run.sh`.
It works with both standard Kubernetes clusters and OpenShift, and assumes the llm-d and/or vLLM stack is already deployed.
The Kubernetes workflow consists of these main components:
- `benchmark-job.yaml` - Runs the benchmark experiment portion of `setup/run.sh` (`fmperf-llm-d-benchmark.py`)
- `analysis-job.yaml` - Runs the analysis portion of `setup/run.sh` (`fmperf-analyze_results.py`)
This workflow is a simplified Kubernetes-native version of `setup/run.sh`, which provides command-line options for model selection,
scenario configuration, and benchmark execution. It is meant to run benchmark experiments on already-existing llm-d and/or vLLM deployments.
- **Kubernetes & OpenShift Compatible**: Uses `runAsNonRoot`, drops all capabilities, restricted security contexts
- **Main Project Integration**: Uses the same scripts, profiles, and scenarios as `setup/run.sh`
- **Pure Kubernetes**: Everything runs in Jobs; no local execution required
- **Analysis Integration**: Includes the same analysis capabilities as the main project
- Kubernetes/OpenShift cluster with `kubectl` configured
- llm-d stack and/or standalone vLLM already deployed and accessible
- Required Kubernetes resources (see Setup section)
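As a quick sanity check before starting, you can confirm cluster access and that the inference endpoint responds. This is an optional convenience, and the endpoint URL below is a placeholder; substitute the one from your deployment:

```shell
# Confirm the cluster is reachable and you can create Jobs in the namespace
kubectl cluster-info
kubectl auth can-i create jobs -n llm-d-benchmark

# Confirm the serving endpoint answers (vLLM exposes an OpenAI-compatible API);
# the URL below is a placeholder
curl -s http://YOUR-VLLM-ENDPOINT/v1/models
```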
```shell
kubectl create namespace llm-d-benchmark

# Update resources/benchmark-env.yaml with your cluster configuration
# Update resources/benchmark-workload-configmap.yaml with your workload settings

kubectl apply -k resources/
# You will now have the PVC, configmaps, and RBAC necessary to proceed
```

Before running the workflow, customize the configuration files to match your deployment and benchmarking requirements.
📖 For detailed configuration instructions, see: Configuration Guide
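After `kubectl apply -k resources/` has run, a quick listing confirms the expected objects exist before moving on (exact resource names depend on the manifests in `resources/`):

```shell
# List what the kustomization created in the namespace
kubectl get pvc,configmap,serviceaccount,role,rolebinding -n llm-d-benchmark
```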
Before proceeding, ensure you have:
- ✅ Updated `resources/benchmark-env.yaml` with your endpoint URL and stack type
- ✅ Updated `resources/benchmark-workload-configmap.yaml` with your model name and scenarios
- ✅ Created the HuggingFace token secret (if using gated models)
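For gated models, the HuggingFace token secret can be created from the command line. The secret name `huggingface-secret` and key `HF_TOKEN` are assumptions here; use whatever names your job manifests actually reference:

```shell
# Create the token secret for gated models
# (secret name and key are assumptions; match your manifests)
kubectl create secret generic huggingface-secret \
  --from-literal=HF_TOKEN=<your-token> \
  -n llm-d-benchmark
```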
```shell
kubectl apply -f benchmark-job.yaml

# Follow the logs
kubectl logs -f job/benchmark-run -n llm-d-benchmark

# Check job status
kubectl get job benchmark-run -n llm-d-benchmark
```

Wait for the experiment job to complete successfully before running the analysis.
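Rather than polling with `kubectl get`, you can block until the Job finishes (the timeout value here is an arbitrary choice):

```shell
# Block until the benchmark Job completes, or fail after 2 hours
kubectl wait --for=condition=complete job/benchmark-run \
  -n llm-d-benchmark --timeout=2h
```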
```shell
# Verify experiment completed successfully
kubectl get job benchmark-run -n llm-d-benchmark

# Run analysis job
kubectl apply -f analysis-job.yaml

# Follow the analysis logs
kubectl logs -f job/benchmark-analysis -n llm-d-benchmark
```

The analysis job will generate plots and statistics in the shared PVC. You can access them using the provided retrieve script:
```shell
kubectl apply -f retrieve.yaml
kubectl logs job/retrieve-results -n llm-d-benchmark
```

Or copy the results directly to your local system:
```shell
# Create local directory for results
mkdir -p ./benchmark-results

# Copy results from PVC to local system using the retrieve pod
kubectl cp llm-d-benchmark/results-retriever:/requests ./benchmark-results/

# Clean up retriever pod
kubectl delete pod results-retriever -n llm-d-benchmark
```

After successful completion, you'll find:
In the PVC (and locally in `./benchmark-results/` after `kubectl cp`):

- **Raw benchmark data**:
  - PVC: `/requests/<stack-name>/`
  - Local: `./benchmark-results/<stack-name>/`
  - Contains: CSV files with benchmark results
- **Analysis plots**:
  - PVC: `/requests/analysis/plots/`
  - Local: `./benchmark-results/analysis/plots/`
  - Contains: PNG files with visualizations
    - `latency_analysis.png` - Latency metrics across QPS levels
    - `throughput_analysis.png` - Throughput and token count analysis
    - `README.md` - Description of the plots
- **Statistics**:
  - PVC: `/requests/analysis/data/stats.txt`
  - Local: `./benchmark-results/analysis/data/stats.txt`
  - Contains: Summary statistics
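Once results are copied locally, a small helper like the following can confirm the expected artifacts are in place. The directory layout mirrors the list above; the helper itself is a convenience sketch, not part of the project:

```shell
# Verify the retrieved benchmark artifacts exist locally.
# This helper is a convenience sketch; the layout matches the results list above.
check_results() {
  results_dir="${1:-./benchmark-results}"

  # Count raw CSV result files anywhere under the results directory
  echo "CSV result files: $(find "$results_dir" -name '*.csv' -type f 2>/dev/null | wc -l)"

  # Check for the analysis outputs described above
  for f in analysis/plots/latency_analysis.png \
           analysis/plots/throughput_analysis.png \
           analysis/data/stats.txt; do
    if [ -f "$results_dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
    fi
  done
}
```

For example, `check_results ./benchmark-results` prints a count of CSV files and flags any missing analysis artifacts.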
The generated `README.md` in the plots directory contains embedded images showing the analysis visualizations. To view it properly with the plots displayed:

```shell
# Create a virtual environment and install grip
python -m venv venv && source venv/bin/activate && pip install grip

# View the analysis README with plots in your browser
grip benchmark-results/analysis/plots/README.md --browser
```

This will open a rendered view of the analysis documentation with all plots displayed inline.
```shell
# Delete jobs
kubectl delete job benchmark-run benchmark-analysis -n llm-d-benchmark

# Delete all resources (optional)
kubectl delete namespace llm-d-benchmark
```