llm-d-benchmark Workflow

This workflow provides a Kubernetes-native way to run the benchmark experiments and analysis logic of setup/run.sh. It works with both standard Kubernetes clusters and OpenShift, and assumes the llm-d and/or vLLM stack is already deployed.

Overview

The workflow consists of two main components:

  1. benchmark-job.yaml - Runs the benchmark experiment portion of setup/run.sh (fmperf-llm-d-benchmark.py)
  2. analysis-job.yaml - Runs the analysis portion of setup/run.sh (fmperf-analyze_results.py)

This workflow is a simplified, Kubernetes-native version of setup/run.sh, which provides command-line options for model selection, scenario configuration, and benchmark execution. It is meant to run benchmark experiments against existing llm-d and/or vLLM deployments.

Key Features

  • Kubernetes & OpenShift Compatible: Uses runAsNonRoot, drops all capabilities, and applies restricted security contexts
  • Main Project Integration: Uses the same scripts, profiles, and scenarios as setup/run.sh
  • Pure Kubernetes: Everything runs in Jobs; no local execution required
  • Analysis Integration: Includes the same analysis capabilities as the main project

Prerequisites

  1. Kubernetes/OpenShift cluster with kubectl configured
  2. llm-d stack and/or standalone vLLM already deployed and accessible
  3. Required Kubernetes resources (see Setup section)

Setup

Create Namespace and Resources (PVC, RBAC, and configmaps)

kubectl create namespace llm-d-benchmark

# Update resources/benchmark-env.yaml with your cluster configuration
# Update resources/benchmark-workload-configmap.yaml with your workload settings
kubectl apply -k resources/

# You will now have the PVC, configmaps, and RBAC necessary to proceed
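
As a quick sanity check, you can list what was just created (the exact object names depend on the manifests under resources/):

kubectl get pvc,configmap,serviceaccount,role,rolebinding -n llm-d-benchmark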

Configuration

Before running the workflow, customize the configuration files to match your deployment and benchmarking requirements.

📖 For detailed configuration instructions, see: Configuration Guide

Quick Configuration Checklist

Before proceeding, ensure you have:

  1. Updated resources/benchmark-env.yaml with your endpoint URL and stack type
  2. Updated resources/benchmark-workload-configmap.yaml with your model name and scenarios
  3. Created the HuggingFace token secret if using gated models (a sketch follows this list)
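
A minimal sketch for step 3, assuming a secret named huggingface-token with key HF_TOKEN; check benchmark-job.yaml for the names the jobs actually expect:

# Assumed secret/key names -- adjust to match your job manifests
kubectl create secret generic huggingface-token \
  --from-literal=HF_TOKEN=<your-hf-token> \
  -n llm-d-benchmark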

Running the Workflow

Run the Experiment Job

kubectl apply -f benchmark-job.yaml

Monitor the Experiment

# Follow the logs
kubectl logs -f job/benchmark-run -n llm-d-benchmark

# Check job status
kubectl get job benchmark-run -n llm-d-benchmark

Run Analysis After Experiment Completes

Wait for the experiment job to complete successfully, then run the analysis:
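
If you prefer to block until the job finishes rather than poll, kubectl wait can do so (the timeout below is an arbitrary upper bound; adjust it to your expected run length):

# Optionally block until the experiment job completes
kubectl wait --for=condition=complete job/benchmark-run -n llm-d-benchmark --timeout=2h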

# Verify experiment completed successfully
kubectl get job benchmark-run -n llm-d-benchmark

# Run analysis job
kubectl apply -f analysis-job.yaml

Monitor the Analysis

# Follow the analysis logs
kubectl logs -f job/benchmark-analysis -n llm-d-benchmark
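
# Check job status
kubectl get job benchmark-analysis -n llm-d-benchmark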

Retrieve Results

The analysis job generates plots and statistics in the shared PVC. You can access them using the provided retrieve job:

kubectl apply -f retrieve.yaml
kubectl logs job/retrieve-results -n llm-d-benchmark

Or copy the results directly to your local system:

# Create local directory for results
mkdir -p ./benchmark-results
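
# (Optional) list the PVC contents first to confirm the results are present;
# this assumes the results-retriever pod from retrieve.yaml mounts the PVC at
# /requests, matching the kubectl cp command below
kubectl exec results-retriever -n llm-d-benchmark -- ls -R /requests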

# Copy results from PVC to local system using the retrieve pod
kubectl cp llm-d-benchmark/results-retriever:/requests ./benchmark-results/

# Clean up retriever pod
kubectl delete pod results-retriever -n llm-d-benchmark

Output Files

After successful completion, you'll find:

In the PVC (and locally in ./benchmark-results/ after kubectl cp):

  • Raw benchmark data:
    • PVC: /requests/<stack-name>/
    • Local: ./benchmark-results/<stack-name>/
    • Contains: CSV files with benchmark results
  • Analysis plots:
    • PVC: /requests/analysis/plots/
    • Local: ./benchmark-results/analysis/plots/
    • Contains: PNG files with visualizations
      • latency_analysis.png - Latency metrics across QPS levels
      • throughput_analysis.png - Throughput and token count analysis
      • README.md - Description of the plots
  • Statistics:
    • PVC: /requests/analysis/data/stats.txt
    • Local: ./benchmark-results/analysis/data/stats.txt
    • Contains: Summary statistics
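
To quickly confirm that the files described above landed locally, a simple listing works (run after the kubectl cp step):

find ./benchmark-results -type f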

Viewing the Analysis Results

The generated README.md in the plots directory contains embedded images showing the analysis visualizations. To view it properly with the plots displayed:

# Create a virtual environment and install grip
python -m venv venv && source venv/bin/activate && pip install grip

# View the analysis README with plots in your browser
grip benchmark-results/analysis/plots/README.md --browser

This will open a rendered view of the analysis documentation with all plots displayed inline.

Cleanup

# Delete jobs
kubectl delete job benchmark-run benchmark-analysis -n llm-d-benchmark

# Delete the namespace and everything in it (optional)
kubectl delete namespace llm-d-benchmark