
GuideLLM Benchmark Pipeline & Workbench

GuideLLM Pipeline

A configurable pipeline for running GuideLLM benchmarks against LLM endpoints.

(Screenshot: GuideLLM Pipeline)

(Screenshot: GuideLLM Benchmark Results)


GuideLLM Overview

GuideLLM evaluates and optimizes LLM deployments by simulating real-world inference workloads to assess performance, resource requirements, and cost implications across different hardware configurations.

Key Features

  • Performance & Scalability Testing: Analyze LLM inference under various load scenarios to meet SLOs
  • Resource & Cost Optimization: Determine optimal hardware configurations and deployment strategies
  • Flexible Deployment: Support for Kubernetes Jobs and Tekton Pipelines with configurable parameters
  • Automated Results: Timestamped output directories with comprehensive benchmark results

Usage

Running as Kubernetes Job

# Apply the PVC
kubectl apply -f utils/jobs/pvc.yaml

# Apply the ConfigMap (optional)
kubectl apply -f pipeline/config.yaml

# Run the job with default settings
kubectl apply -f utils/jobs/guidellm-job.yaml

# Or customize environment variables
kubectl set env job/run-guidellm TARGET=http://my-endpoint:8000/v1
kubectl set env job/run-guidellm MODEL_NAME=my-model
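Once the job is running, progress and completion can be followed with standard kubectl commands (the job name run-guidellm matches the example above):

# Watch the job until it completes
kubectl get job run-guidellm -w

# Stream the benchmark logs
kubectl logs -f job/run-guidellm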

Running as Tekton Pipeline

# Apply the task and pipeline
kubectl apply -f pipeline/tekton-task.yaml
kubectl apply -f pipeline/tekton-pipeline.yaml

# Run with parameters
tkn pipeline start guidellm-benchmark-pipeline \
  --param target=http://llama32-3b.llama-serve.svc.cluster.local:8000/v1 \
  --param model-name=llama32-3b \
  --param processor=RedHatAI/Llama-3.2-3B-Instruct-quantized.w8a8 \
  --param data-config='{"type":"emulated","prompt_tokens":512,"output_tokens":128}' \
  --workspace name=shared-workspace,claimName=guidellm-output-pvc

(If you need to install the tkn CLI on macOS, run brew install tektoncd-cli.)
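To follow the run from the command line, the standard tkn commands apply (--last simply targets the most recent pipeline run):

# Stream logs from the most recent pipeline run
tkn pipelinerun logs --last -f

# Inspect the run status and parameters
tkn pipelinerun describe --last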

Once the Tekton pipeline starts, the GuideLLM benchmark CLI will be triggered with the input parameters:

(Screenshot: GuideLLM Pipeline)

The benchmark then runs, simulating real-world inference workloads against the target endpoint:

(Screenshot: GuideLLM Pipeline)

Configuration Options

Environment Variables

  • TARGET: Model endpoint URL
  • MODEL_NAME: Model identifier
  • PROCESSOR: Processor/model path
  • DATA_CONFIG: JSON data configuration
  • OUTPUT_FILENAME: Output file name
  • RATE_TYPE: Rate type (synchronous/poisson)
  • MAX_SECONDS: Maximum benchmark duration
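The remaining variables follow the same kubectl set env pattern shown earlier; the values below are purely illustrative, not defaults:

# Illustrative values only; adjust to your benchmark scenario
kubectl set env job/run-guidellm \
  DATA_CONFIG='{"type":"emulated","prompt_tokens":512,"output_tokens":128}' \
  RATE_TYPE=poisson \
  MAX_SECONDS=600 \
  OUTPUT_FILENAME=benchmark-results.yaml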

Results

The benchmark generates comprehensive performance metrics and visualizations:

(Screenshot: GuideLLM Benchmark Results)

The results provide detailed insights into throughput, latency, resource utilization, and other key performance indicators to help optimize your LLM deployment strategy.

Output Structure

Results are organized in timestamped directories:

/output/
├── model-name_YYYYMMDD_HHMMSS/
│   ├── benchmark-results.yaml
│   └── benchmark_info.txt
└── model-name_YYYYMMDD_HHMMSS.tar.gz
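Each run is also packaged as a tarball of the same directory; once copied off the workspace it can be unpacked locally (the file name below is the naming pattern from the tree above, not a literal file):

# Unpack a results archive and inspect it
tar -xzf model-name_YYYYMMDD_HHMMSS.tar.gz
cat model-name_YYYYMMDD_HHMMSS/benchmark_info.txt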

GuideLLM Workbench

The GuideLLM Workbench provides a user-friendly Streamlit web interface for running benchmarks interactively with real-time monitoring and result visualization.

(Screenshot: GuideLLM Workbench Demo)

(Screenshot: GuideLLM Workbench Interface)

Features

  • Interactive Configuration: Easy-to-use forms for endpoint, authentication, and benchmark parameters
  • Real-time Monitoring: Live metrics parsing during benchmark execution
  • Quick Stats: Sidebar with key performance indicators (requests/sec, tokens/sec, latency, TTFT)
  • Results History: Session-based storage with detailed result viewing
  • Download Results: Export benchmark results as YAML files
  • Comprehensive Results View: Detailed breakdown of all performance metrics

(Screenshot: GuideLLM Workbench Results)

Running the Workbench

Local Development

cd utils/guidellm-wb
pip install -r requirements.txt
streamlit run app.py
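If port 8501 is taken, or the workbench should be reachable from other machines, Streamlit's standard server flags can be used (generic Streamlit options, nothing repo-specific):

# Run on a different port and listen on all interfaces
streamlit run app.py --server.port 8080 --server.address 0.0.0.0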

Container Deployment

# Use pre-built container from registry
podman run -p 8501:8501 quay.io/rh-aiservices-bu/guidellm-wb:v1
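A couple of common podman variations (standard podman flags, nothing repo-specific):

# Run detached with a name, then follow the logs
podman run -d --name guidellm-wb -p 8501:8501 quay.io/rh-aiservices-bu/guidellm-wb:v1
podman logs -f guidellm-wb

# Stop and remove the container when finished
podman rm -f guidellm-wb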

The workbench will be available at http://localhost:8501 and provides an intuitive interface for configuring and running GuideLLM benchmarks with immediate feedback and comprehensive result analysis.
