This playbook provides comprehensive guidance for deploying, operating, and troubleshooting LLM-D (Distributed LLM Inference) on Red Hat OpenShift AI.
LLM-D enables intelligent routing and distributed inference for Large Language Models. It provides significant performance improvements over naive load balancing through:
- Prefix-aware routing: Routes requests to replicas with cached prefixes, improving KV cache hit rates from ~25% to 90%+
- Prefill/Decode disaggregation: Separates compute-intensive prefill from memory-bandwidth-bound decode phases
- Load-aware scheduling: Balances traffic based on real-time metrics from vLLM instances
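The combination of prefix-aware and load-aware routing can be illustrated with a small sketch (hypothetical replica and cache names; this is not the actual EPP implementation): a scheduler prefers the replica whose KV cache already holds the longest matching prefix of the incoming prompt, and breaks ties by current load.

```python
# Hypothetical sketch of prefix-aware replica selection (not the real EPP code).
# Each replica tracks token prefixes it has cached; the scheduler routes a new
# request to the replica with the longest cached prefix, using load to break ties.

def longest_common_prefix(a: list[str], b: list[str]) -> int:
    """Length of the shared leading token run between two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_replica(prompt_tokens: list[str], replicas: dict) -> str:
    """replicas: name -> {"cached": [tokens...], "load": in-flight requests}."""
    def score(name: str):
        r = replicas[name]
        prefix = longest_common_prefix(prompt_tokens, r["cached"])
        # Prefix match dominates; lower load wins ties.
        return (prefix, -r["load"])
    return max(replicas, key=score)

replicas = {
    "vllm-0": {"cached": ["You", "are", "a", "helpful"], "load": 3},
    "vllm-1": {"cached": ["Translate", "the"], "load": 1},
}
print(pick_replica(["You", "are", "a", "pirate"], replicas))  # -> vllm-0
```

A request sharing a long system-prompt prefix lands on the replica that already computed those KV blocks, which is what lifts cache hit rates well above what round-robin balancing achieves.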
This playbook is self-contained with all deployment artifacts included.
| Guide | Description |
|---|---|
| Pre-flight Validation | Verify cluster readiness and prerequisites |
| Quick Start | Connected environment deployment in minutes |
| Advanced Deployment | Bare metal, MetalLB, custom configurations |
| Automated Deployment | GitOps and automation patterns |
| Disconnected Installs | Air-gapped and restricted network deployments |
| Running Benchmarks | Performance testing with GuideLLM |
| Performance Debugging | Diagnosing and resolving performance issues |
| Directory | Contents |
|---|---|
| `gitops/operators/` | Operator installation manifests (MetalLB, Service Mesh, RHOAI, etc.) |
| `gitops/instance/` | Instance configurations (LLM-D, Gateway, monitoring, GuideLLM) |
| `gitops/ocp-4.19/` | OCP 4.19 prerequisites and configs |
| `gitops/ocp-4.18/` | OCP 4.18 prerequisites (experimental) |
| `gitops/disconnected/` | ImageSetConfigurations for air-gapped installs |
| `monitoring/` | Prometheus and Grafana stack for metrics |
| `vllm/` | Vanilla vLLM deployment for baseline comparison |
| `llm-d/` | LLM-D deployment configurations |
| `guidellm/` | GuideLLM benchmark configurations and overlays |
| `benchmark-job/` | Kubernetes Job templates for benchmarking |
| `assets/` | Screenshots and images for documentation |
- OpenShift: 4.19+
- OpenShift AI: 2.25+ (3.0+ recommended)
- GPU: NVIDIA GPU with appropriate drivers
- Role: `cluster-admin`
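A quick way to sanity-check the version prerequisites is a small helper (a hypothetical sketch, not part of the playbook's tooling) that compares a reported version string against the minimums above:

```python
# Hypothetical pre-flight sketch: compare OpenShift/RHOAI versions against the
# minimums listed above. Not part of the playbook's shipped tooling.

def parse_version(v: str) -> tuple[int, ...]:
    """Turn '4.19.10' into (4, 19, 10); ignores a trailing '+'."""
    return tuple(int(p) for p in v.rstrip("+").split("."))

def meets_minimum(actual: str, minimum: str) -> bool:
    return parse_version(actual) >= parse_version(minimum)

# On a live cluster you would fetch the actual version with the CLI, e.g.:
#   oc get clusterversion -o jsonpath='{.items[0].status.desired.version}'
print(meets_minimum("4.19.7", "4.19"))   # True: OCP minimum satisfied
print(meets_minimum("2.19.0", "2.25"))   # False: RHOAI too old
```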
Install in this order:
- Cert Manager
- MetalLB (bare metal only)
- Service Mesh 3
- Connectivity Link (RHOAI 3.0+)
- Red Hat OpenShift AI
- Node Feature Discovery
- NVIDIA GPU Operator
- LeaderWorkerSet: Required only for large MoE models with expert parallelism
```text
┌─────────────────────────────────────────────────────────────────┐
│                          Gateway API                            │
│                   (openshift-ai-inference)                      │
└─────────────────────────────┬───────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        EPP (Scheduler)                          │
│  - Prefix-aware scoring                                         │
│  - Load-aware routing                                           │
│  - KV cache utilization                                         │
└─────────────────────────────┬───────────────────────────────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│  vLLM Replica 1   │ │  vLLM Replica 2   │ │  vLLM Replica N   │
│   (KV Cache)      │ │   (KV Cache)      │ │   (KV Cache)      │
└───────────────────┘ └───────────────────┘ └───────────────────┘
```
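The load signals the scheduler consumes come from each vLLM replica's Prometheus `/metrics` endpoint. A minimal sketch of scraping them (the metric names `vllm:num_requests_waiting` and `vllm:gpu_cache_usage_perc` follow recent vLLM releases; verify them against your deployed version):

```python
# Hedged sketch: parse the gauge values a load-aware scheduler might read from
# vLLM's Prometheus /metrics text output. Metric names are assumptions based on
# recent vLLM releases; confirm against your version's /metrics output.

def scrape_gauges(metrics_text: str, wanted: set[str]) -> dict[str, float]:
    """Minimal Prometheus text-format parser for single-value gauge lines."""
    out = {}
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) == 2:
            name = parts[0].split("{")[0]  # strip any label set
            if name in wanted:
                out[name] = float(parts[1])
    return out

sample = """\
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting 4.0
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc 0.62
"""
gauges = scrape_gauges(sample, {"vllm:num_requests_waiting",
                                "vllm:gpu_cache_usage_perc"})
print(gauges)
```

In production you would scrape these over HTTP from each replica; queue depth and KV cache utilization together give the scheduler the picture it needs to avoid overloaded replicas.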
- Red Hat OpenShift AI Documentation
- LLM-D GitHub Repository
- Gateway API Inference Extension
- vLLM Documentation
This playbook consolidates lessons from real-world LLM-D implementations. Please contribute updates as the tooling evolves or new lessons emerge.