Commit ef59ed1 (parent 3e14186)
Adding script to deploy llm-d+wva
Signed-off-by: dumb0002 <Braulio.Dumba@ibm.com>
# WVA Stack Deployment Script

This directory contains a deployment script for setting up the Workload-Variant-Autoscaler (WVA) stack for E2E testing.

## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Usage Examples](#usage-examples)
- [Verification](#verification)
- [Viewing Logs](#viewing-logs)
- [Cleanup](#cleanup)
- [Kind Cluster Configuration](#kind-cluster-configuration)
## Overview

The `deploy-wva-stack.sh` script automates the deployment of:

- WVA Controller
- llm-d Infrastructure
- Prometheus Monitoring
- Prometheus Adapter
- Optional: VariantAutoscaling (VA)
- Optional: HorizontalPodAutoscaler (HPA)

The script clones the official WVA repository from GitHub and uses the deployment scripts provided there.
## Prerequisites

- `kubectl` configured with cluster access
- `helm` installed
- `git` installed
- `HF_TOKEN` environment variable set (required for llm-d deployment)
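The prerequisites above can be checked up front with a short pre-flight snippet. This is a sketch, not part of the deployment script; the tool names and the `HF_TOKEN` variable come from the list above:

```shell
# Pre-flight check for the prerequisites listed above (sketch only).
missing=""
for tool in kubectl helm git; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All required tools are installed"
fi

# HF_TOKEN is only required when deploying llm-d.
if [ -n "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is set"
else
  echo "HF_TOKEN is not set (required for llm-d deployment)"
fi
```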
## Quick Start

### Basic Deployment (WVA + llm-d + Prometheus)

**On an existing Kubernetes cluster:**

```bash
export HF_TOKEN="hf_xxxxx"
./deploy-wva-stack.sh
```

**On a Kind cluster (emulated GPUs):**

```bash
export KIND_CLUSTER_NAME="kind-fma-cluster"
./deploy-wva-stack.sh --kind
```

This deploys:

- ✅ WVA Controller
- ✅ llm-d Infrastructure (with simulator for Kind)
- ✅ Prometheus
- ✅ Prometheus Adapter
- ❌ VariantAutoscaling (opt-in)
- ❌ HPA (opt-in)

### Deployment with HPA and VA

**On an existing cluster:**

```bash
export HF_TOKEN="hf_xxxxx"
./deploy-wva-stack.sh --with-hpa --with-va
```

**On a Kind cluster:**

```bash
export KIND_CLUSTER_NAME="kind-fma-cluster"
./deploy-wva-stack.sh --kind --with-hpa --with-va
```

### Create Kind Cluster and Deploy

```bash
# Create a Kind cluster and deploy the full stack
./deploy-wva-stack.sh --create-kind --with-hpa --with-va
```

### Create Kind Cluster Only (No Deployment)

```bash
# Create a Kind cluster with a custom configuration (no deployment)
export KIND_CLUSTER_NAME="kind-fma-cluster"
export KIND_CLUSTER_NODES=4
export KIND_CLUSTER_GPUS=2
export KIND_CLUSTER_GPU_TYPE="nvidia"
./deploy-wva-stack.sh --create-kind-only

# Later, deploy to the existing cluster.
# IMPORTANT: KIND_CLUSTER_NAME must be exported to match your cluster.
export KIND_CLUSTER_NAME="kind-fma-cluster"
./deploy-wva-stack.sh --kind --with-hpa
```

**Note:** When deploying to an existing Kind cluster with `--kind`, you must export `KIND_CLUSTER_NAME` to match your cluster name (default: `kind-wva-gpu-cluster`).
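Before pointing the script at an existing Kind cluster, it can help to confirm the name with the standard `kind get clusters` subcommand. A sketch; the cluster name below is the example value used in this README:

```shell
# List existing Kind clusters, then export the matching name (sketch).
if command -v kind >/dev/null 2>&1; then
  kind get clusters || true
else
  echo "kind CLI not found; skipping cluster listing"
fi

# Must match one of the names listed above (example value from this README).
export KIND_CLUSTER_NAME="kind-fma-cluster"
echo "Will deploy to Kind cluster: $KIND_CLUSTER_NAME"
```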
99+
### WVA Only (No llm-d)
100+
101+
```bash
102+
./deploy-wva-stack.sh --wva-only
103+
```
104+
105+
### llm-d Only (No WVA)
106+
107+
```bash
108+
export HF_TOKEN="hf_xxxxx"
109+
./deploy-wva-stack.sh --llmd-only
110+
```
111+
112+
## Configuration

### Environment Variables

You can configure the deployment using environment variables:

```bash
# Deployment flags
export DEPLOY_WVA=true                 # Deploy WVA controller (default: true)
export DEPLOY_LLM_D=true               # Deploy llm-d infrastructure (default: true)
export DEPLOY_VA=false                 # Deploy VariantAutoscaling (default: false)
export DEPLOY_HPA=false                # Deploy HPA (default: false)
export DEPLOY_PROMETHEUS=true          # Deploy Prometheus (default: true)
export DEPLOY_PROMETHEUS_ADAPTER=true  # Deploy Prometheus Adapter (default: true)

# WVA repository configuration
export WVA_REPO_BRANCH=main            # WVA repository branch (default: main)

# HuggingFace token (required for llm-d)
export HF_TOKEN="hf_xxxxx"

# Run the deployment
./deploy-wva-stack.sh
```

### Command-Line Options

```bash
./deploy-wva-stack.sh [options]

Options:
  -h, --help           Show help message
  -c, --cleanup        Clean up existing deployment before installing
  -d, --delete-repo    Delete WVA repository after deployment
  --wva-only           Deploy only WVA (skip llm-d)
  --llmd-only          Deploy only llm-d (skip WVA)
  --with-hpa           Deploy HPA (default: false)
  --with-va            Deploy VariantAutoscaling (default: false)
  --kind               Deploy to Kind cluster (uses kind-emulator scripts)
  --create-kind        Create Kind cluster before deployment
  --create-kind-only   Create Kind cluster only (skip deployment)
  --delete-kind        Delete Kind cluster after deployment
```
## Usage Examples

### Example 1: Full Stack with HPA for Testing

```bash
export HF_TOKEN="hf_xxxxx"
export DEPLOY_VA=true
export DEPLOY_HPA=true
./deploy-wva-stack.sh
```

### Example 2: Infrastructure Only

```bash
export HF_TOKEN="hf_xxxxx"
export DEPLOY_VA=false
export DEPLOY_HPA=false
./deploy-wva-stack.sh
```

### Example 3: Kind Cluster with Custom Configuration

```bash
# Create a Kind cluster with custom settings
export KIND_CLUSTER_NAME="kind-fma-cluster"
export KIND_CLUSTER_NODES=4
export KIND_CLUSTER_GPUS=2
export KIND_CLUSTER_GPU_TYPE="nvidia"
./deploy-wva-stack.sh --create-kind --with-hpa
```

### Example 4: Custom WVA Branch

```bash
export WVA_REPO_BRANCH=feature/new-feature
export HF_TOKEN="hf_xxxxx"
./deploy-wva-stack.sh
```
## Verification

After deployment, verify the components:

```bash
# Check the WVA controller
kubectl get pods -n workload-variant-autoscaler-system

# Check llm-d resources
kubectl get all -n $(kubectl get ns -o name | grep llm-d | head -n1 | cut -d'/' -f2)

# Check VariantAutoscaling resources
kubectl get variantautoscalings.llmd.ai -A

# Check HPA resources
kubectl get hpa -A

# Check Prometheus
kubectl get pods -n workload-variant-autoscaler-monitoring
```
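The spot checks above can also be turned into a blocking wait, which is handy in E2E automation. A sketch, reusing the namespace and label selector from the verification commands in this README:

```shell
# Wait until the WVA controller pod reports Ready (sketch; namespace and
# label selector taken from the verification commands above).
NS="workload-variant-autoscaler-system"
if command -v kubectl >/dev/null 2>&1; then
  kubectl wait --for=condition=Ready pod \
    -l control-plane=controller-manager -n "$NS" --timeout=180s \
    || echo "WVA controller not Ready within 180s"
else
  echo "kubectl not found; skipping readiness wait for $NS"
fi
```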
## Viewing Logs

```bash
# WVA controller logs
kubectl logs -n workload-variant-autoscaler-system \
  -l control-plane=controller-manager -f

# llm-d logs (example for the decode deployment)
kubectl logs -n llm-d-inference-scheduler \
  deployment/ms-inference-scheduling-llm-d-modelservice-decode -f
```
## Cleanup

### Remove Deployment Only

```bash
./deploy-wva-stack.sh --cleanup
```

Or, if HPA and VA are enabled:

```bash
./deploy-wva-stack.sh --cleanup --with-hpa --with-va
```

### Remove Deployment and Kind Cluster

```bash
./deploy-wva-stack.sh --cleanup --delete-kind
```
## Kind Cluster Configuration

When using `--create-kind` or `--create-kind-only`, you can customize the cluster configuration:

### Environment Variables

```bash
export KIND_CLUSTER_NAME="kind-fma-cluster"  # Default: kind-wva-gpu-cluster
export KIND_CLUSTER_NODES=4                  # Default: 3
export KIND_CLUSTER_GPUS=2                   # Default: 4 (GPUs per node)
export KIND_CLUSTER_GPU_TYPE="nvidia"        # Default: mix (nvidia|amd|intel|mix)
```

### Usage Examples

**Create the cluster and deploy immediately:**

```bash
export KIND_CLUSTER_NAME="kind-fma-cluster"
export KIND_CLUSTER_NODES=4
export KIND_CLUSTER_GPUS=2
export KIND_CLUSTER_GPU_TYPE="nvidia"
./deploy-wva-stack.sh --create-kind --with-hpa
```

**Create the cluster only (for later deployment):**

```bash
export KIND_CLUSTER_NAME="kind-fma-cluster"
export KIND_CLUSTER_NODES=4
export KIND_CLUSTER_GPUS=2
export KIND_CLUSTER_GPU_TYPE="nvidia"
./deploy-wva-stack.sh --create-kind-only
```

### Cluster Features

The Kind cluster includes:

- Emulated GPU resources (`nvidia.com/gpu`, `amd.com/gpu`, `intel.com/gpu`)
- HPAScaleToZero feature gate enabled (for scale-to-zero testing)
- llm-d inference simulator (no real model loading)
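The emulated GPU capacity can be inspected directly on the nodes. A sketch; the `nvidia.com/gpu` resource name matches the list above (swap in `amd.com/gpu` or `intel.com/gpu` for the other types):

```shell
# Show per-node emulated GPU capacity (sketch). Dots in the resource name
# must be escaped inside the custom-columns spec.
if command -v kubectl >/dev/null 2>&1; then
  out=$(kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu' 2>&1) \
    || out="could not query nodes (is the cluster running?)"
else
  out="kubectl not found; skipping GPU capacity check"
fi
echo "$out"
```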
