
Commit 0229c80

update: bump version for llm-d and all images + add support for podman

- use env variable LLM_D_RELEASE to control all images in deploy/install.sh
- clone llm-d locally, re-cloning when the local version does not match the required release version
- use env variable CONTAINER_TOOL to support podman on Fedora
- remove/update *ignore files

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

1 parent 11c3130, commit 0229c80

7 files changed

Lines changed: 73 additions & 24 deletions

.dockerignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -14,6 +14,7 @@ vendor/
 
 # Submodules and sibling repos (not needed for building the manager binary)
 sample-data/
+llm-d/
 llm-d-infra/
 # If building from a parent repo that includes llmd or GAIE, add:
 # llmd/
```

.gitignore

Lines changed: 0 additions & 2 deletions

```diff
@@ -30,8 +30,6 @@ gpu.cluster
 # llm-d and llm-d-infra directories
 llm-d/
 llm-d-infra/
-llmd/
-llmd-infra/
 
 *.tgz
 actionlint
```

Makefile

Lines changed: 3 additions & 1 deletion

```diff
@@ -7,7 +7,7 @@ CLUSTER_GPU_TYPE ?= nvidia-mix
 CLUSTER_NODES ?= 3
 CLUSTER_GPUS ?= 4
 KUBECONFIG ?= $(HOME)/.kube/config
-K8S_VERSION ?= v1.32.0
+K8S_VERSION ?= v1.32.0 # match OCP 4.19
 
 CONTROLLER_NAMESPACE ?= workload-variant-autoscaler-system
 MONITORING_NAMESPACE ?= openshift-user-workload-monitoring
@@ -194,6 +194,7 @@ deploy-e2e-infra: ## Deploy e2e test infrastructure (infra-only: WVA + llm-d, no
 		WVA_IMAGE_REPO=$$IMAGE_REPO \
 		WVA_IMAGE_TAG=$$IMAGE_TAG \
 		WVA_IMAGE_PULL_POLICY=IfNotPresent \
+		CONTAINER_TOOL=$(CONTAINER_TOOL) \
 		./deploy/install.sh; \
 	else \
 		echo "IMG not set - using default image from registry (latest)"; \
@@ -204,6 +205,7 @@ deploy-e2e-infra: ## Deploy e2e test infrastructure (infra-only: WVA + llm-d, no
 		SCALER_BACKEND=$(SCALER_BACKEND) \
 		INSTALL_GATEWAY_CTRLPLANE=true \
 		NAMESPACE_SCOPED=false \
+		CONTAINER_TOOL=$(CONTAINER_TOOL) \
 		./deploy/install.sh; \
 	fi
```
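The Makefile passes `CONTAINER_TOOL` down into the install scripts, which then fall back to `docker` when the variable is unset. A minimal sketch of that fallback pattern (same `${VAR:-default}` idiom the scripts use; the echo is illustrative only):

```shell
#!/usr/bin/env sh
# Sketch of the pass-through the diff relies on: the Makefile exports
# CONTAINER_TOOL, and each script defaults to docker when it is unset/empty.
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}
echo "container tool: $CONTAINER_TOOL"
```

Running it with `CONTAINER_TOOL=podman` in the environment prints `container tool: podman`; with nothing set it prints `container tool: docker`.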

deploy/README.md

Lines changed: 25 additions & 1 deletion

````diff
@@ -32,6 +32,7 @@ All deployment methods require:
 - **kubectl** (v1.24+) - Kubernetes CLI
 - **helm** (v3.8+) - Package manager for Kubernetes
 - **git** - Git CLI
+- **docker** or **podman** - Container tool for building and loading images
 
 Optional but recommended:
 
@@ -42,6 +43,8 @@ Platform-specific requirements:
 - **OpenShift**: `oc` CLI (v4.12+)
 - **Kind**: `kind` CLI for local testing
 
+**Container Tool Support**: The deployment scripts support both Docker and Podman. Set `CONTAINER_TOOL=podman` to use Podman, or leave unset to use the default (`docker`).
+
 ### Cluster Requirements
 
 **Minimum cluster specifications**:
@@ -269,6 +272,22 @@ kubectl get hpa --all-namespaces | grep -v kube-system # Should be empty (except
 - ❌ Model services (tests create these)
 ```
 
+##### Example 8: Using specific llm-d release and Podman
+
+Deploy with a specific llm-d release version and use Podman instead of Docker:
+
+```bash
+export HF_TOKEN="hf_xxxxx"
+export LLM_D_RELEASE="v0.5.0"  # Pin to specific llm-d version
+export CONTAINER_TOOL=podman   # Use Podman instead of Docker
+make deploy-wva-emulated-on-kind
+
+# The LLM_D_RELEASE variable automatically sets:
+# - LLM_D_INFERENCE_SCHEDULER_IMG=ghcr.io/llm-d/llm-d-inference-scheduler:v0.5.0
+# - LLM_D_INFERENCE_SIM_IMG=ghcr.io/llm-d/llm-d-inference-sim:v0.5.0
+# - llm-d repository clone version
+```
+
 ### Method 2: Helm Chart
 
 The WVA can be deployed as a standalone using Helm, assuming you have:
@@ -621,6 +640,12 @@ Each guide includes platform-specific examples, troubleshooting, and quick start
 | `WVA_IMAGE_REPO` | WVA image repository | `ghcr.io/llm-d/llm-d-workload-variant-autoscaler` |
 | `WVA_IMAGE_TAG` | WVA image tag | `latest` |
 | `WVA_IMAGE_PULL_POLICY` | Image pull policy | `Always` |
+| `LLM_D_RELEASE` | llm-d release version (controls all llm-d images) | `v0.5.1` |
+| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
+| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
+| `CONTAINER_TOOL` | Container tool to use (docker or podman) | `docker` |
+
+**Centralized llm-d Version Management**: Setting `LLM_D_RELEASE` automatically configures all llm-d component images to use the same release version. This ensures version consistency across the llm-d inference scheduler and simulator. Individual image variables can override this if needed.
 
 #### Namespace Configuration
 
@@ -682,7 +707,6 @@ HPA_STABILIZATION_SECONDS=30 ./deploy/install.sh
 | `WVA_LOG_LEVEL` | WVA logging level | `info` |
 | `VLLM_SVC_ENABLED` | Enable vLLM Service | `true` |
 | `VLLM_SVC_NODEPORT` | vLLM NodePort | `30000` |
-| `LLM_D_RELEASE` | llm-d version | `v0.3.0` |
 | `VLLM_MAX_NUM_SEQS` | vLLM max concurrent sequences per replica | (unset - uses vLLM default) |
 
 **vLLM Performance Tuning:**
````
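The centralized version scheme described in the README is just two default expansions: each image tag follows `LLM_D_RELEASE` unless the image variable is set explicitly. A minimal sketch, assuming the same variable names as deploy/install.sh:

```shell
#!/usr/bin/env bash
# Each image defaults to the LLM_D_RELEASE tag; exporting either image
# variable beforehand overrides the derived value (variable names taken
# from deploy/install.sh).
LLM_D_RELEASE=${LLM_D_RELEASE:-"v0.5.1"}
LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
echo "$LLM_D_INFERENCE_SCHEDULER_IMG"
echo "$LLM_D_INFERENCE_SIM_IMG"
```

With nothing exported this prints both images tagged `v0.5.1`; exporting `LLM_D_INFERENCE_SIM_IMG` alone pins only the simulator while the scheduler still tracks the release.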

deploy/install.sh

Lines changed: 21 additions & 11 deletions

```diff
@@ -47,7 +47,7 @@ CONTROLLER_INSTANCE=${CONTROLLER_INSTANCE:-""}
 # llm-d Configuration
 LLM_D_OWNER=${LLM_D_OWNER:-"llm-d"}
 LLM_D_PROJECT=${LLM_D_PROJECT:-"llm-d"}
-LLM_D_RELEASE=${LLM_D_RELEASE:-"v0.3.0"}
+LLM_D_RELEASE=${LLM_D_RELEASE:-"v0.5.1"}
 LLM_D_MODELSERVICE_NAME=${LLM_D_MODELSERVICE_NAME:-"ms-$WELL_LIT_PATH_NAME-llm-d-modelservice"}
 LLM_D_EPP_NAME=${LLM_D_EPP_NAME:-"gaie-$WELL_LIT_PATH_NAME-epp"}
 CLIENT_PREREQ_DIR=${CLIENT_PREREQ_DIR:-"$WVA_PROJECT/$LLM_D_PROJECT/guides/prereq/client-setup"}
@@ -57,9 +57,8 @@ LLM_D_MODELSERVICE_VALUES=${LLM_D_MODELSERVICE_VALUES:-"$EXAMPLE_DIR/ms-$WELL_LI
 ITL_AVERAGE_LATENCY_MS=${ITL_AVERAGE_LATENCY_MS:-20}
 TTFT_AVERAGE_LATENCY_MS=${TTFT_AVERAGE_LATENCY_MS:-200}
 ENABLE_SCALE_TO_ZERO=${ENABLE_SCALE_TO_ZERO:-true}
-# llm-d-inference scheduler with image with flowcontrol support
-# TODO: update once the llm-d-inference-scheduler v0.5.0 is released
-LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:v0.5.0-rc.1"}
+LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE"}
+LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
 
 # Gateway Configuration
 GATEWAY_PROVIDER=${GATEWAY_PROVIDER:-"istio"} # Options: kgateway, istio
@@ -616,7 +615,7 @@ spec:
       serviceAccountName: gaie-sim-sa
       containers:
       - name: epp
-        image: ghcr.io/llm-d/llm-d-inference-scheduler:v0.3.2
+        image: $LLM_D_INFERENCE_SCHEDULER_IMG
         imagePullPolicy: Always
         args:
         - --poolName=$POOL_NAME_2
@@ -687,7 +686,7 @@ spec:
     spec:
       containers:
       - name: vllm
-        image: ghcr.io/llm-d/llm-d-inference-sim:v0.5.1
+        image: $LLM_D_INFERENCE_SIM_IMG
         imagePullPolicy: Always
         args:
        - --model=$MODEL_ID_2
@@ -787,12 +786,23 @@ EOF
 deploy_llm_d_infrastructure() {
     log_info "Deploying llm-d infrastructure..."
 
-    # Clone llm-d repo if not exists
+    # Clone llm-d repo if not exists or if has older version locally
+    if [ -d "$LLM_D_PROJECT/.git" ]; then
+        CURRENT_TAG=$(cd "$LLM_D_PROJECT" && git describe --tags --exact-match 2>/dev/null || echo "unknown")
+        if [ "$CURRENT_TAG" != "$LLM_D_RELEASE" ]; then
+            log_warning "$LLM_D_PROJECT exists but has version '$CURRENT_TAG' (expected: $LLM_D_RELEASE)"
+            rm -rf "$LLM_D_PROJECT"
+        else
+            log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
+        fi
+    elif [ -d "$LLM_D_PROJECT" ]; then
+        log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
+        rm -rf "$LLM_D_PROJECT"
+    fi
+
     if [ ! -d "$LLM_D_PROJECT" ]; then
         log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
         git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
-    else
-        log_warning "$LLM_D_PROJECT directory already exists, skipping clone"
     fi
 
     # Check for HF_TOKEN (use dummy for emulated deployments)
@@ -839,7 +849,7 @@ deploy_llm_d_infrastructure() {
     # Install Gateway control plane if enabled
     if [[ "$INSTALL_GATEWAY_CTRLPLANE" == "true" ]]; then
         log_info "Installing Gateway control plane ($GATEWAY_PROVIDER)"
-        helmfile apply -f "$GATEWAY_PREREQ_DIR/$GATEWAY_PROVIDER.helmfile.yaml"
+        helmfile apply -f "$GATEWAY_PREREQ_DIR/$GATEWAY_PROVIDER.helmfile.yaml" --suppress-diff
     else
         log_info "Skipping Gateway control plane installation (INSTALL_GATEWAY_CTRLPLANE=false)"
     fi
@@ -930,7 +940,7 @@ deploy_llm_d_infrastructure() {
         helmfile_selector="--selector kind!=autoscaling"
         log_info "Skipping WVA in helmfile (will be deployed separately from local chart)"
     fi
-    helmfile apply -e $GATEWAY_PROVIDER -n ${LLMD_NS} $helmfile_selector
+    helmfile apply -e $GATEWAY_PROVIDER -n ${LLMD_NS} $helmfile_selector --suppress-diff
 
     # Post-deploy: align the WVA vllm-service selector and ServiceMonitor to match
     # the actual pod labels. The llm-d-modelservice chart sets pod labels from
```
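The new clone guard hinges on `git describe --tags --exact-match`, which prints a tag name only when HEAD sits exactly on a tagged commit and fails otherwise, so the `|| echo "unknown"` fallback triggers removal of a stale clone. A self-contained demonstration in a throwaway repository (the temp repo and tag are illustrative, not part of the commit):

```shell
#!/usr/bin/env bash
# Demonstrate the tag check used by deploy_llm_d_infrastructure:
# describe --exact-match succeeds only when HEAD is at the tag.
set -e
tmp=$(mktemp -d)
git init -q "$tmp"
git -C "$tmp" -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "init"
git -C "$tmp" tag v0.5.1
CURRENT_TAG=$(cd "$tmp" && git describe --tags --exact-match 2>/dev/null || echo "unknown")
echo "$CURRENT_TAG"   # the tag name, since HEAD is exactly on it
rm -rf "$tmp"
```

If the clone had moved past the tag (or had no tag at all), `CURRENT_TAG` would be `unknown`, mismatching `LLM_D_RELEASE` and causing the script to delete and re-clone.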

deploy/kind-emulator/install.sh

Lines changed: 18 additions & 6 deletions

```diff
@@ -39,6 +39,9 @@ WVA_LOG_LEVEL="debug" # WVA log level set to debug for emulated environments
 # Initial WVA pool group; install.sh auto-detects the actual InferencePool API group after llm-d deploy and upgrades WVA (scale-from-zero).
 POOL_GROUP=${POOL_GROUP:-"inference.networking.k8s.io"}
 
+# Container tool (docker or podman can pass from Makefile)
+CONTAINER_TOOL=${CONTAINER_TOOL:-docker}
+
 # llm-d Configuration
 LLM_D_INFERENCE_SIM_IMG_REPO=${LLM_D_INFERENCE_SIM_IMG_REPO:-"ghcr.io/llm-d/llm-d-inference-sim"}
 LLM_D_INFERENCE_SIM_IMG_TAG=${LLM_D_INFERENCE_SIM_IMG_TAG:-"latest"}
@@ -173,14 +176,14 @@ load_image() {
         log_info "Using local image only (WVA_IMAGE_PULL_POLICY=IfNotPresent)"
 
         # Check if the image exists locally
-        if ! docker image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
-            log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
+        if ! $CONTAINER_TOOL image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
+            log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
         else
             log_success "Found local image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG'"
         fi
     else
         # Pull a single-platform image so kind load does not hit "content digest not found"
-        # (multi-platform manifests can reference blobs that are not in the docker save stream).
+        # (multi-platform manifests can reference blobs that are not in the $CONTAINER_TOOL save stream).
         local platform="${KIND_IMAGE_PLATFORM:-}"
         if [ -z "$platform" ]; then
             case "$(uname -m)" in
@@ -202,9 +205,18 @@ load_image() {
     fi
 
     # Load the image into the KIND cluster
-    kind load docker-image "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" --name "$CLUSTER_NAME"
-
-    log_success "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' loaded into KIND cluster '$CLUSTER_NAME'"
+    if [ "$CONTAINER_TOOL" = "podman" ]; then
+        # Podman requires a different approach - save to tar and load archive
+        log_info "Using Podman - saving image to tar archive for Kind loading..."
+        local tmp_tar="/tmp/wva-image-$(date +%s).tar"
+        $CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
+        kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
+        rm -f "$tmp_tar"
+        log_success "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' loaded into KIND cluster '$CLUSTER_NAME' (via archive)"
+    else
+        kind load docker-image "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" --name "$CLUSTER_NAME"
+        log_success "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' loaded into KIND cluster '$CLUSTER_NAME'"
+    fi
 }
 
 #### REQUIRED FUNCTION used by deploy/install.sh ####
```
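The Podman branch above exists because `kind load docker-image` talks to the Docker daemon, which Podman does not provide, so the image must go through a tar archive and `kind load image-archive` instead. A dry-run sketch of that dispatch (commands are echoed rather than executed, so no cluster or images are needed; the image name below is illustrative):

```shell
#!/usr/bin/env bash
# Dry-run of the load_image dispatch: podman exports to a tar archive and
# loads it via `kind load image-archive`; docker uses `kind load docker-image`.
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}
IMG="ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest"
CLUSTER_NAME=${CLUSTER_NAME:-kind}
if [ "$CONTAINER_TOOL" = "podman" ]; then
    tmp_tar="/tmp/wva-image.tar"
    echo "$CONTAINER_TOOL save -o $tmp_tar $IMG"
    echo "kind load image-archive $tmp_tar --name $CLUSTER_NAME"
else
    echo "kind load docker-image $IMG --name $CLUSTER_NAME"
fi
```

With `CONTAINER_TOOL=podman` this prints the two-step save/load sequence; otherwise it prints the single `kind load docker-image` command.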

deploy/kubernetes/create-kind-cluster-with-nvidia.sh

Lines changed: 5 additions & 3 deletions

```diff
@@ -1,8 +1,10 @@
 #!/usr/bin/env bash
-
 set -e
 set -o pipefail
 
+# Container tool (docker or podman)
+CONTAINER_TOOL=${CONTAINER_TOOL:-docker}
+
 GPU_OPERATOR_NS=gpu-operator
 
 echo "> Creating Kind cluster"
@@ -22,10 +24,10 @@ echo "> Deploying cert manager"
 kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml
 
 echo "> Creating symlink in the control-plane container"
-docker exec -ti kind-control-plane ln -s /sbin/ldconfig /sbin/ldconfig.real
+$CONTAINER_TOOL exec -ti kind-control-plane ln -s /sbin/ldconfig /sbin/ldconfig.real
 
 echo "> Unmounting the nvidia devices in the control-plane container"
-docker exec -ti kind-control-plane umount -R /proc/driver/nvidia
+$CONTAINER_TOOL exec -ti kind-control-plane umount -R /proc/driver/nvidia
 
 # According to https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html
 echo "> Adding/updateding the NVIDIA Helm repository"
```
