WVA Stack Deployment Script #416

Open
dumb0002 wants to merge 2 commits into llm-d-incubation:main from dumb0002:wva-fma
Conversation


@dumb0002 dumb0002 commented Apr 9, 2026

This PR provides a script to deploy the WVA Stack.

What the Script Does:
The deploy-wva-stack.sh script provides a comprehensive deployment automation tool that:

  • Clones WVA Repository: Fetches the official WVA repository from GitHub (configurable branch)
  • Manages Kind Clusters: Creates/deletes Kind clusters with emulated GPU resources for testing
  • Deploys Full WVA Stack: Orchestrates deployment of:
    • WVA Controller (workload variant autoscaling)
    • llm-d Infrastructure (LLM deployment infrastructure)
    • Prometheus & Prometheus Adapter (monitoring and metrics)
    • Optional: HorizontalPodAutoscaler (HPA)
    • Optional: VariantAutoscaling (VA)
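For reference, invocations like the ones exercised later in this review (the `--create-kind`, `--llmd-only`, and `--with-hpa` flags and the `KIND_CLUSTER_NODES` variable all appear in the thread; combining `--create-kind` alone for a full-stack run is an assumption):

```shell
# Full WVA stack on a freshly created Kind cluster with emulated GPUs
./deploy-wva-stack.sh --create-kind

# llm-d infrastructure only, with HPA, on a 2-node Kind cluster
KIND_CLUSTER_NODES=2 ./deploy-wva-stack.sh --create-kind --llmd-only --with-hpa
```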

Signed-off-by: dumb0002 <Braulio.Dumba@ibm.com>

@diegocastanibm diegocastanibm left a comment


Error while creating the cluster:

Creating cluster "kind-wva-gpu-cluster" ...
 ✓ Ensuring node image (kindest/node:v1.32.0) 🖼
 ✓ Preparing nodes 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✗ Joining worker nodes 🚜
Deleted nodes: ["kind-wva-gpu-cluster-control-plane" "kind-wva-gpu-cluster-worker" "kind-wva-gpu-cluster-worker2"]
ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged kind-wva-gpu-cluster-worker kubeadm join --config /kind/kubeadm.conf --v=6" failed with error: exit status 1
Command Output: I0410 19:07:05.433916     137 join.go:421] [preflight] found NodeName empty; using OS hostname as NodeName
I0410 19:07:05.434093     137 joinconfiguration.go:83] loading configuration from "/kind/kubeadm.conf"
W0410 19:07:05.434523     137 common.go:101] your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta3" (kind: "JoinConfiguration"). Please use 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.

@diegocastanibm
Collaborator

If I reduce the number of nodes to 2, the kind cluster is created, but then I have another error:

$ KIND_CLUSTER_NODES=2 ./deploy-wva-stack.sh --create-kind --llmd-only --with-hpa
Install Script Output:
  ==========================================
  [INFO] Starting Workload-Variant-Autoscaler Deployment on kind-emulator
  [INFO] ===========================================================

  [INFO] Checking prerequisites...
  [SUCCESS] All generic prerequisites tools met
  [INFO] Setting TLS verification...
  [INFO] Emulated environment detected - enabling TLS skip verification for self-signed certificates
  [SUCCESS] Successfully set TLS verification to: true
  [INFO] Setting WVA logging level...
  [INFO] Development environment - using debug logging
  [SUCCESS] WVA logging level set to: debug

  [INFO] Loading environment-specific functions for kind-emulator...
  [INFO] Checking Kubernetes-specific prerequisites...
  [INFO] Cluster creation skipped (CREATE_CLUSTER=false)
  [SUCCESS] Using KIND cluster 'kind-wva-gpu-cluster'
  [INFO] Loading WVA image 'ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest' into KIND cluster...
  [INFO] Pulling single-platform image for KIND (platform=linux/arm64) to avoid load errors...
  Error response from daemon: Head "https://ghcr.io/v2/llm-d/llm-d-workload-variant-autoscaler/manifests/latest": denied:
  denied
  [WARNING] Failed to pull image 'ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest' (platform=linux/arm64)
  [INFO] Attempting to use existing local image...
  [ERROR] Image 'ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest' not found locally - Please build or pull the image

  ==========================================
  [ERROR] Install script failed with exit code: 0

Why do I need ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest if I'm using the --llmd-only option?

@dumb0002
Collaborator Author

> Why do I need ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest if I'm using the --llmd-only option?

@diegocastanibm, this is the current behavior of the installation script from the wva repo - it always loads all images as part of the prerequisites steps.

@dumb0002
Collaborator Author

> Error while creating the cluster:
> ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged kind-wva-gpu-cluster-worker kubeadm join --config /kind/kubeadm.conf --v=6" failed with error: exit status 1

@diegocastanibm, this could be due to resource constraints in your environment.

@diegocastanibm
Collaborator

> this is the current behavior of the installation script from the wva repo - it always loads all images as part of the prerequisites steps.

Maybe the behaviour needs to change. I do not see the need for loading the WVA image in this case. What's the point?

@dumb0002
Collaborator Author

dumb0002 commented Apr 13, 2026

> Maybe the behaviour needs to change. I do not see the need of loading WVA in this case. What's the point?

@diegocastanibm, any change would require opening a PR in the WVA upstream repo, since we're just reusing their scripts. IMO, from their use case it makes sense to always load the WVA image, as that's the main goal of their installation script.


@diegocastanibm diegocastanibm left a comment


Critical points:

1- As I pointed out during the deployment, when using --llmd-only --kind, the deployment fails because the upstream WVA install script (deploy/kind-emulator/install.sh, line 131) calls load_image() unconditionally inside check_prerequisites_kind_emulated(), regardless of the DEPLOY_WVA flag. This means it always tries to pull ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest, which requires ghcr.io authentication and isn't needed when WVA is not being deployed.

This is an upstream bug in the WVA repo, but this script should handle it — either by documenting the limitation, adding a workaround (e.g., exporting WVA_IMAGE_PULL_POLICY=IfNotPresent with a dummy image), or opening an upstream issue.
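One possible shape for the workaround mentioned above, sketched with assumed names (`seed_wva_image` and the `DEPLOY_WVA` variable are illustrative, not upstream identifiers): pre-seed a local placeholder tag under the WVA image name so the upstream unconditional `load_image()` finds it without needing ghcr.io authentication.

```shell
# Hypothetical workaround sketch: when WVA itself is skipped (--llmd-only),
# tag a tiny placeholder image under the WVA name so the upstream
# check_prerequisites_kind_emulated() image load succeeds without ghcr.io auth.
WVA_IMAGE="ghcr.io/llm-d/llm-d-workload-variant-autoscaler:latest"

seed_wva_image() {
  # Only needed when WVA is not being deployed and the image is absent locally.
  if [ "${DEPLOY_WVA:-true}" = "true" ]; then
    return 0
  fi
  if ! docker image inspect "$WVA_IMAGE" >/dev/null 2>&1; then
    docker pull busybox:latest
    docker tag busybox:latest "$WVA_IMAGE"  # stand-in only; never actually run
  fi
}
```

Opening an upstream issue so `load_image()` honors the deploy flags would still be the cleaner fix.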

2- The upstream WVA install script (deploy/install.sh, line 50) defaults to LLM_D_RELEASE=v0.3.0, but the latest llm-d release is v0.6.0. This deploys a significantly outdated version of llm-d. There is also an inconsistency within the WVA defaults themselves: the inference-scheduler image is pinned to v0.7.0 while the llm-d repo is cloned at v0.3.0.

This script should either export LLM_D_RELEASE to a more recent version, or at minimum document this version gap and how users can override it (e.g., export LLM_D_RELEASE=v0.6.0).
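A minimal way for the wrapper to carry a newer default while still letting users pin a version (the `v0.6.0` value is the release suggested above; treating `LLM_D_RELEASE` as user-overridable in the wrapper is an assumption):

```shell
# Default llm-d to a newer release, but let a caller's prior export win.
: "${LLM_D_RELEASE:=v0.6.0}"
export LLM_D_RELEASE
echo "Deploying llm-d release: $LLM_D_RELEASE"
```

A user could still pin the old behavior with, e.g., `LLM_D_RELEASE=v0.3.0 ./deploy-wva-stack.sh ...`.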

3- Similarly to the llm-d version issue above, the modelservice image version used by the upstream WVA deploy scripts is v0.2.11, while the current release of llm-d-modelservice is v0.4.11. This is a significant version gap that may cause compatibility issues or missing features during testing.

4- The llm-d-sim (inference simulator used in Kind emulator environments) is currently at v0.8.2, but the upstream WVA deploy scripts use an older version. Since the Kind emulator environment relies on llm-d-sim instead of real model serving, running an outdated simulator version could mask bugs or miss behavioral changes that are present in the current release.

Comment on lines +377 to +388
# Run the cleanup function from WVA repository
log_info "Running WVA cleanup function..."
echo ""
echo "=========================================="
echo "WVA Cleanup Output:"
echo "=========================================="

# Disable exit on error temporarily to capture cleanup result
set +e
cleanup
local cleanup_exit_code=$?
set -e

The cleanup is ALWAYS executed. It should be executed ONLY if nothing fails; otherwise it is difficult to debug the WVA scripts when something fails.
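One way to get that behavior, sketched with assumed names (`deploy_ok` and `on_exit` are illustrative): gate cleanup behind a success flag that is only set once deployment completes, so a failed run leaves the stack in place for inspection.

```shell
# Sketch: run the WVA cleanup only on success; on failure, skip it so the
# partially deployed stack can be debugged.
deploy_ok=false

cleanup() {  # stand-in for the upstream WVA cleanup function
  echo "cleanup ran"
}

on_exit() {
  if [ "$deploy_ok" = true ]; then
    cleanup
  else
    echo "deployment failed; skipping cleanup for debugging" >&2
  fi
}
trap on_exit EXIT

# ... deployment steps would run here; mark success only at the very end ...
deploy_ok=true
```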

Comment on lines +603 to +613
# Check if this is cleanup-only mode (--cleanup flag without other deployment changes)
# We detect cleanup-only mode by checking if CLEANUP_BEFORE_DEPLOY is true and
# no cluster creation is requested. The deployment flags (DEPLOY_WVA, DEPLOY_LLM_D)
# should be at their default values (both true) when using --cleanup alone.
local is_cleanup_only=false
if [ "$CLEANUP_BEFORE_DEPLOY" = true ] && [ "$CREATE_KIND_CLUSTER" != "true" ]; then
# Check if deployment flags are at defaults (not modified by --wva-only or --llmd-only)
if [ "$DEPLOY_WVA" = "true" ] && [ "$DEPLOY_LLM_D" = "true" ]; then
is_cleanup_only=true
fi
fi

The logic to detect cleanup-only depends on DEPLOY_WVA and DEPLOY_LLM_D keeping their default values. This means that --cleanup --llmd-only does not do a cleanup. We should add an explicit --cleanup-only flag.
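A sketch of what the explicit flag could look like (the `--cleanup-only` flag and `parse_args` helper are assumptions; `--wva-only`/`--llmd-only` mirror the script's existing options):

```shell
# Explicit cleanup-only mode: no inference from DEPLOY_WVA/DEPLOY_LLM_D defaults.
CLEANUP_ONLY=false
DEPLOY_WVA=true
DEPLOY_LLM_D=true

parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --cleanup-only) CLEANUP_ONLY=true ;;  # works regardless of other flags
      --wva-only)     DEPLOY_LLM_D=false ;;
      --llmd-only)    DEPLOY_WVA=false ;;
    esac
    shift
  done
}
```

With this, `--cleanup-only --llmd-only` would clean up exactly the llm-d side without the default-value guessing.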

Comment on lines +478 to +481
echo ""
echo "=========================================="
log_error "Install script failed with exit code: $?"
exit 1

Your $? is capturing the exit code from echo. Better to save it in a variable before the echos:

Suggested change:

-echo ""
-echo "=========================================="
-log_error "Install script failed with exit code: $?"
-exit 1
+local exit_code=$?
+echo ""
+echo "=========================================="
+log_error "Install script failed with exit code: $exit_code"
+exit 1
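The clobbering is easy to reproduce: `$?` reflects only the most recent command, so the intervening `echo`s reset it to 0 — which also explains the "failed with exit code: 0" seen in the logs above.

```shell
false            # exits with status 1
immediate=$?     # captured right away: 1

false
echo "" >/dev/null   # echo succeeds, resetting $? to 0
clobbered=$?         # the failure status is already gone: 0

echo "immediate=$immediate clobbered=$clobbered"
```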
