This guide explains how to run multiple WVA controller instances in the same Kubernetes cluster with proper isolation.
By default, WVA operates as a single cluster-wide controller that manages all VariantAutoscaling (VA) resources and emits metrics to Prometheus. When running multiple WVA controller instances simultaneously (e.g., for parallel end-to-end tests, multi-tenant environments, or A/B testing), controllers may emit conflicting metrics that confuse HPA scaling decisions.
The controller instance isolation feature allows multiple WVA controllers to coexist in the same cluster by:
- Labeling metrics with a unique `controller_instance` identifier
- Filtering VA resources so each controller only manages explicitly assigned VAs
- Scoping HPA queries to select metrics from specific controller instances
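The combined effect of these mechanisms can be sketched in a few lines of Python (an illustrative model only, not WVA code — `select_series` is a hypothetical helper mimicking how an HPA `matchLabels` selector picks Prometheus series):

```python
# Illustrative sketch (not WVA code): how a controller_instance label
# keeps two controllers' metric series apart for HPA selection.

def select_series(series, selector):
    """Return only the series whose labels match every key in the
    selector, mimicking an HPA matchLabels selector."""
    return [s for s in series if all(s.get(k) == v for k, v in selector.items())]

# Two controllers emit wva_desired_replicas for the same variant:
series = [
    {"variant_name": "llama-8b", "controller_instance": "test-a", "value": 3},
    {"variant_name": "llama-8b", "controller_instance": "test-b", "value": 7},
]

# Without instance scoping, an HPA for llama-8b sees both (conflicting) values:
assert len(select_series(series, {"variant_name": "llama-8b"})) == 2

# With a controller_instance selector, each HPA sees exactly its own value:
scoped = select_series(series, {"variant_name": "llama-8b",
                                "controller_instance": "test-a"})
assert [s["value"] for s in scoped] == [3]
```

This is the essence of the feature: the label makes otherwise-identical series distinguishable, so each HPA's selector resolves to a single controller's output.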
Run multiple independent test suites simultaneously without metric conflicts:

```shell
# Test suite A with controller instance "test-a"
CONTROLLER_INSTANCE=test-a make test-e2e-full

# Test suite B with controller instance "test-b" (runs in parallel)
CONTROLLER_INSTANCE=test-b make test-e2e-full
```

Each test suite:

- Deploys its own WVA controller with a unique instance ID
- Creates VAs labeled with a matching `controller_instance`
- Has its HPA read metrics filtered by the `controller_instance` label
- Runs without interference from other test suites
Isolate autoscaling for different teams or environments:

```yaml
# Team A controller in namespace wva-team-a
wva:
  controllerInstance: "team-a"
```

```yaml
# Team B controller in namespace wva-team-b
wva:
  controllerInstance: "team-b"
```

Each team's controller only manages VAs in their designated namespace with matching labels.
The most common multi-model pattern uses a single controller with multiple model installations. Install the controller once, then add models using `controller.enabled=false`:
```shell
# Step 1: Install the WVA controller (once per cluster or namespace)
helm upgrade -i wva-controller ./charts/workload-variant-autoscaler \
  --namespace wva-system \
  --create-namespace \
  --set controller.enabled=true \
  --set va.enabled=false \
  --set hpa.enabled=false \
  --set vllmService.enabled=false

# Step 2: Add Model A (only VA + HPA resources, no controller)
helm upgrade -i wva-model-a ./charts/workload-variant-autoscaler \
  --namespace wva-system \
  --set controller.enabled=false \
  --set va.enabled=true \
  --set hpa.enabled=true \
  --set llmd.namespace=team-a \
  --set llmd.modelName=my-model-a \
  --set llmd.modelID="meta-llama/Llama-3.1-8B"

# Step 3: Add Model B (same controller manages both models)
helm upgrade -i wva-model-b ./charts/workload-variant-autoscaler \
  --namespace wva-system \
  --set controller.enabled=false \
  --set va.enabled=true \
  --set hpa.enabled=true \
  --set llmd.namespace=team-b \
  --set llmd.modelName=my-model-b \
  --set llmd.modelID="meta-llama/Llama-3.1-70B"
```

With `controller.enabled=false`, the chart deploys only:

- VariantAutoscaling CR (if `va.enabled=true`)
- HorizontalPodAutoscaler (if `hpa.enabled=true`)
- Service and ServiceMonitor for vLLM metrics (if `vllmService.enabled=true`)
- RBAC ClusterRoles for VA resources (viewer, editor, admin)

It skips all controller infrastructure: Deployment, ServiceAccount, ConfigMaps, RBAC bindings, leader election roles, and Prometheus CA certificates.
Tip: If using `controllerInstance` for metric isolation, set the same value on both the controller install and all model installs so the HPA metric selectors match.
Test new WVA versions alongside production:

```yaml
# Production controller
wva:
  controllerInstance: "production"
```

```yaml
# Canary controller with new version
wva:
  controllerInstance: "canary"
  image:
    tag: v0.5.0-rc1
```

Enable controller instance isolation by setting `wva.controllerInstance`:
```yaml
# values.yaml
wva:
  controllerInstance: "my-instance-id"
```

Install with Helm:

```shell
helm upgrade -i workload-variant-autoscaler ./charts/workload-variant-autoscaler \
  --namespace workload-variant-autoscaler-system \
  --set wva.controllerInstance=my-instance-id
```

The controller instance is configured via the `CONTROLLER_INSTANCE` environment variable:
```yaml
# deployment.yaml
spec:
  template:
    spec:
      containers:
        - name: manager
          env:
            - name: CONTROLLER_INSTANCE
              value: "my-instance-id"
```

When `controllerInstance` is set, the Helm chart automatically adds the label to VA resources:

```yaml
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  name: llama-8b-autoscaler
  labels:
    wva.llmd.ai/controller-instance: "my-instance-id"
spec:
  modelId: "meta-llama/Llama-3.1-8B"
```

Important: Each controller reconciles only the VAs whose controller-instance label matches its own. VAs without this label are managed by controllers that have no `CONTROLLER_INSTANCE` set.
When `CONTROLLER_INSTANCE` is set, all emitted metrics include a `controller_instance` label:

```text
# Without controller instance isolation
wva_desired_replicas{variant_name="llama-8b",namespace="llm-d",accelerator_type="H100"}

# With controller instance isolation
wva_desired_replicas{variant_name="llama-8b",namespace="llm-d",accelerator_type="H100",controller_instance="my-instance-id"}
```

Affected metrics:

- `wva_replica_scaling_total`
- `wva_desired_replicas`
- `wva_current_replicas`
- `wva_desired_ratio`
The controller uses a predicate filter to watch only VAs with matching labels:

```go
// Controller watches VAs where:
// - Label wva.llmd.ai/controller-instance == CONTROLLER_INSTANCE (if set)
// - The label is absent (if CONTROLLER_INSTANCE is not set)
```

This ensures complete isolation: each controller reconciles only its assigned VAs.
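The matching rule described in the comments above can be sketched in Python (an illustrative model only — `should_reconcile` is a hypothetical helper, not the controller's actual Go predicate):

```python
# Illustrative sketch of the VA-filtering rule: a controller reconciles a VA
# iff the VA's instance label matches its CONTROLLER_INSTANCE, or both are unset.
INSTANCE_LABEL = "wva.llmd.ai/controller-instance"

def should_reconcile(va_labels, controller_instance):
    """Mirror of the predicate semantics described in the guide."""
    label = va_labels.get(INSTANCE_LABEL)
    if controller_instance:      # CONTROLLER_INSTANCE is set
        return label == controller_instance
    return label is None         # default controller takes unlabeled VAs

# A labeled VA is reconciled only by the matching instance:
assert should_reconcile({INSTANCE_LABEL: "test-a"}, "test-a")
assert not should_reconcile({INSTANCE_LABEL: "test-a"}, "test-b")

# An unlabeled VA is reconciled only by a controller with no instance set:
assert should_reconcile({}, None)
assert not should_reconcile({}, "test-a")
```

Note the asymmetry in the last two cases: labeling a VA opts it out of the default controller, and setting `CONTROLLER_INSTANCE` opts a controller out of unlabeled VAs.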
The HPA template automatically filters metrics by `controller_instance` when set:

```yaml
# HPA with controller instance filtering
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: wva_desired_replicas
          selector:
            matchLabels:
              variant_name: "llama-8b"
              controller_instance: "my-instance-id"
```

The feature is fully backwards compatible:
- When `controllerInstance` is NOT set:
  - No `controller_instance` label is added to metrics
  - The controller manages all VAs (no label filtering)
  - HPA queries metrics without a `controller_instance` selector
  - Behavior is identical to previous versions
- When `controllerInstance` IS set:
  - A `controller_instance` label is added to all metrics
  - The controller only manages VAs with a matching label
  - HPA queries metrics filtered by `controller_instance`
Upgrading existing deployments requires no changes unless you want to enable multi-controller isolation.
Use descriptive controller instance identifiers:

```yaml
# ✅ Good - clear purpose
controllerInstance: "prod"
controllerInstance: "staging"
controllerInstance: "e2e-test-12345"
controllerInstance: "team-ml-inference"

# ❌ Avoid - unclear purpose
controllerInstance: "c1"
controllerInstance: "test"
```

Do NOT manually add or remove controller-instance labels on VA resources managed by Helm. The Helm chart manages these labels automatically.
For manually created VAs, ensure labels match the target controller:

```yaml
apiVersion: llmd.ai/v1alpha1
kind: VariantAutoscaling
metadata:
  labels:
    wva.llmd.ai/controller-instance: "my-instance-id"  # Must match controller
```

Query metrics for specific controller instances:

```promql
# Check desired replicas for a specific controller instance
wva_desired_replicas{controller_instance="prod"}

# Compare scaling events across instances
sum by (controller_instance, direction) (
  rate(wva_replica_scaling_total[5m])
)

# Alert on missing controller instance metrics
absent(wva_current_replicas{controller_instance="prod"})
```
When removing a controller instance, clean up associated resources:

```shell
# Delete the controller deployment
helm uninstall workload-variant-autoscaler-instance-a

# Clean up orphaned VAs with the instance label
kubectl delete va -l wva.llmd.ai/controller-instance=instance-a

# Clean up HPAs
kubectl delete hpa -l wva.llmd.ai/controller-instance=instance-a
```

Symptom: VA status shows `ObservedGeneration: 0` or conditions never update.

Cause: Label mismatch between the VA and the controller instance.
Solution:

1. Check the controller instance configuration:

   ```shell
   kubectl get deploy -n workload-variant-autoscaler-system \
     workload-variant-autoscaler-controller-manager \
     -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CONTROLLER_INSTANCE")].value}'
   ```

2. Check the VA label:

   ```shell
   kubectl get va llama-8b-autoscaler \
     -o jsonpath='{.metadata.labels.wva\.llmd\.ai/controller-instance}'
   ```

3. Ensure the labels match, or add the missing label:

   ```shell
   kubectl label va llama-8b-autoscaler \
     wva.llmd.ai/controller-instance=my-instance-id
   ```
Symptom: HPA shows `<unknown>` for the custom metric or doesn't scale the deployment.

Cause: The HPA metric selector doesn't match the controller instance label.

Solution:

1. Check the HPA metric selector:

   ```shell
   kubectl get hpa llama-8b-hpa -o yaml | grep -A 10 selector
   ```

2. Verify metrics exist with the expected labels:

   ```promql
   wva_desired_replicas{
     variant_name="llama-8b",
     controller_instance="my-instance-id"
   }
   ```

3. Update the HPA selector to include the `controller_instance` label.
Symptom: Multiple controllers emit metrics for the same variant, causing erratic HPA behavior.

Cause: Multiple controllers running without proper instance isolation.

Solution:

- Set a unique `controllerInstance` for each controller
- Ensure VA labels match their respective controller instances
- Verify HPA selectors filter by `controller_instance`
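The conflict condition itself can be sketched as a quick diagnostic model in Python (illustrative only — `conflicting_variants` is a hypothetical helper, not a WVA tool): a variant is in trouble when more than one controller instance emits series for it.

```python
# Illustrative sketch: flag variants for which more than one controller
# instance emits wva_desired_replicas -- the signature of the
# "conflicting metrics" problem described above.
from collections import defaultdict

def conflicting_variants(series):
    emitters = defaultdict(set)
    for s in series:
        # Series without a controller_instance label come from an
        # unscoped (default) controller; record them under None.
        emitters[s["variant_name"]].add(s.get("controller_instance"))
    return sorted(v for v, instances in emitters.items() if len(instances) > 1)

series = [
    {"variant_name": "llama-8b", "controller_instance": "prod"},
    {"variant_name": "llama-8b", "controller_instance": "canary"},  # conflict
    {"variant_name": "llama-70b", "controller_instance": "prod"},
]
assert conflicting_variants(series) == ["llama-8b"]
```

In a live cluster the same check corresponds to grouping `wva_desired_replicas` by `variant_name` in Prometheus and looking for multiple `controller_instance` values per group.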
Deploy two test environments simultaneously:

```shell
# Environment A
helm upgrade -i wva-test-a ./charts/workload-variant-autoscaler \
  --namespace wva-test-a \
  --create-namespace \
  --set wva.controllerInstance=test-a \
  --set llmd.namespace=llm-test-a

# Environment B
helm upgrade -i wva-test-b ./charts/workload-variant-autoscaler \
  --namespace wva-test-b \
  --create-namespace \
  --set wva.controllerInstance=test-b \
  --set llmd.namespace=llm-test-b
```

Each environment operates independently with isolated metrics and scaling decisions.
Test a new WVA version on a subset of workloads:

```yaml
# Production controller (v0.4.1) manages production VAs
wva:
  controllerInstance: "prod"
  image:
    tag: v0.4.1
---
# Canary controller (v0.5.0) manages canary VAs
wva:
  controllerInstance: "canary"
  image:
    tag: v0.5.0-rc1
```

Create canary VAs with the `controller-instance: canary` label to test the new version.
Isolate autoscaling for different teams:

```shell
# Deploy per-team controllers
for team in ml-research ml-production data-science; do
  helm upgrade -i wva-${team} ./charts/workload-variant-autoscaler \
    --namespace wva-${team} \
    --create-namespace \
    --set wva.controllerInstance=${team} \
    --set llmd.namespace=${team}
done
```

Each team's workloads are managed by their dedicated controller instance.
- Installation Guide - Setting up WVA
- Configuration Guide - Configuring VariantAutoscaling resources
- HPA Integration - Integrating with Horizontal Pod Autoscaler
- Testing Guide - Running E2E tests with controller isolation