Comprehensive guide for testing the Workload-Variant-Autoscaler (WVA).
WVA has a multi-layered testing strategy:
- Unit Tests - Fast, isolated tests for individual packages and functions
- Integration Tests - Tests for component interactions within the controller
- E2E Tests - Environment-agnostic end-to-end tests (Kind emulated or OpenShift), with smoke and full tiers
# Run all unit tests
make test
# Run with coverage report
go test -cover ./...
# Run specific package
go test ./pkg/solver/...
go test ./pkg/analyzer/...
# Run with verbose output
go test -v ./internal/controller/...
# Generate HTML coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.htmlUnit tests are co-located with the code they test:
internal/
├── controller/
│ ├── variantautoscaling_controller.go
│ └── variantautoscaling_controller_test.go
├── saturation/
│ ├── analyzer.go
│ └── analyzer_test.go
└── collector/
├── collector.go
└── collector_test.go
pkg/
└── solver/
├── optimizer.go
├── optimizer_test.go
├── solver.go
└── solver_test.go
Example unit test structure:
package solver_test
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestSolver(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Solver Suite")
}
var _ = Describe("Solver", func() {
Context("when optimizing single variant", func() {
It("should calculate optimal replicas", func() {
// Test implementation
Expect(result).To(Equal(expected))
})
})
})- Use table-driven tests for testing multiple scenarios
- Mock external dependencies (Kubernetes API, Prometheus, etc.)
- Test edge cases (zero values, negative numbers, nil pointers, etc.)
- Keep tests fast - unit tests should run in milliseconds
- Use descriptive test names - clearly state what is being tested
- Follow AAA pattern - Arrange, Act, Assert
Integration tests validate component interactions within the controller using envtest.
# Run integration tests (included in make test)
make test
# Run only controller integration tests
go test ./internal/controller/... -vIntegration tests use controller-runtime's envtest, which provides a real Kubernetes API server for testing:
var _ = BeforeSuite(func() {
testEnv = &envtest.Environment{
CRDDirectoryPaths: []string{
filepath.Join("..", "..", "config", "crd", "bases"),
},
}
cfg, err := testEnv.Start()
Expect(err).NotTo(HaveOccurred())
k8sClient, err = client.New(cfg, client.Options{Scheme: scheme})
Expect(err).NotTo(HaveOccurred())
})
var _ = AfterSuite(func() {
Expect(testEnv.Stop()).To(Succeed())
})WVA provides a single consolidated E2E suite that runs on multiple environments (Kind with emulated GPUs, or OpenShift/kubernetes with real infrastructure). Tests are environment-agnostic and parameterized via environment variables; they create VA, HPA, and model services dynamically as part of the test workflow.
- Location:
test/e2e/ - Environments: Kind (emulated), OpenShift, or generic Kubernetes
- Tiers: Smoke (~5–10 min) for PRs; full suite (~15–25 min) for comprehensive validation
Tests expect only the WVA controller and llm-d infrastructure to be deployed; they create VariantAutoscaling resources, HPAs, and model services themselves. Use the install script in infra-only mode:
# From repository root: deploy only WVA + llm-d infrastructure (no VA/HPA/model services)
cd deploy
export ENVIRONMENT="kind-emulator" # or "openshift", "kubernetes"
export INFRA_ONLY=true
./install.sh
# Or: ./install.sh --infra-onlyThis deploys:
- WVA controller
- llm-d infrastructure (Gateway, CRDs, RBAC, EPP)
- Prometheus stack and Prometheus Adapter (or KEDA when
SCALER_BACKEND=keda) - No VariantAutoscaling, HPA, or model services (tests create these)
When E2E_TESTS_ENABLED=true (or ENABLE_SCALE_TO_ZERO=true), the deploy script also enables GIE queuing so scale-from-zero tests can run: it patches the EPP with ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER=true and applies an InferenceObjective (e2e-default) that references the default InferencePool. This ensures the metric inference_extension_flow_control_queue_size is populated when requests hit the gateway.
Alternatively, use the Makefile to deploy infra and run tests in one go:
# Kind: create cluster, deploy infra, run smoke tests
make test-e2e-smoke-with-setup
# Kind: deploy infra only (if cluster already exists), then run full suite
make deploy-e2e-infra
make test-e2e-fullSee the E2E Test Suite README for full configuration options and examples.
# Smoke tests (recommended for every PR)
make test-e2e-smoke
# Full suite (on-demand)
make test-e2e-full
# OpenShift: point at cluster and run
export KUBECONFIG=/path/to/openshift/kubeconfig
export ENVIRONMENT=openshift
make test-e2e-smoke
# or make test-e2e-full
# Run a specific test by name
FOCUS="Basic VA lifecycle" make test-e2e-smoke- Smoke (label
smoke): Infrastructure readiness, basic VA lifecycle, target condition validation - Full (label
full): Saturation scaling (single and multiple VAs), scale-from-zero, scale-to-zero (whenSCALE_TO_ZERO_ENABLED=true), limiter, pod scraping, parallel load scale-up
Key environment variables (see E2E Test Suite README for the full list):
| Variable | Default | Description |
|---|---|---|
ENVIRONMENT |
kind-emulator |
kind-emulator, openshift, or kubernetes |
USE_SIMULATOR |
true |
Emulated GPUs (true) or real vLLM (false) |
SCALE_TO_ZERO_ENABLED |
false |
Enable scale-to-zero tests (Kind supports both enabled and disabled) |
SCALER_BACKEND |
prometheus-adapter |
prometheus-adapter or keda (KEDA only for kind-emulator) |
REQUEST_RATE |
8 |
Load generation: requests per second |
NUM_PROMPTS |
1000 |
Load generation: total prompts |
For running multiple test runs in parallel, use multi-controller isolation (CONTROLLER_INSTANCE).
| Aspect | Unit Tests | Integration Tests | E2E Consolidated (Kind emulated) | E2E Consolidated (OpenShift) |
|---|---|---|---|---|
| Speed | Fast (<1min) | Fast (1-3min) | Smoke 5-10min / Full 15-25min | Smoke 5-10min / Full 15-25min |
| Isolation | Complete | Partial | Complete (Kind) | Shared cluster |
| GPU Required | No | No | No (emulated) | Yes (real) |
| Infrastructure | None | envtest | Kind + infra-only deploy | OpenShift + infra-only deploy |
| Realism | Low | Medium | High (emulated) | Production-like |
| CI-Friendly | Yes | Yes | Yes | Requires cluster |
| Local Dev | Yes | Yes | Yes | Cluster access needed |
WVA uses GitHub Actions for automated testing:
File: .github/workflows/ci-pr-checks.yaml
Runs on every pull request:
- Linting (golangci-lint)
- Unit tests
- Build verification
- Code coverage reporting
E2E workflows run the consolidated suite (test/e2e/):
- Smoke (
make test-e2e-smoke): Fast validation on Kind (or OpenShift whenENVIRONMENT=openshift) - Full (
make test-e2e-full): Full suite; typically run with infra deployed viadeploy-e2e-infraor equivalent
Infrastructure is deployed in infra-only mode (WVA + llm-d only); tests create VA, HPA, and model services dynamically.
File: .github/workflows/ci-e2e-openshift.yaml
Runs OpenShift E2E tests on dedicated cluster:
- Triggered manually or on specific labels
- Deploys PR-specific namespaces
- Runs multi-model tests
- On failure: automatically scales down GPU workloads while preserving debugging resources (VA, HPA, logs)
- Smart resource management frees GPUs for other PRs without manual intervention
You can trigger E2E runs by commenting on a PR:
| Comment | Workflow | Who can use | Effect |
|---|---|---|---|
/ok-to-test |
ci-pr-checks.yaml + ci-e2e-openshift.yaml |
Users with write access | Runs the full Kind E2E suite and the OpenShift E2E (GPU) run on this PR. On fork PRs, this is required before OpenShift E2E can run. |
/retest |
ci-e2e-openshift.yaml |
Users with write access | OpenShift E2E only: Re-run the OpenShift E2E workflow (e.g. after a failure, flake, or new commits). Same workflow as /ok-to-test, different trigger intent. |
When to use:
/ok-to-test: Comment this when you want the full E2E suite to run on your PR. It triggers both the full Kind E2E (instead of smoke only) and the OpenShift E2E. By default, PRs only run smoke E2E on Kind./retest: Use to re-run only the OpenShift E2E workflow (e.g. after a failure or new commits).- Fork PRs: If you opened a PR from a fork, OpenShift E2E will not run until a maintainer or admin comments
/ok-to-test. Branch protection should require the e2e-openshift status check so merge stays blocked until that run passes (the gate check is intentionally green on fork PRs to avoid a false failure that cannot be updated from upstream).
# Run linter
make lint
# Run unit tests
make test
# Build binary
make build
# Build Docker image
make docker-build# Deploy infra (infra-only), then run smoke or full suite
make deploy-e2e-infra
make test-e2e-smoke
# or: make test-e2e-full
# One-shot: create cluster, deploy infra, run smoke tests
make test-e2e-smoke-with-setup- Write tests first (TDD approach) - helps design better APIs
- Test behavior, not implementation - tests should survive refactoring
- Keep tests independent - tests should not depend on each other
- Use meaningful assertions - prefer specific matchers over generic equality
- Clean up resources - always clean up in AfterEach/AfterAll blocks
- Document complex tests - add comments explaining non-obvious test logic
// ✅ Good
It("should recommend scale-up when KV cache exceeds 70% threshold", func() {
// ...
})
// ❌ Bad
It("should work", func() {
// ...
})// ✅ Good - waits for condition to become true
Eventually(func(g Gomega) {
va := &v1alpha1.VariantAutoscaling{}
err := k8sClient.Get(ctx, key, va)
g.Expect(err).NotTo(HaveOccurred())
g.Expect(va.Status.DesiredOptimizedAlloc.NumReplicas).To(BeNumerically(">=", 2))
}, timeout, interval).Should(Succeed())
// ❌ Bad - may fail due to timing
va := &v1alpha1.VariantAutoscaling{}
k8sClient.Get(ctx, key, va)
Expect(va.Status.DesiredOptimizedAlloc.NumReplicas).To(BeNumerically(">=", 2))// Verify replicas remain stable for 30 seconds
Consistently(func(g Gomega) {
deploy := &appsv1.Deployment{}
err := k8sClient.Get(ctx, key, deploy)
g.Expect(err).NotTo(HaveOccurred())
g.Expect(*deploy.Spec.Replicas).To(Equal(int32(2)))
}, 30*time.Second, 5*time.Second).Should(Succeed())var _ = Describe("Scale-up workflow", Ordered, func() {
// These tests run in order and share state
It("should create resources", func() { /* ... */ })
It("should detect saturation", func() { /* ... */ })
It("should scale up", func() { /* ... */ })
})var _ = Describe("Optimizer", func() {
Context("with single variant", func() {
It("should optimize for cost", func() { /* ... */ })
It("should meet SLO requirements", func() { /* ... */ })
})
Context("with multiple variants", func() {
It("should prefer cheaper variant", func() { /* ... */ })
It("should distribute load evenly", func() { /* ... */ })
})
})var _ = Describe("Controller", func() {
var (
namespace string
cleanup func()
)
BeforeEach(func() {
namespace = "test-" + randomString()
// Setup test resources
})
AfterEach(func() {
// Clean up test resources
if cleanup != nil {
cleanup()
}
})
It("should reconcile resources", func() {
// Test implementation
})
})# Run with verbose output
go test -v ./pkg/solver/...
# Run specific test
go test -v ./pkg/solver/... -run TestSolver/should_optimize
# Enable Ginkgo trace
go test -v ./pkg/analyzer/... -ginkgo.trace
# Run with debugger (delve)
dlv test ./internal/controller/... -- -ginkgo.v# Consolidated E2E suite (smoke or full)
go test ./test/e2e/ -v -ginkgo.v -ginkgo.label-filter="smoke"
go test ./test/e2e/ -v -ginkgo.v -ginkgo.label-filter="full && !flaky" -timeout 35m# For Kind E2E tests (default cluster name: kind-wva-gpu-cluster or from CLUSTER_NAME)
export KUBECONFIG=~/.kube/config # or path from kind get kubeconfig
kubectl get pods -A
kubectl logs -n workload-variant-autoscaler-system deployment/workload-variant-autoscaler-controller-manager
# For OpenShift E2E tests
oc get pods -A
oc logs -n workload-variant-autoscaler-system deployment/workload-variant-autoscaler-controller-manager# Run tests; on failure, cluster is kept by default (DELETE_CLUSTER=false)
make test-e2e-smoke-with-setup
# Inspect: kubectl get all -A
# To delete cluster after: DELETE_CLUSTER=true make test-e2e-smoke-with-setup
# Or manually: kind delete cluster --name <CLUSTER_NAME>Symptoms: Test hangs or exceeds timeout
Possible causes:
- Controller stuck in reconciliation loop
- HPA not reading metrics
- Prometheus not scraping metrics
- Resource quotas preventing pod creation
Debugging steps:
kubectl get events -A --sort-by='.lastTimestamp'
kubectl describe va -n <namespace>
kubectl logs -n workload-variant-autoscaler-system deployment/workload-variant-autoscaler-controller-managerSymptoms: External metrics API returns empty or error
Possible causes:
- Prometheus adapter not running
- Metrics not being scraped
- Incorrect metric labels or selectors
Debugging steps:
# Check external metrics API
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/wva_desired_replicas" | jq
# Check Prometheus
kubectl port-forward -n workload-variant-autoscaler-monitoring svc/prometheus-operated 9090:9090
# Query: wva_desired_replicas{variant_name="..."}Symptoms: HPA shows desired replicas but deployment doesn't scale
Possible causes:
- Resource constraints (CPU/memory/GPU)
- Node capacity exceeded
- PDB preventing scale-up
- Deployment controller issues
Debugging steps:
kubectl describe hpa -n <namespace>
kubectl describe deploy -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
kubectl top nodesFor load testing, use the consolidated E2E suite with custom load parameters:
# Kind (emulated): low / medium / heavy load
REQUEST_RATE=8 NUM_PROMPTS=2000 make test-e2e-full
REQUEST_RATE=20 NUM_PROMPTS=3000 make test-e2e-full
REQUEST_RATE=40 NUM_PROMPTS=5000 make test-e2e-full
# OpenShift (real cluster)
export ENVIRONMENT=openshift
REQUEST_RATE=20 NUM_PROMPTS=3000 make test-e2e-fullTest system behavior under extreme conditions:
- High request rates (50+ req/s)
- Long-running load (30+ minutes)
- Rapid load changes
- Multiple concurrent variants
Current coverage targets:
- Unit tests: 70%+ code coverage
- Integration tests: All controller operations
- E2E tests: Critical user workflows
# Generate coverage report
go test -coverprofile=coverage.out ./...
# View summary
go tool cover -func=coverage.out
# Generate HTML report
go tool cover -html=coverage.out -o coverage.html
# View in browser
open coverage.html # macOS
xdg-open coverage.html # LinuxWhen contributing, please ensure:
- ✅ All new code has unit tests - aim for 70%+ coverage
- ✅ Critical paths have integration tests - especially controller logic
- ✅ New features have E2E tests - validate end-to-end behavior
- ✅ Tests are documented - explain what is being tested and why
- ✅ Tests follow naming conventions - use descriptive names
- ✅ Tests clean up resources - no resource leaks in tests
- ✅ Tests pass locally before pushing - run
make testandmake test-e2e-smoke(ormake test-e2e-full)
- Development Guide - Development environment setup
- E2E Test Suite README - Consolidated E2E tests (Kind, OpenShift, infra-only setup)
- Contributing Guide - Contribution guidelines