-The e2e suite deploys infra, creates resources, generates load, and validates scaling. No manual load tool needed.
+The consolidated e2e suite (`test/e2e/`) exercises infra-only deploy, resource wiring, reconciliation, and deterministic correctness checks. For sustained load or benchmarking, use **Option B** or separate perf workflows; neither is required for e2e.

 ```bash
 # From repo root, after deploying (e.g. make deploy-wva-emulated-on-kind)
 make deploy-e2e-infra # if not already done
 make test-e2e-smoke # quick validation
 # or
-make test-e2e-full # full suite including saturation scaling
+make test-e2e-full # full suite (`full && !flaky`)
 ```

 See [Testing Guide](../../docs/developer-guide/testing.md) and [E2E Test Suite README](../../test/e2e/README.md).

+### 4. Generate Load
+
 **Option B: Manual load with burst script**
 Use the script in the e2e fixtures (requires only `curl`; no Python). After port-forwarding the inference gateway or vLLM service to `localhost:8000`:
@@ -221,7 +223,7 @@ export BATCH_SIZE=10

 Tune load with `TOTAL_REQUESTS`, `BATCH_SIZE`, and optional `BATCH_SLEEP`, `MAX_TOKENS`, `CURL_TIMEOUT` (see script header).
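
As a rough illustration of how these knobs interact, the sketch below sends `TOTAL_REQUESTS` requests in parallel batches of `BATCH_SIZE`, pausing `BATCH_SLEEP` seconds between batches. It is not the repository's burst script: the function names, `TARGET_URL`, `MODEL_ID`, the `DRY_RUN` switch, and every default value are assumptions made for this example, and the `/v1/completions` path assumes an OpenAI-compatible endpoint such as the one vLLM serves.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; NOT the repository's burst script.
# Variable names mirror the tuning knobs above; all defaults are assumptions.
TOTAL_REQUESTS="${TOTAL_REQUESTS:-20}"   # total requests to send
BATCH_SIZE="${BATCH_SIZE:-10}"           # requests fired in parallel per batch
BATCH_SLEEP="${BATCH_SLEEP:-0}"          # seconds to pause between batches
MAX_TOKENS="${MAX_TOKENS:-16}"           # max_tokens field of each request
CURL_TIMEOUT="${CURL_TIMEOUT:-30}"       # per-request curl timeout (seconds)
TARGET_URL="${TARGET_URL:-http://localhost:8000/v1/completions}"  # assumed endpoint
MODEL_ID="${MODEL_ID:-example-model}"    # hypothetical model name
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the command instead of sending

# Send (or, in dry-run mode, print) a single completion request.
send_one() {
  local payload
  payload=$(printf '{"model":"%s","prompt":"hello","max_tokens":%s}' "$MODEL_ID" "$MAX_TOKENS")
  if [ "$DRY_RUN" = "1" ]; then
    printf 'curl -m %s -d %s %s\n' "$CURL_TIMEOUT" "$payload" "$TARGET_URL"
  else
    curl -s -o /dev/null -m "$CURL_TIMEOUT" \
      -H 'Content-Type: application/json' -d "$payload" "$TARGET_URL" || true
  fi
}

# Fire requests in parallel batches until TOTAL_REQUESTS have been issued.
run_burst() {
  local sent=0 n i
  if [ "$BATCH_SIZE" -lt 1 ]; then
    echo "BATCH_SIZE must be at least 1" >&2
    return 1
  fi
  while [ "$sent" -lt "$TOTAL_REQUESTS" ]; do
    n=$(( TOTAL_REQUESTS - sent ))
    if [ "$n" -gt "$BATCH_SIZE" ]; then n="$BATCH_SIZE"; fi
    i=0
    while [ "$i" -lt "$n" ]; do
      send_one &            # each request of the batch runs in the background
      i=$(( i + 1 ))
    done
    wait                    # block until the whole batch has finished
    sent=$(( sent + n ))
    if [ "$sent" -lt "$TOTAL_REQUESTS" ]; then sleep "$BATCH_SLEEP"; fi
  done
}
```

Calling `run_burst` with the defaults only prints the requests it would send. Setting `DRY_RUN=0` sends real traffic, so that is best reserved for a port-forwarded test endpoint.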
File: docs/developer-guide/testing.md (+17 -26: 17 additions & 26 deletions)
@@ -140,6 +140,10 @@ WVA provides a **single consolidated E2E suite** that runs on multiple environme
 - **Environments**: Kind (emulated), OpenShift, or generic Kubernetes
 - **Tiers**: Smoke (~5–10 min) for PRs; full suite (~15–25 min) for comprehensive validation

+### Scope
+
+E2E is intended to be a **deterministic correctness signal**: resource wiring, reconciliation, and stable invariants (e.g., CRs reconcile, status conditions are set, scalers are created and point at the right targets/metrics). Traffic generation and performance/benchmarking scenarios should live outside `test/e2e/`.
+
 ### Infra-Only Setup (Required Before Running Tests)

 Tests expect **only** the WVA controller and llm-d infrastructure to be deployed; they create VariantAutoscaling resources, HPAs, and model services themselves. Use the install script in **infra-only** mode:
@@ -161,6 +165,11 @@ This deploys:

 When `E2E_TESTS_ENABLED=true` (or `ENABLE_SCALE_TO_ZERO=true`), the deploy script also enables **GIE queuing** so scale-from-zero tests can run: it patches the EPP with `ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER=true` and applies an **InferenceObjective** (`e2e-default`) that references the default InferencePool. This ensures the metric `inference_extension_flow_control_queue_size` is populated when requests hit the gateway.

+**Install script tuning (optional, same variables as `deploy/install.sh`):**
+
+- **`SKIP_HELM_REPO_UPDATE`**: When set to **`true`**, `helm repo update` is skipped during installs (faster, less network churn). By default, `helm repo update` runs to refresh repo indexes.
+- **`E2E_DEPLOY_WAIT_TIMEOUT`**: For infra-only e2e deploys (`INFRA_ONLY=true` with `E2E_TESTS_ENABLED=true`), caps the `kubectl wait` for the EPP and inference-gateway deployments (default **`120s`**). Raise it if image pulls or rollouts routinely exceed that window.
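
For illustration, both variables can be set inline for a single deploy, e.g. `SKIP_HELM_REPO_UPDATE=true E2E_DEPLOY_WAIT_TIMEOUT=300s make deploy-e2e-infra` (the `300s` value is only an example). The sketch below is a hypothetical pre-flight check, not part of `deploy/install.sh`: `check_deploy_knobs` is an invented name, and the accepted duration pattern (an integer followed by `s`, `m`, or `h`) is a deliberate simplification of what `kubectl wait --timeout` accepts.

```bash
# Hypothetical pre-flight check; not part of deploy/install.sh.
# Reports the effective settings and rejects a malformed wait timeout.
check_deploy_knobs() {
  local skip="${SKIP_HELM_REPO_UPDATE:-false}"
  local timeout="${E2E_DEPLOY_WAIT_TIMEOUT:-120s}"   # 120s is the documented default

  # Simplified duration check (assumption): digits followed by s, m, or h.
  if ! printf '%s\n' "$timeout" | grep -Eq '^[0-9]+[smh]$'; then
    echo "E2E_DEPLOY_WAIT_TIMEOUT looks malformed: $timeout" >&2
    return 1
  fi

  if [ "$skip" = "true" ]; then
    echo "helm repo update: skipped; deploy wait timeout: $timeout"
  else
    echo "helm repo update: runs; deploy wait timeout: $timeout"
  fi
}
```

With the variables exported, `check_deploy_knobs && make deploy-e2e-infra` would start the deploy only when the timeout value passes this simplified check.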
+
 Alternatively, use the Makefile to deploy infra and run tests in one go:

 ```bash
@@ -196,7 +205,7 @@ FOCUS="Basic VA lifecycle" make test-e2e-smoke
-- **Full (label `full`)**: Saturation scaling (single and multiple VAs), scale-from-zero, scale-to-zero (when `SCALE_TO_ZERO_ENABLED=true`), limiter, pod scraping, parallel load scale-up
+- **Full (label `full`)**: Smoke plus additional deterministic correctness checks (scale-from-zero, limiter, pod scraping, etc.)

 ### Configuration

@@ -208,8 +217,11 @@ Key environment variables (see [E2E Test Suite README](../../test/e2e/README.md)
 | `USE_SIMULATOR` | `true` | Emulated GPUs (true) or real vLLM (false) |
 | `SCALE_TO_ZERO_ENABLED` | `false` | Enable scale-to-zero tests (Kind supports both enabled and disabled) |
 | `SCALER_BACKEND` | `prometheus-adapter` | `prometheus-adapter` or `keda` (KEDA only for kind-emulator) |
-| `REQUEST_RATE` | `8` | Load generation: requests per second |
-| `NUM_PROMPTS` | `1000` | Load generation: total prompts |
+| `POD_READY_TIMEOUT` / `SCALE_UP_TIMEOUT` | `300` / `600` | Timeouts in seconds: model readiness / longest scale and job waits |
+| `E2E_EVENTUALLY_STANDARD`, etc. | see README | Optional `Eventually` timeouts and poll intervals (`E2E_EVENTUALLY_*`, `E2E_EVENTUALLY_POLL*`) |
+| `RESTART_PROMETHEUS_ADAPTER` | `auto` | kind-emulator: `auto` probes adapter + API before restarting pods; `true`/`false` force always/never |
+
+Deploy-time knobs (passed through when you run `./deploy/install.sh` or `make deploy-e2e-infra`): `SKIP_HELM_REPO_UPDATE`, `E2E_DEPLOY_WAIT_TIMEOUT` (see **Install script tuning** above).

 For running multiple test runs in parallel, use [multi-controller isolation](../user-guide/multi-controller-isolation.md) (`CONTROLLER_INSTANCE`).

@@ -524,30 +536,9 @@ kubectl get events -n <namespace> --sort-by='.lastTimestamp'
 kubectl top nodes
 ```

-## Performance Testing
-
-### Load Testing
-
-For load testing, use the consolidated E2E suite with custom load parameters:
-
-```bash
-# Kind (emulated): low / medium / heavy load
-REQUEST_RATE=8 NUM_PROMPTS=2000 make test-e2e-full
-REQUEST_RATE=20 NUM_PROMPTS=3000 make test-e2e-full
-REQUEST_RATE=40 NUM_PROMPTS=5000 make test-e2e-full
-
-# OpenShift (real cluster)
-export ENVIRONMENT=openshift
-REQUEST_RATE=20 NUM_PROMPTS=3000 make test-e2e-full
-```
-
-### Stress Testing
+## Performance / Benchmarking

-Test system behavior under extreme conditions:
-- High request rates (50+ req/s)
-- Long-running load (30+ minutes)
-- Rapid load changes
-- Multiple concurrent variants
+Performance and benchmarking scenarios (traffic generation, throughput/latency measurement, scale-up latency, etc.) are intentionally **out of scope** for `test/e2e/` so that e2e remains deterministic. Use the project's dedicated benchmarking tooling/workflows instead.