These are the existing e2es in the repo as of April 2nd, 2026:
Note:
- Tests create and tear down their own VA, HPA, or KEDA ScaledObject, model workloads, and routes/monitors as needed; they do not reuse the chart's sample VA/HPA.
- The same spec binaries are meant to run against real GPU / real model server setups by varying ENV / USE_SIMULATOR / scaler backend; there is no separate test codebase.
Different run setups
- Smoke: fast PR-style pass on a subset of specs.
- Full: superset for deeper checks; includes smoke-overlapping areas plus full-only flows.
Existing e2es (by theme)
- Infra readiness - controller up, llm-d pieces present, and metrics scraping plus external/custom metrics plumbing verified end to end.
- Basic VA lifecycle (Deployment / LWS) - create VA + target workload plumbing, check reconciliation and status conditions (e.g., target resolved, metrics availability).
- Workload shape coverage - the same "VA lifecycle" ideas extended to Deployment and LeaderWorkerSet (including single-node LWS variants where applicable).
- Resilience/errors - e.g., target workload churn while a VA exists; metrics gaps surface in status conditions rather than failing silently.
- Scale-from-zero - minReplicas 0, idle at zero, traffic via gateway → EPP / flow-control path, assert scale-up and request progress (with KEDA vs HPA depending on config); LWS variants mirror the same theme for leader/worker targets.
- Accelerator/limiter - multiple pools / VAs with different accelerator constraints, assert routing/limiting behavior matches expectations.
- Metrics collection - EPP pod scraping path (PodScrapingSource); some parts skipped or constrained on Kind where in-cluster behavior differs.
- Saturation path:
- analyzer/saturation wiring and status propagation (bounded, deterministic traffic, not benchmark throughput tests).
- bounded V1 threshold-crossing traffic (KV/queue-style thresholds in config) and a below-threshold negative check.
- Also checks status propagation (DesiredOptimizedAlloc, MetricsAvailable) for that dedicated model/VA
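Several of the themes above (VA lifecycle, saturation) assert on status conditions such as TargetResolved, MetricsAvailable, and DesiredOptimizedAlloc. A minimal sketch of that assertion pattern, assuming conditions follow the standard Kubernetes `{"type": ..., "status": ...}` shape (the helper itself is hypothetical, not code from the repo):

```python
# Sketch of the status-condition assertion pattern used by the VA lifecycle
# and saturation specs. Assumes conditions follow the standard Kubernetes
# shape; condition_status is an illustrative helper, not repo code.

def condition_status(conditions, cond_type):
    """Return the status of the first condition matching cond_type, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None

# Example: a VA status after the target workload resolves and metrics flow.
va_status = {
    "conditions": [
        {"type": "TargetResolved", "status": "True"},
        {"type": "MetricsAvailable", "status": "True"},
    ]
}

assert condition_status(va_status["conditions"], "TargetResolved") == "True"
assert condition_status(va_status["conditions"], "MetricsAvailable") == "True"
assert condition_status(va_status["conditions"], "Missing") is None
```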
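The scale-from-zero flow boils down to "send traffic, then poll replicas until the target scales above zero or a deadline passes". A sketch of that wait loop, with the replica reader injected so the same logic could run against a real cluster client or a simulator (the helper is illustrative, not the repo's actual code):

```python
import time

# Sketch of the scale-from-zero assertion loop: poll the replica count until
# the workload scales above zero or the timeout expires. get_replicas is
# injected (real cluster client or simulator); this helper is illustrative.

def wait_for_scale_up(get_replicas, timeout_s=60.0, interval_s=0.01):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_replicas() > 0:
            return True
        time.sleep(interval_s)
    return False

# Fake cluster: stays at 0 replicas for a few polls, then scales up,
# mimicking gateway traffic arriving and KEDA/HPA reacting.
polls = iter([0, 0, 0, 1])
assert wait_for_scale_up(lambda: next(polls), timeout_s=1.0) is True
```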
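The saturation specs' positive and negative checks reduce to comparing bounded-traffic utilization samples against the configured threshold. A sketch under that assumption (the function name and threshold value are illustrative; the real thresholds live in config):

```python
# Sketch of the bounded V1 threshold-crossing check: given sampled KV-cache /
# queue-style utilization values and a configured threshold, decide whether a
# scale-up signal should fire. Names and values are illustrative.

def crosses_threshold(samples, threshold):
    """True if any sample meets or exceeds the configured threshold."""
    return any(s >= threshold for s in samples)

KV_THRESHOLD = 0.8  # illustrative; the real value comes from config

# Positive case: bounded traffic pushes utilization over the threshold.
assert crosses_threshold([0.2, 0.5, 0.85], KV_THRESHOLD) is True
# Negative check: below-threshold traffic must not trigger scale-up.
assert crosses_threshold([0.1, 0.3, 0.6], KV_THRESHOLD) is False
```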
Current CI e2e run times (local e2e runs tend to be a bit faster):
- Smoke tests: 5-10 mins
- Full Kind emulated tests: 15-25 mins
- Full OpenShift real-GPU production-line tests: 15-22 mins
Find / add discussions of future e2es below in the issue.