These are the existing e2es in the repo as of April 2nd, 2026:
Note:
- Tests create and tear down their own VA, HPA, or KEDA ScaledObject, model workloads, and routes/monitors as needed; they do not reuse the chart's sample VA/HPA.
- The same spec binaries are meant to run against real GPU / real model server setups by varying ENV / USE_SIMULATOR / scaler backend; there is no separate test codebase.
Different run setups
- Smoke: fast PR-style pass on a subset of specs.
- Full: superset for deeper checks; includes smoke-overlapping areas plus full-only flows.
Existing e2es (by theme)
- Infra readiness - controller up, llm-d pieces present, and metrics scraping plus external/custom metrics plumbing verified end to end.
- Basic VA lifecycle (Deployment / LWS) - create VA + target workload plumbing, check reconciliation and status conditions (e.g., target resolved, metrics availability).
- Workload shape coverage - the same "VA lifecycle" ideas extended to Deployment and LeaderWorkerSet (including single-node LWS variants where applicable).
- Resilience/errors - e.g., target workload churn while a VA exists; metrics gaps surface in status conditions rather than failing silently.
- Scale-from-zero - minReplicas 0, idle at zero, traffic via gateway → EPP / flow-control path, assert scale-up and request progress (with KEDA vs HPA depending on config); LWS variants mirror the same theme for leader/worker targets.
- Accelerator/limiter - multiple pools / VAs with different accelerator constraints, assert routing/limiting behavior matches expectations.
- Metrics collection - EPP pod scraping path (PodScrapingSource); some parts skipped or constrained on Kind where in-cluster behavior differs.
- Saturation path:
- analyzer/saturation wiring and status propagation (bounded, deterministic traffic, not benchmark throughput tests).
- bounded V1 threshold-crossing traffic (KV/queue-style thresholds in config) and a below-threshold negative check.
- Also checks status propagation (DesiredOptimizedAlloc, MetricsAvailable) for that dedicated model/VA
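Several of the themes above (VA lifecycle, saturation) assert on status conditions such as TargetResolved, MetricsAvailable, and DesiredOptimizedAlloc. A minimal sketch of that assertion pattern, assuming conditions follow the standard Kubernetes `{"type": ..., "status": ...}` shape (the helper itself is hypothetical, not code from the repo):

```python
# Sketch of the status-condition assertion pattern used by the VA lifecycle
# and saturation specs. Assumes conditions follow the standard Kubernetes
# shape; condition_status is an illustrative helper, not repo code.

def condition_status(conditions, cond_type):
    """Return the status of the first condition matching cond_type, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None

# Example: a VA status after the target workload resolves and metrics flow.
va_status = {
    "conditions": [
        {"type": "TargetResolved", "status": "True"},
        {"type": "MetricsAvailable", "status": "True"},
    ]
}

assert condition_status(va_status["conditions"], "TargetResolved") == "True"
assert condition_status(va_status["conditions"], "MetricsAvailable") == "True"
assert condition_status(va_status["conditions"], "Missing") is None
```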
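The scale-from-zero flow boils down to "send traffic, then poll replicas until the target scales above zero or a deadline passes". A sketch of that wait loop, with the replica reader injected so the same logic could run against a real cluster client or a simulator (the helper is illustrative, not the repo's actual code):

```python
import time

# Sketch of the scale-from-zero assertion loop: poll the replica count until
# the workload scales above zero or the timeout expires. get_replicas is
# injected (real cluster client or simulator); this helper is illustrative.

def wait_for_scale_up(get_replicas, timeout_s=60.0, interval_s=0.01):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_replicas() > 0:
            return True
        time.sleep(interval_s)
    return False

# Fake cluster: stays at 0 replicas for a few polls, then scales up,
# mimicking gateway traffic arriving and KEDA/HPA reacting.
polls = iter([0, 0, 0, 1])
assert wait_for_scale_up(lambda: next(polls), timeout_s=1.0) is True
```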
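The saturation specs' positive and negative checks reduce to comparing bounded-traffic utilization samples against the configured threshold. A sketch under that assumption (the function name and threshold value are illustrative; the real thresholds live in config):

```python
# Sketch of the bounded V1 threshold-crossing check: given sampled KV-cache /
# queue-style utilization values and a configured threshold, decide whether a
# scale-up signal should fire. Names and values are illustrative.

def crosses_threshold(samples, threshold):
    """True if any sample meets or exceeds the configured threshold."""
    return any(s >= threshold for s in samples)

KV_THRESHOLD = 0.8  # illustrative; the real value comes from config

# Positive case: bounded traffic pushes utilization over the threshold.
assert crosses_threshold([0.2, 0.5, 0.85], KV_THRESHOLD) is True
# Negative check: below-threshold traffic must not trigger scale-up.
assert crosses_threshold([0.1, 0.3, 0.6], KV_THRESHOLD) is False
```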
Current CI e2e run times (local e2e runs tend to be a bit faster):
- Smoke tests: 5-10 mins
- Full Kind emulated tests: 15-25 mins
- Full OpenShift real-GPU production-line tests: 15-22 mins
Find / add discussions of future e2es below in the issue.