EPIC: document existing e2es and extend e2es to close gaps #963

@mamy-CS

Description

These are the existing e2es as of April 2nd, 2026 in the repo:

Note:

  • Tests create and tear down their own VA, HPA or KEDA ScaledObject, model workloads, and routes/monitors as needed; they do not rely on the chart's sample VA/HPA.
  • The same spec binaries are meant to run on real GPU / real model server setups by changing ENV / USE_SIMULATOR / the scaler backend; there is no separate test codebase for those setups.
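
To illustrate the second note, here is a minimal sketch of how one suite could read its run setup from the environment. ENV and USE_SIMULATOR come from the note above; the SCALER_BACKEND variable name, the "hpa"/"keda" values, the default values, and the RunConfig/loadRunConfig names are assumptions made for this example, not the repo's actual API.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// RunConfig holds the knobs that let one spec suite target different setups.
// Field names are hypothetical.
type RunConfig struct {
	Env           string // value of ENV; the default "kind" is an assumption
	UseSimulator  bool   // value of USE_SIMULATOR
	ScalerBackend string // "hpa" or "keda"; SCALER_BACKEND is a hypothetical name
}

// loadRunConfig reads settings through getenv, so callers can pass os.Getenv
// or a fake lookup function.
func loadRunConfig(getenv func(string) string) (RunConfig, error) {
	// Assumed defaults: emulated cluster, simulator on, HPA backend.
	cfg := RunConfig{Env: "kind", UseSimulator: true, ScalerBackend: "hpa"}
	if v := getenv("ENV"); v != "" {
		cfg.Env = v
	}
	if v := getenv("USE_SIMULATOR"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return RunConfig{}, fmt.Errorf("USE_SIMULATOR=%q: %w", v, err)
		}
		cfg.UseSimulator = b
	}
	if v := getenv("SCALER_BACKEND"); v != "" {
		if v != "hpa" && v != "keda" {
			return RunConfig{}, fmt.Errorf("SCALER_BACKEND=%q: want hpa or keda", v)
		}
		cfg.ScalerBackend = v
	}
	return cfg, nil
}

func main() {
	cfg, err := loadRunConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the parsing testable without mutating the process environment.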

Different run setups
Smoke: fast PR-style pass on a subset of specs.
Full: superset for deeper checks; includes smoke-overlapping areas plus full-only flows.
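
The superset relationship between the two setups can be sketched as a simple filter. This is an illustration only: the Tier and Spec types, the selectSpecs function, and the assignment of example specs to tiers are all hypothetical, and the real suite may select specs differently (for example through test-framework labels).

```go
package main

import "fmt"

// Tier marks which run setups include a spec (hypothetical type).
type Tier int

const (
	Smoke    Tier = iota // runs in both the smoke and the full pass
	FullOnly             // runs only in the full pass
)

// Spec is a named test with a tier (hypothetical type).
type Spec struct {
	Name string
	Tier Tier
}

// selectSpecs returns the spec names for a run. A full run takes everything,
// so it is always a superset of the smoke run.
func selectSpecs(specs []Spec, fullRun bool) []string {
	var names []string
	for _, s := range specs {
		if fullRun || s.Tier == Smoke {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	// The tier assignments below are made up for the example.
	specs := []Spec{
		{Name: "infra readiness", Tier: Smoke},
		{Name: "basic VA lifecycle", Tier: Smoke},
		{Name: "scale-from-zero", Tier: FullOnly},
	}
	fmt.Println("smoke:", selectSpecs(specs, false))
	fmt.Println("full: ", selectSpecs(specs, true))
}
```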

Existing e2es (by theme)

  • Infra readiness - controller up, llm-d pieces present, metrics scraping and the external/custom metrics plumbing working at a basic level.
  • Basic VA lifecycle (Deployment / LWS) - create VA + target workload plumbing, check reconciliation and status conditions (e.g., target resolved, metrics availability).
  • Workload shape coverage - same “VA lifecycle” ideas extended to Deployment and LeaderWorkerSet (including single-node LWS variants where applicable).
  • Resilience/errors - e.g., target workload churn while VA exists; metrics gaps reflected in conditions rather than silent failure.
  • Scale-from-zero - minReplicas 0, idle at zero, traffic via gateway → EPP / flow-control path, assert scale-up and request progress (with KEDA vs HPA depending on config); LWS variants mirror the same theme for leader/worker targets.
  • Accelerator/limiter - multiple pools / VAs with different accelerator constraints, assert routing/limiting behavior matches expectations.
  • Metrics collection - EPP pod scraping path (PodScrapingSource); some parts skipped or constrained on Kind where in-cluster behavior differs.
  • Saturation path:
    • analyzer/saturation wiring and status propagation (bounded, deterministic traffic, not benchmark throughput tests).
    • bounded V1 threshold-crossing traffic (KV/queue-style thresholds in config) and a below-threshold negative check.
    • status propagation checks (DesiredOptimizedAlloc, MetricsAvailable) for that dedicated model/VA.

Current CI e2e run times, for reference (local e2e runs seem a bit faster):

  • Smoke tests: 5-10 mins
  • Full Kind emulated tests: 15-25 mins
  • Full OpenShift real-GPU production-line tests: 15-22 mins

Find or add discussions about future e2es below in this issue.

Labels: help wanted, triage/accepted
