api: add minReplicas, maxReplicas and behavior fields to VA spec by vivekk16 · Pull Request #864 · llm-d/llm-d-workload-variant-autoscaler

vivekk16 · 2026-03-09T22:48:37Z

Summary

Closes #807

Adds two new fields to the VariantAutoscaling spec:

minReplicas: optional, lower bound on replicas (0 enables scale-to-zero, defaults to 1)
maxReplicas: required in schema, defaults to 10 if omitted (the default was lowered to 2 during review)

minReplicas and maxReplicas are added to VariantAutoscalingSpec directly since they define the scaling bounds for this variant.
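As a rough illustration of the two fields described above, the sketch below declares a trimmed-down spec type and round-trips it through JSON, in the spirit of the TestMinMaxReplicasJSON test listed under Changes. The struct here is a hypothetical stand-in rather than the real type from api/v1alpha1; the pointer type, the JSON tags, and the kubebuilder markers are assumptions, and the default of 2 reflects the value the PR settled on after review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VariantAutoscalingSpec is a hypothetical, trimmed-down stand-in for the real
// type; only the two new fields are shown and their exact shape is assumed.
type VariantAutoscalingSpec struct {
	// MinReplicas is a pointer (assumption) so that an explicit 0, which
	// enables scale-to-zero, can be told apart from "not set".
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:default=1
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`

	// MaxReplicas is the upper bound on replicas for this variant.
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=2
	MaxReplicas int32 `json:"maxReplicas"`
}

// roundTrip marshals the spec to JSON and decodes it again, returning both
// the decoded value and the raw JSON text.
func roundTrip(in VariantAutoscalingSpec) (VariantAutoscalingSpec, string, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return VariantAutoscalingSpec{}, "", err
	}
	var out VariantAutoscalingSpec
	if err := json.Unmarshal(raw, &out); err != nil {
		return VariantAutoscalingSpec{}, "", err
	}
	return out, string(raw), nil
}

func main() {
	zero := int32(0)
	out, raw, err := roundTrip(VariantAutoscalingSpec{MinReplicas: &zero, MaxReplicas: 2})
	if err != nil {
		panic(err)
	}
	// An explicit minReplicas of 0 survives the round trip because omitempty
	// only drops a nil pointer, not a pointer to zero.
	fmt.Println(raw, *out.MinReplicas, out.MaxReplicas)
}
```

With this shape, leaving MinReplicas nil omits the key from the JSON entirely, which is what would let a schema default apply on the server side.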

Changes

api/v1alpha1/variantautoscaling_types.go — add minReplicas, maxReplicas to VariantAutoscalingSpec
config/crd/bases/llmd.ai_variantautoscalings.yaml — regenerated
charts/workload-variant-autoscaler/crds/llmd.ai_variantautoscalings.yaml — synced
api/v1alpha1/variantautoscaling_types_test.go — add TestMinMaxReplicasJSON; update makeValidVA() with MaxReplicas: 2
internal/actuator/actuator_test.go, internal/controller/variantautoscaling_controller_test.go, internal/controller/indexers/indexers_test.go, internal/engines/saturation/engine_test.go, test/e2e/fixtures/va_builder.go — set explicit MaxReplicas: 2 in test fixtures so they pass Minimum=1 validation in envtest (kubebuilder defaults are not applied by envtest)
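To make the fixture change concrete, here is a minimal sketch of why a fixture that leaves MaxReplicas at its Go zero value would be rejected by a Minimum=1 rule, while one that sets it explicitly passes. The fixtureSpec type and validateMaxReplicas helper are hypothetical names used only for illustration; the real check is performed by the CRD schema, not by Go code like this.

```go
package main

import "fmt"

// fixtureSpec is a hypothetical, trimmed-down stand-in for the VA spec that
// test fixtures build.
type fixtureSpec struct {
	MaxReplicas int32
}

// validateMaxReplicas mimics the Minimum=1 rule named in the description.
// It is an illustrative helper, not part of the project.
func validateMaxReplicas(s fixtureSpec) error {
	if s.MaxReplicas < 1 {
		return fmt.Errorf("spec.maxReplicas: invalid value %d, must be >= 1", s.MaxReplicas)
	}
	return nil
}

func main() {
	// A fixture that relies on a default being filled in keeps the zero value.
	fmt.Println(validateMaxReplicas(fixtureSpec{}))
	// A fixture that sets the bound explicitly, as the updated tests do.
	fmt.Println(validateMaxReplicas(fixtureSpec{MaxReplicas: 2}))
}
```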

No breaking changes

All fields have defaults. Existing VAs without them continue to work unchanged.

asm582 · 2026-03-10T01:53:17Z

Can we be conservative and default maxReplicas to 2?

asm582

Address review

ev-shindin · 2026-03-10T17:01:51Z

/trigger-e2e-full

github-actions · 2026-03-10T17:02:09Z

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

lionelvillard · 2026-03-10T18:15:54Z

LGTM.

/hold

waiting for the next PR managing HPA objects.

lionelvillard · 2026-03-10T18:33:20Z

/lgtm

ev-shindin · 2026-03-10T21:08:28Z

/ok-to-test

github-actions · 2026-03-10T21:08:37Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-03-10T21:09:24Z

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

ev-shindin · 2026-03-10T22:00:05Z

/trigger-e2e-full

github-actions · 2026-03-10T22:00:14Z

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

ev-shindin · 2026-03-11T07:32:55Z

/ok-to-test

github-actions · 2026-03-11T07:33:04Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-03-11T07:36:19Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	13	37

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

ev-shindin · 2026-03-11T08:14:50Z

/lgtm

lionelvillard · 2026-03-11T22:58:36Z

/trigger-e2e-full

lionelvillard · 2026-03-11T22:58:42Z

/ok-to-test

github-actions · 2026-03-11T22:58:46Z

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

github-actions · 2026-03-11T22:58:52Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-03-11T23:01:39Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	15	35

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

lionelvillard

/lgtm

Thanks @vivekk16

asm582 · 2026-03-12T19:20:43Z

/lgtm
/approve

Addressed

…-d#864)

* api: add minReplicas and maxReplicas to VariantAutoscalingSpec
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* api: add behavior field to VariantAutoscalingConfigSpec for HPA scaling policies
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* test: fix VA fixtures for maxReplicas validation and add CRD field tests for minReplicas, maxReplicas, and behavior
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): change default maxReplicas from 10 to 2
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): remove behavior field to align with release plan
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

---------
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

Remove annotation-based min/max replica bounds (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and read directly from VA spec.MinReplicas and spec.MaxReplicas fields added in llm-d#864. This eliminates the annotation parsing layer and aligns with the CRD as the single source of truth.
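One way to picture bounds being read from the spec rather than from annotations is a small clamp helper that takes the two spec values directly. The sketch below is an assumption-laden illustration: clampReplicas is a hypothetical name, the nil fallback of 1 mirrors the minReplicas default stated in the PR description, and it assumes minReplicas does not exceed maxReplicas.

```go
package main

import "fmt"

// clampReplicas bounds a desired replica count by the spec values.
// minReplicas may be nil, in which case the documented default of 1 is used.
// This is an illustrative helper, not the project's implementation.
func clampReplicas(target int32, minReplicas *int32, maxReplicas int32) int32 {
	lower := int32(1)
	if minReplicas != nil {
		lower = *minReplicas
	}
	if target < lower {
		return lower
	}
	if target > maxReplicas {
		return maxReplicas
	}
	return target
}

func main() {
	zero := int32(0)
	fmt.Println(clampReplicas(0, nil, 2))   // floor comes from the default
	fmt.Println(clampReplicas(0, &zero, 2)) // explicit 0 allows scale-to-zero
	fmt.Println(clampReplicas(7, &zero, 2)) // capped at maxReplicas
}
```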

* feat: add min/max replicas as VA annotations with optimizer integration

  Add per-variant min/max replica bounds via VA annotations (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and integrate them into both V1 and V2 scaling paths:
  - Parse bounds from VA annotations in BuildVariantStates
  - Respect maxReplicas in CostAwareOptimizer (spillover to next variant)
  - Respect minReplicas in costAwareScaleDown (hard floor per variant)
  - Respect maxReplicas in GreedyBySaturationOptimizer allocateForModel
  - Respect min/max in V1 limiter allocateForDecision
  - Clamp targets in V1 CalculateSaturationTargets
  - Disable scale-to-zero enforcement when any variant has minReplicas > 0
  - Propagate bounds through VariantDecision for observability

* refactor: use VA spec fields for min/max replicas instead of annotations

  Remove annotation-based min/max replica bounds (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and read directly from VA spec.MinReplicas and spec.MaxReplicas fields added in #864. This eliminates the annotation parsing layer and aligns with the CRD as the single source of truth.

* fix(e2e): align VA min/max replicas with test expectations

  - Set explicit MinReplicas=1 and MaxReplicas=10 in VA builder defaults (was implicit MinReplicas via kubebuilder default and MaxReplicas=2)
  - Add VAOption functional options (WithMinReplicas, WithMaxReplicas) for tests that need custom replica bounds
  - Scale-to-zero and scale-from-zero tests now create VAs with MinReplicas=0 so the engine allows scaling to zero replicas
  - MaxReplicas raised from 2 to 10 to match HPA maxReplicas and avoid artificially capping scale-up in load tests

* fix: address PR review - enforce minReplicas in GreedyByScore scale-down

  - Pass stateMap to costAwareScaleDown in GreedyByScoreOptimizer so minReplicas is respected during scale-down (was missing)
  - Update doc comments: "VA annotation" → "VA spec field" in VariantDecision, VariantReplicaState, and saturation analyzer
  - Add tests verifying mixed-minReplicas behavior: variant with minReplicas=0 scales to zero while sibling with minReplicas>0 is preserved (CostAware and GreedyByScore)

vivekk16 marked this pull request as draft March 9, 2026 22:49

asm582 reviewed Mar 10, 2026

Comment thread api/v1alpha1/variantautoscaling_types.go Outdated

Comment thread api/v1alpha1/variantautoscaling_types.go Outdated

asm582 previously requested changes Mar 10, 2026

lionelvillard reviewed Mar 10, 2026

Comment thread api/v1alpha1/variantautoscaling_types.go Outdated

ev-shindin marked this pull request as ready for review March 10, 2026 17:02

vivekk16 requested review from asm582 and lionelvillard March 10, 2026 17:07

github-actions bot added the hold label (PRs that are blocked on design, other features, release cycle, etc.) Mar 10, 2026

github-actions bot added the lgtm label (Looks good to me, indicates that a PR is ready to be merged.) Mar 10, 2026

ev-shindin removed the hold label (PRs that are blocked on design, other features, release cycle, etc.) Mar 10, 2026

vivekk16 force-pushed the api/va-crd-restructure branch from 29656c9 to 79cb5fd Compare March 10, 2026 21:09

vivekk16 force-pushed the api/va-crd-restructure branch 2 times, most recently from 81d9076 to 156c8bb Compare March 10, 2026 21:29

vivekk16 added 4 commits March 10, 2026 17:30

api: add minReplicas and maxReplicas to VariantAutoscalingSpec

db8d6fe

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

api: add behavior field to VariantAutoscalingConfigSpec for HPA scali…

79a0a80

…ng policies Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

test: fix VA fixtures for maxReplicas validation and add CRD field te…

da4bd4b

…sts for minReplicas, maxReplicas, and behavior Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

refactor(api): change default maxReplicas from 10 to 2

807e467

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

vivekk16 force-pushed the api/va-crd-restructure branch from 156c8bb to 807e467 Compare March 10, 2026 21:30

ev-shindin previously approved these changes Mar 11, 2026

lionelvillard added this to the v0.6.0 milestone Mar 11, 2026

vivekk16 dismissed ev-shindin’s stale review via 6e68947 March 11, 2026 21:56

refactor(api): remove behavior field to align with release plan

c6998a0

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

vivekk16 force-pushed the api/va-crd-restructure branch from 6e68947 to c6998a0 Compare March 11, 2026 22:00

vivekk16 requested a review from ev-shindin March 11, 2026 22:02

lionelvillard approved these changes Mar 11, 2026

github-actions bot approved these changes Mar 12, 2026

asm582 merged commit 104073d into llm-d:main Mar 13, 2026
16 checks passed

zdtsw mentioned this pull request Mar 13, 2026

[cherrypick] API change from upstream which should land in 0.6.0 opendatahub-io/workload-variant-autoscaler#37

Closed

3 tasks

clubanderson mentioned this pull request Mar 14, 2026

🐛 WVA nightly E2E failing on CKS and OCP after March 9-13 PR batch #884

Open
