api: add minReplicas, maxReplicas and behavior fields to VA spec #864

Merged
asm582 merged 5 commits into llm-d:main from vivekk16:api/va-crd-restructure
Mar 13, 2026

Conversation

@vivekk16
Contributor

@vivekk16 vivekk16 commented Mar 9, 2026

Summary

Closes #807

Adds two new fields to the VariantAutoscaling spec:

  • minReplicas — optional, lower bound on replicas (0 enables scale-to-zero, defaults to 1)
  • maxReplicas — required in schema, defaults to 10 if omitted

minReplicas and maxReplicas are added to VariantAutoscalingSpec directly since they define the scaling bounds for this variant.
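
As a rough illustration of the two fields described above, the following Go sketch shows how they might be declared with kubebuilder markers. It is not the code from this PR: the struct is a trimmed stand-in, the pointer type for `MinReplicas` (used to tell an explicit 0 from an omitted value) is an assumption, and `parseSpec` is a hypothetical helper added only to make the example runnable. The default of 10 follows the description above; the review below asks for 2.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VariantAutoscalingSpec is a trimmed, hypothetical stand-in for the type in
// api/v1alpha1; only the two new fields are shown.
type VariantAutoscalingSpec struct {
	// MinReplicas is the lower bound on replicas; 0 enables scale-to-zero.
	// A pointer is assumed here so that an explicit 0 differs from "unset".
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:default=1
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`

	// MaxReplicas is the upper bound on replicas.
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=10
	MaxReplicas int32 `json:"maxReplicas"`
}

// parseSpec is a hypothetical helper that decodes a spec from JSON.
// It does not apply CRD defaults; in a cluster the API server does that.
func parseSpec(raw string) (VariantAutoscalingSpec, error) {
	var spec VariantAutoscalingSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	spec, err := parseSpec(`{"minReplicas":0,"maxReplicas":4}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("min:", *spec.MinReplicas, "max:", spec.MaxReplicas)
}
```

The markers only affect the generated CRD schema; plain JSON decoding, as in `parseSpec`, leaves an omitted `minReplicas` as nil.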

Changes

  • api/v1alpha1/variantautoscaling_types.go — add minReplicas, maxReplicas to VariantAutoscalingSpec
  • config/crd/bases/llmd.ai_variantautoscalings.yaml — regenerated
  • charts/workload-variant-autoscaler/crds/llmd.ai_variantautoscalings.yaml — synced
  • api/v1alpha1/variantautoscaling_types_test.go — add TestMinMaxReplicasJSON; update makeValidVA() with MaxReplicas: 2
  • internal/actuator/actuator_test.go, internal/controller/variantautoscaling_controller_test.go, internal/controller/indexers/indexers_test.go, internal/engines/saturation/engine_test.go, test/e2e/fixtures/va_builder.go — set explicit MaxReplicas: 2 in test fixtures so they pass Minimum=1 validation in envtest (kubebuilder defaults are not applied by envtest)

No breaking changes

All fields have defaults. Existing VAs without them continue to work unchanged.

@vivekk16 vivekk16 marked this pull request as draft March 9, 2026 22:49
@asm582
Collaborator

asm582 commented Mar 10, 2026

Can we be conservative and default maxReplicas to 2?

Comment thread api/v1alpha1/variantautoscaling_types.go Outdated
Comment thread api/v1alpha1/variantautoscaling_types.go Outdated
asm582
asm582 previously requested changes Mar 10, 2026
Collaborator

@asm582 asm582 left a comment

Address review

Comment thread api/v1alpha1/variantautoscaling_types.go Outdated
@ev-shindin
Collaborator

/trigger-e2e-full

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

@ev-shindin ev-shindin marked this pull request as ready for review March 10, 2026 17:02
@lionelvillard
Collaborator

LGTM.

/hold

waiting for the next PR managing HPA objects.

@github-actions github-actions bot added the hold PRs that are blocked on design, other features, release cycle, etc. label Mar 10, 2026
@lionelvillard
Collaborator

/lgtm

@github-actions github-actions bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Mar 10, 2026
@ev-shindin ev-shindin removed the hold PRs that are blocked on design, other features, release cycle, etc. label Mar 10, 2026
@ev-shindin
Collaborator

/ok-to-test

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@vivekk16 vivekk16 force-pushed the api/va-crd-restructure branch from 29656c9 to 79cb5fd Compare March 10, 2026 21:09
@github-actions
Contributor

Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

@vivekk16 vivekk16 force-pushed the api/va-crd-restructure branch 2 times, most recently from 81d9076 to 156c8bb Compare March 10, 2026 21:29
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
…ng policies

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
…sts for minReplicas, maxReplicas, and behavior

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
@vivekk16 vivekk16 force-pushed the api/va-crd-restructure branch from 156c8bb to 807e467 Compare March 10, 2026 21:30
@ev-shindin
Collaborator

/trigger-e2e-full

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

@ev-shindin
Collaborator

/ok-to-test

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

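| Resource | Total | Allocated | Available |
|----------|-------|-----------|-----------|
| GPUs     | 50    | 13        | 37        |

| Cluster       | Value                     |
|---------------|---------------------------|
| Nodes         | 16 (7 with GPUs)          |
| Total CPU     | 993 cores                 |
| Total Memory  | 10383 Gi                  |
| GPUs required | 4 (min) / 6 (recommended) |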

@ev-shindin
Collaborator

/lgtm

ev-shindin
ev-shindin previously approved these changes Mar 11, 2026
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
@vivekk16 vivekk16 force-pushed the api/va-crd-restructure branch from 6e68947 to c6998a0 Compare March 11, 2026 22:00
@vivekk16 vivekk16 requested a review from ev-shindin March 11, 2026 22:02
@lionelvillard
Collaborator

/trigger-e2e-full

@lionelvillard
Collaborator

/ok-to-test

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

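| Resource | Total | Allocated | Available |
|----------|-------|-----------|-----------|
| GPUs     | 50    | 15        | 35        |

| Cluster       | Value                     |
|---------------|---------------------------|
| Nodes         | 16 (7 with GPUs)          |
| Total CPU     | 993 cores                 |
| Total Memory  | 10383 Gi                  |
| GPUs required | 4 (min) / 6 (recommended) |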

Collaborator

@lionelvillard lionelvillard left a comment

/lgtm

Thanks @vivekk16

@asm582
Collaborator

asm582 commented Mar 12, 2026

/lgtm
/approve

@asm582 asm582 dismissed their stale review March 13, 2026 01:26

Addressed

@asm582 asm582 merged commit 104073d into llm-d:main Mar 13, 2026
16 checks passed
zdtsw pushed a commit to zdtsw-forking/workload-variant-autoscaler that referenced this pull request Mar 13, 2026
…-d#864)

* api: add minReplicas and maxReplicas to VariantAutoscalingSpec

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

* api: add behavior field to VariantAutoscalingConfigSpec for HPA scaling policies

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

* test: fix VA fixtures for maxReplicas validation and add CRD field tests for minReplicas, maxReplicas, and behavior

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

* refactor(api): change default maxReplicas from 10 to 2

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

* refactor(api): remove behavior field to align with release plan

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>

---------

Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
ev-shindin added a commit to ev-shindin/workload-variant-autoscaler that referenced this pull request Mar 13, 2026
Remove annotation-based min/max replica bounds (wva.llmd.ai/min-replicas,
wva.llmd.ai/max-replicas) and read directly from VA spec.MinReplicas and
spec.MaxReplicas fields added in llm-d#864. This eliminates the annotation
parsing layer and aligns with the CRD as the single source of truth.
lionelvillard pushed a commit that referenced this pull request Mar 13, 2026
* feat: add min/max replicas as VA annotations with optimizer integration

Add per-variant min/max replica bounds via VA annotations
(wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and integrate
them into both V1 and V2 scaling paths:

- Parse bounds from VA annotations in BuildVariantStates
- Respect maxReplicas in CostAwareOptimizer (spillover to next variant)
- Respect minReplicas in costAwareScaleDown (hard floor per variant)
- Respect maxReplicas in GreedyBySaturationOptimizer allocateForModel
- Respect min/max in V1 limiter allocateForDecision
- Clamp targets in V1 CalculateSaturationTargets
- Disable scale-to-zero enforcement when any variant has minReplicas > 0
- Propagate bounds through VariantDecision for observability
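
The bounds handling listed above (a hard floor per variant, a cap per variant, and spillover to the next variant) can be sketched roughly as follows. This is a simplified illustration and not the optimizer code from the repository: `variantBounds`, `clampReplicas`, and `allocate` are hypothetical names, and the variants are assumed to arrive already ordered cheapest first.

```go
package main

import "fmt"

// variantBounds is a hypothetical per-variant view of the replica bounds.
type variantBounds struct {
	name        string
	minReplicas int32
	maxReplicas int32
}

// clampReplicas bounds desired to the closed range [minReplicas, maxReplicas].
func clampReplicas(desired, minReplicas, maxReplicas int32) int32 {
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

// allocate spreads demand over variants in the given order. Each variant
// receives at least its minimum and at most its maximum; whatever does not
// fit spills over to the next variant in the slice.
func allocate(demand int32, variants []variantBounds) map[string]int32 {
	out := make(map[string]int32, len(variants))
	remaining := demand
	for _, v := range variants {
		n := clampReplicas(remaining, v.minReplicas, v.maxReplicas)
		out[v.name] = n
		remaining -= n
		if remaining < 0 {
			// A minimum above the remaining demand over-allocates; do not
			// carry a negative remainder forward.
			remaining = 0
		}
	}
	return out
}

func main() {
	variants := []variantBounds{
		{name: "cheap", minReplicas: 0, maxReplicas: 3},
		{name: "fast", minReplicas: 1, maxReplicas: 4},
	}
	fmt.Println(allocate(5, variants))
	fmt.Println(allocate(0, variants))
}
```

With zero demand, the variant whose minimum is 0 drops to zero replicas while the one with a minimum of 1 keeps a single replica, which mirrors the scale-to-zero rule in the list.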

* refactor: use VA spec fields for min/max replicas instead of annotations

Remove annotation-based min/max replica bounds (wva.llmd.ai/min-replicas,
wva.llmd.ai/max-replicas) and read directly from VA spec.MinReplicas and
spec.MaxReplicas fields added in #864. This eliminates the annotation
parsing layer and aligns with the CRD as the single source of truth.

* fix(e2e): align VA min/max replicas with test expectations

- Set explicit MinReplicas=1 and MaxReplicas=10 in VA builder defaults
  (was implicit MinReplicas via kubebuilder default and MaxReplicas=2)
- Add VAOption functional options (WithMinReplicas, WithMaxReplicas) for
  tests that need custom replica bounds
- Scale-to-zero and scale-from-zero tests now create VAs with
  MinReplicas=0 so the engine allows scaling to zero replicas
- MaxReplicas raised from 2 to 10 to match HPA maxReplicas and avoid
  artificially capping scale-up in load tests
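
A minimal sketch of the functional-options pattern named above follows. Only the names `VAOption`, `WithMinReplicas`, and `WithMaxReplicas` come from this commit message; their signatures, the `vaSpec` type, and the `newVASpec` constructor are assumptions made for illustration, not the contents of test/e2e/fixtures/va_builder.go.

```go
package main

import "fmt"

// vaSpec is a hypothetical stand-in for the replica bounds of a VA fixture.
type vaSpec struct {
	MinReplicas *int32
	MaxReplicas int32
}

// VAOption mutates a fixture spec before it is returned (assumed signature).
type VAOption func(*vaSpec)

// WithMinReplicas overrides the lower bound, for example 0 for scale-to-zero tests.
func WithMinReplicas(n int32) VAOption {
	return func(s *vaSpec) { s.MinReplicas = &n }
}

// WithMaxReplicas overrides the upper bound.
func WithMaxReplicas(n int32) VAOption {
	return func(s *vaSpec) { s.MaxReplicas = n }
}

// newVASpec is a hypothetical builder that starts from the explicit defaults
// mentioned above (MinReplicas=1, MaxReplicas=10) and then applies options.
func newVASpec(opts ...VAOption) vaSpec {
	one := int32(1)
	s := vaSpec{MinReplicas: &one, MaxReplicas: 10}
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	s := newVASpec(WithMinReplicas(0))
	fmt.Println("min:", *s.MinReplicas, "max:", s.MaxReplicas)
}
```

Setting the defaults explicitly in the builder keeps the fixtures independent of whether the test environment applies schema defaults.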

* fix: address PR review — enforce minReplicas in GreedyByScore scale-down

- Pass stateMap to costAwareScaleDown in GreedyByScoreOptimizer so
  minReplicas is respected during scale-down (was missing)
- Update doc comments: "VA annotation" → "VA spec field" in
  VariantDecision, VariantReplicaState, and saturation analyzer
- Add tests verifying mixed-minReplicas behavior: variant with
  minReplicas=0 scales to zero while sibling with minReplicas>0
  is preserved (CostAware and GreedyByScore)

Labels

lgtm Looks good to me, indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add minReplicas and maxReplicas to VA Spec

4 participants