api: add minReplicas, maxReplicas and behavior fields to VA spec #864
asm582 merged 5 commits into llm-d:main
Conversation
Can we be conservative and default maxReplicas to 2?
/trigger-e2e-full
🚀 Kind E2E (full) triggered by
LGTM. /hold waiting for the next PR managing HPA objects.
/lgtm
/ok-to-test
🚀 OpenShift E2E — approve and run ( |
29656c9 to 79cb5fd
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.
81d9076 to 156c8bb
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
…ng policies Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
…sts for minReplicas, maxReplicas, and behavior Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
156c8bb to 807e467
/trigger-e2e-full
🚀 Kind E2E (full) triggered by
/ok-to-test
🚀 OpenShift E2E — approve and run ( |
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
/lgtm
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
6e68947 to c6998a0
/trigger-e2e-full
/ok-to-test
🚀 Kind E2E (full) triggered by
🚀 OpenShift E2E — approve and run ( |
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
lionelvillard left a comment
/lgtm
Thanks @vivekk16
/lgtm
…-d#864)
* api: add minReplicas and maxReplicas to VariantAutoscalingSpec
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* api: add behavior field to VariantAutoscalingConfigSpec for HPA scaling policies
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* test: fix VA fixtures for maxReplicas validation and add CRD field tests for minReplicas, maxReplicas, and behavior
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): change default maxReplicas from 10 to 2
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): remove behavior field to align with release plan
  Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
---------
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
Remove annotation-based min/max replica bounds (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and read directly from VA spec.MinReplicas and spec.MaxReplicas fields added in llm-d#864. This eliminates the annotation parsing layer and aligns with the CRD as the single source of truth.
* feat: add min/max replicas as VA annotations with optimizer integration
  Add per-variant min/max replica bounds via VA annotations (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and integrate them into both V1 and V2 scaling paths:
  - Parse bounds from VA annotations in BuildVariantStates
  - Respect maxReplicas in CostAwareOptimizer (spillover to next variant)
  - Respect minReplicas in costAwareScaleDown (hard floor per variant)
  - Respect maxReplicas in GreedyBySaturationOptimizer allocateForModel
  - Respect min/max in V1 limiter allocateForDecision
  - Clamp targets in V1 CalculateSaturationTargets
  - Disable scale-to-zero enforcement when any variant has minReplicas > 0
  - Propagate bounds through VariantDecision for observability
* refactor: use VA spec fields for min/max replicas instead of annotations
  Remove annotation-based min/max replica bounds (wva.llmd.ai/min-replicas, wva.llmd.ai/max-replicas) and read directly from VA spec.MinReplicas and spec.MaxReplicas fields added in #864. This eliminates the annotation parsing layer and aligns with the CRD as the single source of truth.
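The bound handling listed in the commit above (a hard floor from minReplicas, a cap from maxReplicas, and spillover to the next variant) can be illustrated with a minimal Go sketch. This is not the optimizer code from the repository: the `variant` type and the `clamp` and `allocate` functions are hypothetical names chosen for illustration, variants are assumed to arrive ordered cheapest first, and each variant is assumed to satisfy MinReplicas <= MaxReplicas.

```go
package main

import "fmt"

// variant is a hypothetical, simplified view of one VA's replica bounds.
type variant struct {
	Name        string
	MinReplicas int
	MaxReplicas int
}

// clamp forces target into the inclusive range [lo, hi].
func clamp(target, lo, hi int) int {
	if target < lo {
		return lo
	}
	if target > hi {
		return hi
	}
	return target
}

// allocate spreads the needed replicas over variants in the given order
// (assumed cheapest first). Demand above a variant's MaxReplicas spills
// over to the next variant; MinReplicas acts as a floor even at zero demand.
func allocate(needed int, variants []variant) map[string]int {
	out := make(map[string]int, len(variants))
	remaining := needed
	for _, v := range variants {
		n := clamp(remaining, v.MinReplicas, v.MaxReplicas)
		out[v.Name] = n
		remaining -= n
		if remaining < 0 {
			remaining = 0
		}
	}
	return out
}

func main() {
	variants := []variant{
		{Name: "cheap", MinReplicas: 1, MaxReplicas: 2},
		{Name: "premium", MinReplicas: 0, MaxReplicas: 10},
	}
	fmt.Println(allocate(5, variants)) // map[cheap:2 premium:3]
	fmt.Println(allocate(0, variants)) // map[cheap:1 premium:0]
}
```

With zero demand, the variant with minReplicas=0 drops to zero while its sibling keeps its floor, which mirrors the mixed-minReplicas behavior described in the review follow-up.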
* fix(e2e): align VA min/max replicas with test expectations
  - Set explicit MinReplicas=1 and MaxReplicas=10 in VA builder defaults (was implicit MinReplicas via kubebuilder default and MaxReplicas=2)
  - Add VAOption functional options (WithMinReplicas, WithMaxReplicas) for tests that need custom replica bounds
  - Scale-to-zero and scale-from-zero tests now create VAs with MinReplicas=0 so the engine allows scaling to zero replicas
  - MaxReplicas raised from 2 to 10 to match HPA maxReplicas and avoid artificially capping scale-up in load tests
* fix: address PR review: enforce minReplicas in GreedyByScore scale-down
  - Pass stateMap to costAwareScaleDown in GreedyByScoreOptimizer so minReplicas is respected during scale-down (was missing)
  - Update doc comments: "VA annotation" → "VA spec field" in VariantDecision, VariantReplicaState, and saturation analyzer
  - Add tests verifying mixed-minReplicas behavior: variant with minReplicas=0 scales to zero while sibling with minReplicas>0 is preserved (CostAware and GreedyByScore)
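The functional-options pattern named in the e2e fix can be sketched as follows. The option names VAOption, WithMinReplicas and WithMaxReplicas and the builder defaults (MinReplicas=1, MaxReplicas=10) come from the commit message; the `vaSpec` type, the `newVASpec` constructor and all signatures are assumptions made for this example and may differ from test/e2e/fixtures/va_builder.go.

```go
package main

import "fmt"

// vaSpec is a hypothetical stand-in for the replica bounds of a VA fixture.
type vaSpec struct {
	MinReplicas *int32
	MaxReplicas int32
}

// VAOption mutates a fixture spec before it is returned to the test.
type VAOption func(*vaSpec)

// WithMinReplicas overrides the fixture's lower bound (0 allows scale-to-zero).
func WithMinReplicas(n int32) VAOption {
	return func(s *vaSpec) { s.MinReplicas = &n }
}

// WithMaxReplicas overrides the fixture's upper bound.
func WithMaxReplicas(n int32) VAOption {
	return func(s *vaSpec) { s.MaxReplicas = n }
}

// newVASpec (hypothetical) starts from the explicit builder defaults described
// in the commit message and then applies the options in order.
func newVASpec(opts ...VAOption) vaSpec {
	one := int32(1)
	s := vaSpec{MinReplicas: &one, MaxReplicas: 10}
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	def := newVASpec()
	fmt.Println(*def.MinReplicas, def.MaxReplicas) // 1 10

	// A scale-to-zero test would lower the floor explicitly.
	s2z := newVASpec(WithMinReplicas(0), WithMaxReplicas(3))
	fmt.Println(*s2z.MinReplicas, s2z.MaxReplicas) // 0 3
}
```

Keeping the defaults explicit in the builder means a fixture does not rely on CRD defaulting, and only the tests that need unusual bounds have to say so.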
Summary
Closes #807
Adds two new fields to the VariantAutoscaling spec:
- minReplicas: optional, lower bound on replicas (0 enables scale-to-zero, defaults to 1)
- maxReplicas: required in schema, defaults to 2 if omitted (lowered from 10 during review)

minReplicas and maxReplicas are added to VariantAutoscalingSpec directly since they define the scaling bounds for this variant. The behavior field named in the title was removed before merge to align with the release plan.

Changes
- api/v1alpha1/variantautoscaling_types.go: add minReplicas, maxReplicas to VariantAutoscalingSpec
- config/crd/bases/llmd.ai_variantautoscalings.yaml: regenerated
- charts/workload-variant-autoscaler/crds/llmd.ai_variantautoscalings.yaml: synced
- api/v1alpha1/variantautoscaling_types_test.go: add TestMinMaxReplicasJSON; update makeValidVA() with MaxReplicas: 2
- internal/actuator/actuator_test.go, internal/controller/variantautoscaling_controller_test.go, internal/controller/indexers/indexers_test.go, internal/engines/saturation/engine_test.go, test/e2e/fixtures/va_builder.go: set explicit MaxReplicas: 2 in test fixtures so they pass Minimum=1 validation in envtest (kubebuilder defaults are not applied by envtest)

No breaking changes
All fields have defaults. Existing VAs without them continue to work unchanged.