api: add minReplicas, maxReplicas and behavior fields to VA spec (#864)
* api: add minReplicas and maxReplicas to VariantAutoscalingSpec
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* api: add behavior field to VariantAutoscalingConfigSpec for HPA scaling policies
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* test: fix VA fixtures for maxReplicas validation and add CRD field tests for minReplicas, maxReplicas, and behavior
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): change default maxReplicas from 10 to 2
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
* refactor(api): remove behavior field to align with release plan
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
---------
Signed-off-by: Vivek Karunai Kiri Ragavan <vkarunai@redhat.com>
api/v1alpha1/variantautoscaling_types.go
Lines changed: 17 additions & 0 deletions
@@ -17,6 +17,7 @@ type VariantAutoscalingConfigSpec struct {
 }
 
 // VariantAutoscalingSpec defines the desired state for autoscaling a model variant.
+// +kubebuilder:validation:XValidation:rule="!has(self.minReplicas) || self.minReplicas <= self.maxReplicas",message="minReplicas must be less than or equal to maxReplicas"
 type VariantAutoscalingSpec struct {
 	// ScaleTargetRef references the scalable resource to manage.
 	// This follows the same pattern as HorizontalPodAutoscaler.
@@ -28,6 +29,20 @@ type VariantAutoscalingSpec struct {
 	// +kubebuilder:validation:Required
 	ModelID string `json:"modelID"`
 
+	// MinReplicas is the lower bound on the number of replicas for this variant.
+	// A value of 0 enables scale-to-zero when the model is idle.
+	// Defaults to 1, preserving existing behavior for VAs that omit this field.
+	// +kubebuilder:validation:Minimum=0
+	// +kubebuilder:default=1
+	// +optional
+	MinReplicas *int32 `json:"minReplicas,omitempty"`
+
+	// MaxReplicas is the upper bound on the number of replicas for this variant.
+	// The autoscaler will never scale beyond this value regardless of load.
+	// +kubebuilder:validation:Minimum=1
+	// +kubebuilder:default=2
+	MaxReplicas int32 `json:"maxReplicas"`
+
 	// VariantAutoscalingConfigSpec holds optional tuning fields that integrators can embed.
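
The markers added in this file can be restated in plain Go to make the intended checks easier to follow. The sketch below is illustrative only: `boundsSpec`, `validateBounds`, and `parseAndValidate` are hypothetical names, not part of this repository, and the struct is a local mirror of the two new fields rather than the real `VariantAutoscalingSpec`. It assumes the checks are `Minimum=0` on `minReplicas`, `Minimum=1` on `maxReplicas`, and the XValidation rule shown above; it does not model the defaulting (1 and 2) declared by the `+kubebuilder:default` markers.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// boundsSpec is a hypothetical local mirror of the two new fields.
// It is NOT the real VariantAutoscalingSpec type.
type boundsSpec struct {
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	MaxReplicas int32  `json:"maxReplicas"`
}

// validateBounds restates the kubebuilder markers from the diff:
// Minimum=1 on maxReplicas, Minimum=0 on minReplicas, and the rule
// "!has(self.minReplicas) || self.minReplicas <= self.maxReplicas".
func validateBounds(s boundsSpec) error {
	if s.MaxReplicas < 1 {
		return errors.New("maxReplicas must be greater than or equal to 1")
	}
	if s.MinReplicas == nil {
		// Corresponds to !has(self.minReplicas): nothing more to check.
		return nil
	}
	if *s.MinReplicas < 0 {
		return errors.New("minReplicas must be greater than or equal to 0")
	}
	if *s.MinReplicas > s.MaxReplicas {
		return errors.New("minReplicas must be less than or equal to maxReplicas")
	}
	return nil
}

// parseAndValidate decodes a JSON fragment of the spec and validates it.
// The pointer type lets an omitted minReplicas (nil) be told apart from
// an explicit 0, which is the scale-to-zero setting.
func parseAndValidate(raw string) (boundsSpec, error) {
	var s boundsSpec
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return s, err
	}
	return s, validateBounds(s)
}

func main() {
	for _, raw := range []string{
		`{"maxReplicas": 2}`,
		`{"minReplicas": 0, "maxReplicas": 2}`,
		`{"minReplicas": 3, "maxReplicas": 2}`,
	} {
		s, err := parseAndValidate(raw)
		fmt.Printf("minSet=%t err=%v\n", s.MinReplicas != nil, err)
	}
}
```

Only the third input is rejected, because its `minReplicas` exceeds `maxReplicas`. The first two differ in whether `MinReplicas` is nil, which is the distinction the `*int32` type preserves.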
docs/user-guide/crd-reference.md
Lines changed: 27 additions & 7 deletions
@@ -45,7 +45,7 @@ _Appears in:_
 | --- | --- | --- | --- |
 | `lastRunTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#time-v1-meta)_ | LastRunTime is the timestamp of the last optimization run. |  |  |
 | `accelerator` _string_ | Accelerator is the type of accelerator for the optimized allocation. |  | MinLength: 2 <br /> |
-| `numReplicas` _integer_ | NumReplicas is the number of replicas for the optimized allocation. |  | Minimum: 1 <br /> |
+| `numReplicas` _integer_ | NumReplicas is the number of replicas for the optimized allocation. |  | Minimum: 0 <br /> |
 
 
 #### VariantAutoscaling
@@ -64,13 +64,31 @@ _Appears in:_
 | --- | --- | --- | --- |
 | `apiVersion` _string_ | `llmd.ai/v1alpha1` |  |  |
 | `kind` _string_ | `VariantAutoscaling` |  |  |
-| `kind` _string_ | Kind is a string value representing the REST resource this object represents.<br />Servers may infer this from the endpoint the client submits requests to.<br />Cannot be updated.<br />In CamelCase.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds |  |  |
-| `apiVersion` _string_ | APIVersion defines the versioned schema of this representation of an object.<br />Servers should convert recognized schemas to the latest internal value, and<br />may reject unrecognized values.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources |  |  |
+| `kind` _string_ | Kind is a string value representing the REST resource this object represents.<br />Servers may infer this from the endpoint the client submits requests to.<br />Cannot be updated.<br />In CamelCase.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds |  | Optional: \{\} <br /> |
+| `apiVersion` _string_ | APIVersion defines the versioned schema of this representation of an object.<br />Servers should convert recognized schemas to the latest internal value, and<br />may reject unrecognized values.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources |  | Optional: \{\} <br /> |
 | `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
 | `spec` _[VariantAutoscalingSpec](#variantautoscalingspec)_ | Spec defines the desired state for autoscaling the model variant. |  |  |
 | `status` _[VariantAutoscalingStatus](#variantautoscalingstatus)_ | Status represents the current status of autoscaling for the model variant. |  |  |
 
 
+#### VariantAutoscalingConfigSpec
+
+
+
+VariantAutoscalingConfigSpec holds the optional tuning fields for a VariantAutoscaling.
+It is extracted as a standalone embeddable type so that higher-level controllers
+(e.g. KServe) can inline it without duplicating field definitions.
+| `variantCost` _string_ | VariantCost specifies the cost per replica for this variant (used in saturation analysis). | 10.0 | Optional: \{\} <br />Pattern: `^\d+(\.\d+)?$` <br /> |
+
+
 #### VariantAutoscalingList
 
 
@@ -85,8 +103,8 @@ VariantAutoscalingList contains a list of VariantAutoscaling resources.
 | --- | --- | --- | --- |
 | `apiVersion` _string_ | `llmd.ai/v1alpha1` |  |  |
 | `kind` _string_ | `VariantAutoscalingList` |  |  |
-| `kind` _string_ | Kind is a string value representing the REST resource this object represents.<br />Servers may infer this from the endpoint the client submits requests to.<br />Cannot be updated.<br />In CamelCase.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds |  |  |
-| `apiVersion` _string_ | APIVersion defines the versioned schema of this representation of an object.<br />Servers should convert recognized schemas to the latest internal value, and<br />may reject unrecognized values.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources |  |  |
+| `kind` _string_ | Kind is a string value representing the REST resource this object represents.<br />Servers may infer this from the endpoint the client submits requests to.<br />Cannot be updated.<br />In CamelCase.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds |  | Optional: \{\} <br /> |
+| `apiVersion` _string_ | APIVersion defines the versioned schema of this representation of an object.<br />Servers should convert recognized schemas to the latest internal value, and<br />may reject unrecognized values.<br />More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources |  | Optional: \{\} <br /> |
 | `metadata` _[ListMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#listmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
 | `items` _[VariantAutoscaling](#variantautoscaling) array_ | Items is the list of VariantAutoscaling resources. |  |  |
 
@@ -104,8 +122,10 @@ _Appears in:_
 
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `scaleTargetRef` _[CrossVersionObjectReference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#crossversionobjectreference-v1-autoscaling)_ | ScaleTargetRef references the scalable resource to manage.<br />This follows the same pattern as HorizontalPodAutoscaler. |  | Required: \{\} <br /> |
+| `scaleTargetRef` _[CrossVersionObjectReference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#crossversionobjectreference-v2-autoscaling)_ | ScaleTargetRef references the scalable resource to manage.<br />This follows the same pattern as HorizontalPodAutoscaler. |  | Required: \{\} <br /> |
 | `modelID` _string_ | ModelID specifies the unique identifier of the model to be autoscaled. |  | MinLength: 1 <br />Required: \{\} <br /> |
+| `minReplicas` _integer_ | MinReplicas is the lower bound on the number of replicas for this variant.<br />A value of 0 enables scale-to-zero when the model is idle.<br />Defaults to 1, preserving existing behavior for VAs that omit this field. | 1 | Minimum: 0 <br />Optional: \{\} <br /> |
+| `maxReplicas` _integer_ | MaxReplicas is the upper bound on the number of replicas for this variant.<br />The autoscaler will never scale beyond this value regardless of load. | 2 | Minimum: 1 <br /> |
 | `variantCost` _string_ | VariantCost specifies the cost per replica for this variant (used in saturation analysis). | 10.0 | Optional: \{\} <br />Pattern: `^\d+(\.\d+)?$` <br /> |
 
 
@@ -114,7 +134,7 @@ _Appears in:_
 
 
 VariantAutoscalingStatus represents the current status of autoscaling for a variant,
-including the desired optimized allocation and actuation status.
+including the current allocation, desired optimized allocation, and actuation status.
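
One way to read the documented `minReplicas` and `maxReplicas` rows is as a clamp on whatever replica count the optimizer proposes. The following is a minimal sketch under that assumption, not the controller's actual code: `clampReplicas` and `defaultMinReplicas` are hypothetical names, and the fallback to 1 when `minReplicas` is omitted follows the default documented in the table.

```go
package main

import "fmt"

// defaultMinReplicas is the documented default for minReplicas.
const defaultMinReplicas int32 = 1

// clampReplicas is a hypothetical helper that bounds a desired replica
// count to [minReplicas, maxReplicas]. A nil minReplicas falls back to
// the documented default of 1; an explicit 0 allows scale-to-zero.
func clampReplicas(desired int32, minReplicas *int32, maxReplicas int32) int32 {
	lo := defaultMinReplicas
	if minReplicas != nil {
		lo = *minReplicas
	}
	if desired < lo {
		return lo
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	zero := int32(0)
	fmt.Println(clampReplicas(5, nil, 2))   // capped at maxReplicas: 2
	fmt.Println(clampReplicas(0, nil, 2))   // raised to the default minimum: 1
	fmt.Println(clampReplicas(0, &zero, 2)) // scale-to-zero permitted: 0
}
```

The third call is the case the `Minimum: 0` change on `numReplicas` makes representable: with `minReplicas` set to 0, an allocation of zero replicas is within bounds.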