-| `accelerator` _string_ | Accelerator is the type of accelerator currently allocated. | | MinLength: 1 <br /> |
-| `numReplicas` _integer_ | NumReplicas is the number of replicas currently allocated. | | Minimum: 0 <br /> |
-| `maxBatch` _integer_ | MaxBatch is the maximum batch size currently allocated. | | Minimum: 0 <br /> |
-| `itlAverage` _string_ | ITLAverage is the average inter token latency for the current allocation. | | Pattern: `^\d+(\.\d+)?$` <br /> |
-| `ttftAverage` _string_ | TTFTAverage is the average time to first token for the current allocation | | Pattern: `^\d+(\.\d+)?$` <br /> |
-| `load` _[LoadProfile](#loadprofile)_ | Load describes the workload characteristics for the current allocation. | | |
-
-
-#### LoadProfile
-
-
-
-LoadProfile represents the configuration for workload characteristics,
-including the rate of incoming requests (ArrivalRate) and the average
-length of each request (AvgLength). Both fields are specified as strings
-to allow flexible input formats.
-
-
-
-_Appears in:_
-- [Allocation](#allocation)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `arrivalRate` _string_ | ArrivalRate is the rate of incoming requests in inference server. | | |
-| `avgInputTokens` _string_ | AvgInputTokens is the average number of input(prefill) tokens per request in inference server. | | |
-| `avgOutputTokens` _string_ | AvgOutputTokens is the average number of output(decode) tokens per request in inference server. | | |
-
-
 #### OptimizedAlloc
 
 
@@ -156,7 +114,7 @@ _Appears in:_
 
 
 VariantAutoscalingStatus represents the current status of autoscaling for a variant,
-including the current allocation, desired optimized allocation, and actuation status.
+including the desired optimized allocation and actuation status.
 
 
 
@@ -165,7 +123,6 @@ _Appears in:_
 
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `currentAlloc` _[Allocation](#allocation)_ | CurrentAlloc specifies the current resource allocation for the variant. | | Optional: \{\} <br /> |
 | `desiredOptimizedAlloc` _[OptimizedAlloc](#optimizedalloc)_ | DesiredOptimizedAlloc indicates the target optimized allocation based on autoscaling logic. | | |
 | `actuation` _[ActuationStatus](#actuationstatus)_ | Actuation provides details about the actuation process and its current status. | | |
 | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#condition-v1-meta) array_ | Conditions represent the latest available observations of the VariantAutoscaling's state | | Optional: \{\} <br /> |
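
The removed `itlAverage` and `ttftAverage` rows both carried the validation pattern `^\d+(\.\d+)?$`. For reviewers checking which string values that pattern admitted, here is a minimal Python sketch. The helper name `is_valid_metric` is hypothetical and not part of the project. The API server evaluates the pattern with its own regex engine, so this only approximates that behaviour: `re.ASCII` and `fullmatch` are used to keep Python's matching as strict as possible.

```python
import re

# Pattern copied verbatim from the removed `itlAverage` / `ttftAverage` rows.
# re.ASCII limits \d to 0-9; fullmatch rejects a trailing newline that a
# plain `$` would otherwise tolerate in Python.
METRIC_PATTERN = re.compile(r"^\d+(\.\d+)?$", re.ASCII)


def is_valid_metric(value: str) -> bool:
    """Hypothetical helper: True if value is a non-negative decimal string."""
    return METRIC_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # Print each sample alongside whether the pattern accepts it.
    for sample in ["12", "12.5", "0.003", "1.", ".5", "-1", "1e3", ""]:
        print(repr(sample), is_valid_metric(sample))
```

Under this pattern, integers and decimals with digits on both sides of the point are accepted, while signs, exponents, and a leading or trailing bare point are not.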