Commit a1df8e7

Update Docs Bot authored and github-actions[bot] committed
docs: Remove references to deprecated Allocation API types
Remove documentation for CurrentAlloc, Allocation, and LoadProfile types that were removed in PR #553 as part of the model-based scaling cleanup.

Changes:
- Remove Allocation and LoadProfile type definitions from CRD reference
- Remove currentAlloc field from VariantAutoscalingStatus documentation
- Update status examples to use desiredOptimizedAlloc instead
- Update VariantAutoscalingStatus description to reflect current state

These types were part of the model-based scaling implementation and are no longer needed for saturation-based scaling.
1 parent deeb63f commit a1df8e7

3 files changed

Lines changed: 4 additions & 49 deletions

docs/design/controller-behavior.md

Lines changed: 2 additions & 3 deletions
@@ -64,7 +64,7 @@ status:
   conditions:
   - type: Ready
     status: "True"
-  currentAllocation:
+  desiredOptimizedAlloc:
     numReplicas: 3
     accelerator: "A100"
 
@@ -75,8 +75,7 @@ status:
     status: "False"
     reason: "DeploymentNotFound"
     message: "Target deployment no longer exists"
-  currentAllocation:
-    numReplicas: 0
+  desiredOptimizedAlloc: {}
 ```
 
 When a Deployment is deleted:

docs/user-guide/configuration.md

Lines changed: 1 addition & 2 deletions
@@ -64,8 +64,7 @@ status:
     status: "False"
     reason: "DeploymentNotFound"
     message: "Target deployment 'llama-8b' no longer exists"
-  currentAllocation:
-    numReplicas: 0  # Cleared to reflect no deployment
+  desiredOptimizedAlloc: {}  # Cleared to reflect no deployment
 ```
 
 **Recovery Process:**

docs/user-guide/crd-reference.md

Lines changed: 1 addition & 44 deletions
@@ -30,48 +30,6 @@ _Appears in:_
 | `applied` _boolean_ | Applied indicates whether the actuation was successfully applied. | | |
 
 
-#### Allocation
-
-
-
-Allocation describes the current resource allocation for a model variant.
-
-
-
-_Appears in:_
-- [VariantAutoscalingStatus](#variantautoscalingstatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `accelerator` _string_ | Accelerator is the type of accelerator currently allocated. | | MinLength: 1 <br /> |
-| `numReplicas` _integer_ | NumReplicas is the number of replicas currently allocated. | | Minimum: 0 <br /> |
-| `maxBatch` _integer_ | MaxBatch is the maximum batch size currently allocated. | | Minimum: 0 <br /> |
-| `itlAverage` _string_ | ITLAverage is the average inter token latency for the current allocation. | | Pattern: `^\d+(\.\d+)?$` <br /> |
-| `ttftAverage` _string_ | TTFTAverage is the average time to first token for the current allocation | | Pattern: `^\d+(\.\d+)?$` <br /> |
-| `load` _[LoadProfile](#loadprofile)_ | Load describes the workload characteristics for the current allocation. | | |
-
-
-#### LoadProfile
-
-
-
-LoadProfile represents the configuration for workload characteristics,
-including the rate of incoming requests (ArrivalRate) and the average
-length of each request (AvgLength). Both fields are specified as strings
-to allow flexible input formats.
-
-
-
-_Appears in:_
-- [Allocation](#allocation)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `arrivalRate` _string_ | ArrivalRate is the rate of incoming requests in inference server. | | |
-| `avgInputTokens` _string_ | AvgInputTokens is the average number of input(prefill) tokens per request in inference server. | | |
-| `avgOutputTokens` _string_ | AvgOutputTokens is the average number of output(decode) tokens per request in inference server. | | |
-
-
 #### OptimizedAlloc
 
 
@@ -156,7 +114,7 @@ _Appears in:_
 
 
 VariantAutoscalingStatus represents the current status of autoscaling for a variant,
-including the current allocation, desired optimized allocation, and actuation status.
+including the desired optimized allocation and actuation status.
 
 
 
@@ -165,7 +123,6 @@ _Appears in:_
 
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `currentAlloc` _[Allocation](#allocation)_ | CurrentAlloc specifies the current resource allocation for the variant. | | Optional: \{\} <br /> |
 | `desiredOptimizedAlloc` _[OptimizedAlloc](#optimizedalloc)_ | DesiredOptimizedAlloc indicates the target optimized allocation based on autoscaling logic. | | |
 | `actuation` _[ActuationStatus](#actuationstatus)_ | Actuation provides details about the actuation process and its current status. | | |
 | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.32/#condition-v1-meta) array_ | Conditions represent the latest available observations of the VariantAutoscaling's state | | Optional: \{\} <br /> |
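
A cleanup like this one is easy to leave incomplete, since the removed names appear in YAML examples, table rows, and anchor links. The following is a minimal sketch of a check for leftover references, not part of the commit itself. The function name `find_stale_references` and the pattern list are hypothetical; the patterns are assumed from the identifiers visible in the diff above and may not cover every removed name.

```python
import re

# Identifiers removed from the docs by this commit. The list is an assumption
# drawn from the diff above, not an exhaustive inventory of the removed API.
STALE_PATTERN = re.compile(
    r"\bcurrentAlloc(?:ation)?\b"   # status field, in both spellings seen in the docs
    r"|\bLoadProfile\b"             # removed type name
    r"|\(#allocation\)"             # Markdown anchor link to the removed Allocation section
    r"|\(#loadprofile\)"            # Markdown anchor link to the removed LoadProfile section
)


def find_stale_references(text):
    """Return (line_number, matched_text) pairs for every stale reference."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in STALE_PATTERN.finditer(line):
            hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "status:",
        "  currentAllocation:",
        "    numReplicas: 0",
        "  desiredOptimizedAlloc: {}",
    ])
    for number, found in find_stale_references(sample):
        print(f"line {number}: {found}")
```

The pattern deliberately avoids matching the bare word "Allocation", because `OptimizedAlloc` and ordinary prose such as "desired optimized allocation" remain valid after this change.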
