Removed allocation struct used in modelbased scaling #553

Merged
asm582 merged 1 commit into llm-d:main from asm582:rem_allocation
Jan 9, 2026

asm582 (Collaborator) commented Jan 9, 2026

The allocation struct has ITL, TTFT, and load average fields. These fields were used by model-based scaling, which is not required for saturation-based scaling, so they are no longer part of the API.

// including the rate of incoming requests (ArrivalRate) and the average
// length of each request (AvgLength). Both fields are specified as strings
// to allow flexible input formats.
type LoadProfile struct {
Collaborator

Why do we still need this?

Collaborator Author

Backward compatibility from an algorithm perspective. I assume we will soon need these features for model-based work, either to handle wobble or to go beyond it. If it turns out they are not needed at all, we can remove them later.

MaxBatch int `json:"maxBatch"`

// ITLAverage is the average inter token latency for the current allocation.
ITLAverage string `json:"itlAverage"`
Collaborator

Is this still needed?

Collaborator Author

Same comment as above. Until the wobble issue is fixed, I would like to keep it.

ITLAverage string `json:"itlAverage"`

// TTFTAverage is the average time to first token for the current allocation
TTFTAverage string `json:"ttftAverage"`
Collaborator

Is this still needed?

Collaborator Author

I kept it as a safeguard in case we ever want to bring back model-based features. The user-facing API does not have these fields for now.

asm582 marked this pull request as ready for review January 9, 2026 15:12
asm582 merged commit deeb63f into llm-d:main Jan 9, 2026
7 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 9, 2026
Remove documentation for CurrentAlloc, Allocation, and LoadProfile types
that were removed in PR #553 as part of the model-based scaling cleanup.

Changes:
- Remove Allocation and LoadProfile type definitions from CRD reference
- Remove currentAlloc field from VariantAutoscalingStatus documentation
- Update status examples to use desiredOptimizedAlloc instead
- Update VariantAutoscalingStatus description to reflect current state

These types were part of the model-based scaling implementation and are
no longer needed for saturation-based scaling.
ev-shindin pushed a commit to ev-shindin/workload-variant-autoscaler that referenced this pull request Jan 14, 2026
Removed allocation struct used in modelbased scaling
ev-shindin pushed a commit to ev-shindin/workload-variant-autoscaler that referenced this pull request Jan 14, 2026
Remove documentation for CurrentAlloc, Allocation, and LoadProfile types
that were removed in PR llm-d#553 as part of the model-based scaling cleanup.
mamy-CS pushed a commit to mamy-CS/inferno-autoscaler that referenced this pull request Feb 10, 2026
Remove documentation for CurrentAlloc, Allocation, and LoadProfile types
that were removed in PR llm-d#553 as part of the model-based scaling cleanup.