
[design] hpa overshoot demonstration#1019

Draft
lionelvillard wants to merge 1 commit into llm-d:main from lionelvillard:hpa-overshoot

Conversation

@lionelvillard
Collaborator

No description provided.

Signed-off-by: Lionel Villard <villard@us.ibm.com>
Copilot AI review requested due to automatic review settings April 16, 2026 13:55
Contributor

Copilot AI left a comment


Pull request overview

Adds a design note demonstrating how Kubernetes HPA scaling on external `Value` metrics can overshoot massively when pod startup time is long and pending pods are counted in `currentReplicas`.

Changes:

  • Introduces a worked example showing multiplicative (factorial) scale-up feedback with external Value metrics.
  • Adds a formal recurrence / growth analysis and an FAQ discussing common mitigations (stabilization windows, rate limits, target tuning).
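The feedback described above follows from the standard HPA v2 replica computation, `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`. A minimal sketch (with hypothetical numbers) contrasting external `Value` and `AverageValue` targets — for `Value` the raw metric is compared against the target, so the result scales with `currentReplicas`; for `AverageValue` the metric is first divided by `currentReplicas`, which cancels that multiplicative feedback:

```python
import math

def desired_replicas_value(current, metric_value, target_value):
    """External metric, target type Value: ratio taken against the raw
    metric, so desired scales multiplicatively with currentReplicas."""
    return math.ceil(current * metric_value / target_value)

def desired_replicas_average_value(metric_value, target_average):
    """External metric, target type AverageValue: the metric is divided by
    currentReplicas first, cancelling the multiplicative feedback."""
    return math.ceil(metric_value / target_average)

# Hypothetical numbers: queue depth 500, target 100, 5 current replicas.
print(desired_replicas_value(current=5, metric_value=500, target_value=100))  # 25
print(desired_replicas_average_value(metric_value=500, target_average=100))   # 5
```

With identical inputs, the `Value` form asks for five times as many replicas simply because five replicas already exist — the seed of the overshoot.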

Comment on lines +75 to +85
The `Value` formula at HPA cycle `n` (time `t = nP`) is:

```
desired(n) = ⌈ desired(n−1) · g·nP/T ⌉
```

Each cycle multiplies the previous desired count by a growing factor `g·nP/T`. Unrolling the recurrence:

```
desired(n) ≈ ∏_{k=1}^{n} (g·kP/T) = (gP/T)^n · n!
```

Copilot AI Apr 16, 2026


The recurrence in the formal analysis uses desired(n−1) as the next cycle’s currentReplicas, which assumes the scale target is fully applied and reflected in scale.status.replicas by the next HPA sync (and that maxReplicas / behavior.scaleUp rate limits / cluster quota aren’t capping growth). Consider stating these assumptions explicitly (or framing the math as an upper-bound / worst-case) so readers don’t interpret the factorial growth as unconditional in all real clusters.
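Under exactly those assumptions (the scale target is fully applied by each sync; no `maxReplicas` or `behavior.scaleUp` cap), the recurrence can be checked numerically. A sketch with hypothetical values for `g`, `P`, and `T` — the `max_replicas` parameter shows how a cap turns the worst case into a bounded one:

```python
import math

def unroll(n, g, P, T, d0=1, max_replicas=None):
    """Iterate desired(k) = ceil(desired(k-1) * g*k*P / T) for k = 1..n,
    optionally clamping to maxReplicas as a real HPA would."""
    d = d0
    for k in range(1, n + 1):
        d = math.ceil(d * g * k * P / T)
        if max_replicas is not None:
            d = min(d, max_replicas)
    return d

# Hypothetical: load growth factor g=2, sync period P=15s, startup time T=60s.
for n in range(1, 8):
    print(n, unroll(n, g=2, P=15, T=60))  # 1, 1, 2, 4, 10, 30, 105
print(unroll(7, g=2, P=15, T=60, max_replicas=50))  # capped at 50
```

The uncapped sequence accelerates super-exponentially once `g·kP/T` exceeds 1, matching the `(gP/T)^n · n!` unrolling; any cap simply freezes it at the bound.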

| Time | Event | Running pods | Capacity | Queue |
|---|---|---|---|---|
| t=105 | Batch 3 ready (988 pods from t=45) | 1 053 | 10 530 req/s | queue empty, 1 043 pods idle |
| t=120 | Batch 4 ready (21 692 pods from t=60) | 22 745 | 227 450 req/s | queue empty, 22 735 pods idle |

Unlike the `AverageValue` case, the first batch (5 pods) is too small to match demand — capacity at t=75 (60 req/s) is still below the incoming rate (100 req/s), so the queue continues to grow. Only at t=90, when batch 2 arrives, does capacity finally exceed demand and the queue begins to drain. Meanwhile, 22 680 more pods are still spinning up with nothing to do. The system ends with **22 745 running pods serving 100 req/s** — 22 735 of them idle — until HPA's cooldown window allows a scale-down, which introduces yet another delay.

Copilot AI Apr 16, 2026


“cooldown window” isn’t an HPA term and could be read as a separate mechanism. Consider referencing the concrete knobs that delay scale-down (e.g., behavior.scaleDown.stabilizationWindowSeconds and/or scale-down policies) so the operational implication is unambiguous.

Suggested change
Unlike the `AverageValue` case, the first batch (5 pods) is too small to match demand — capacity at t=75 (60 req/s) is still below the incoming rate (100 req/s), so the queue continues to grow. Only at t=90, when batch 2 arrives, does capacity finally exceed demand and the queue begins to drain. Meanwhile, 22 680 more pods are still spinning up with nothing to do. The system ends with **22 745 running pods serving 100 req/s** — 22 735 of them idle — until HPA's cooldown window allows a scale-down, which introduces yet another delay.
Unlike the `AverageValue` case, the first batch (5 pods) is too small to match demand — capacity at t=75 (60 req/s) is still below the incoming rate (100 req/s), so the queue continues to grow. Only at t=90, when batch 2 arrives, does capacity finally exceed demand and the queue begins to drain. Meanwhile, 22 680 more pods are still spinning up with nothing to do. The system ends with **22 745 running pods serving 100 req/s** — 22 735 of them idle — until HPA scale-down behavior permits a reduction in replicas (for example, after `behavior.scaleDown.stabilizationWindowSeconds` and any scale-down policies), which introduces yet another delay.
