[design] hpa overshoot demonstration #1019
Conversation
Signed-off-by: Lionel Villard <villard@us.ibm.com>
Pull request overview
Adds a design note demonstrating how Kubernetes HPA scaling on external Value metrics can massively overshoot when pod startup time is long and pending pods are counted in currentReplicas.
Changes:
- Introduces a worked example showing multiplicative (factorial) scale-up feedback with external `Value` metrics.
- Adds a formal recurrence / growth analysis and an FAQ discussing common mitigations (stabilization windows, rate limits, target tuning).
The `Value` formula at HPA cycle `n` (time `t = nP`) is:

```
desired(n) = ⌈ desired(n−1) × g·nP / T ⌉
```

Each cycle multiplies the previous desired count by a growing factor `g·nP/T`. Unrolling the recurrence:

```
desired(n) ≈ ∏_{k=1}^{n} (g·kP / T) = (gP/T)^n × n!
```
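The recurrence is easy to simulate. A minimal sketch, assuming the worst case `gP/T = 1` (e.g. `P = T` and `g = 1`) so the per-cycle growth factor is exactly `n` and the desired count grows as `n!`:

```python
import math

# Worst-case parameters: gP/T = 1, so cycle n multiplies by exactly n.
g, P, T = 1.0, 15.0, 15.0
desired = 1
for n in range(1, 8):
    # desired(n) = ceil(desired(n-1) * g * n * P / T)
    desired = math.ceil(desired * g * n * P / T)
    print(f"cycle {n}: desired = {desired}")
# after 7 cycles: desired = 5040 = 7!
```

Any `gP/T < 1` merely delays the blow-up: once `g·nP/T` exceeds 1, each subsequent cycle still multiplies the replica count by a factor that keeps growing with `n`.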
The recurrence in the formal analysis uses desired(n−1) as the next cycle’s currentReplicas, which assumes the scale target is fully applied and reflected in scale.status.replicas by the next HPA sync (and that maxReplicas / behavior.scaleUp rate limits / cluster quota aren’t capping growth). Consider stating these assumptions explicitly (or framing the math as an upper-bound / worst-case) so readers don’t interpret the factorial growth as unconditional in all real clusters.
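The caps mentioned here can be sketched concretely. With illustrative values only (a `maxReplicas` of 100 and a 100%-per-period `behavior.scaleUp` percent policy, neither taken from the design note), the same recurrence degrades from factorial to geometric growth and then flatlines:

```python
import math

g, P, T = 1.0, 15.0, 15.0       # gP/T = 1, the factorial worst case
MAX_REPLICAS = 100              # illustrative maxReplicas
desired = 1
for n in range(1, 11):
    raw = math.ceil(desired * g * n * P / T)   # uncapped recurrence
    # behavior.scaleUp percent policy of 100%: at most double per period,
    # then clamp to maxReplicas
    desired = min(raw, desired * 2, MAX_REPLICAS)
    print(f"cycle {n}: uncapped={raw}, capped={desired}")
# capped growth is at most 2^n and stops at MAX_REPLICAS
```

This supports framing the `n!` result as an upper bound: real clusters with rate limits, replica caps, or quota exhaust the budget long before the factorial term dominates.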
| t=105 | Batch 3 ready (988 pods from t=45) | 1 053 | 10 530 req/s | queue empty, 1 043 pods idle |
| t=120 | Batch 4 ready (21 692 pods from t=60) | 22 745 | 227 450 req/s | queue empty, 22 735 pods idle |

Unlike the `AverageValue` case, the first batch (5 pods) is too small to match demand — capacity at t=75 (60 req/s) is still below the incoming rate (100 req/s), so the queue continues to grow. Only at t=90, when batch 2 arrives, does capacity finally exceed demand and the queue begins to drain. Meanwhile, 22 680 more pods are still spinning up with nothing to do. The system ends with **22 745 running pods serving 100 req/s** — 22 735 of them idle — until HPA's cooldown window allows a scale-down, which introduces yet another delay.
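The scale-up cycles behind these numbers can be reproduced with a short simulation. The 100 req/s arrival rate, 10 req/s per pod, 60 s startup, and 15 s sync period follow from the table; the queue-length target of 250 is an assumption chosen here to make the sketch line up, not a value stated in the design note:

```python
import math

# Assumed parameters (the target of 250 is a guess, not from the doc):
ARRIVAL, PER_POD, SYNC, TARGET = 100.0, 10.0, 15, 250.0

queue, total = 0.0, 1            # one pod Ready at t=0; `total` = Ready + Pending
for t in (15, 30, 45, 60):
    # Only the initial pod is Ready before t=75 (startup takes 60 s), so the
    # queue grows by the unserved arrivals every sync period.
    queue += (ARRIVAL - PER_POD * 1) * SYNC
    # HPA `Value` formula: desired = ceil(currentReplicas * metric / target),
    # where currentReplicas includes Pending pods.
    total = max(total, math.ceil(total * queue / TARGET))
    print(f"t={t}s: queue={queue:.0f}, desired={total}")
# the t=60 cycle requests 22 745 replicas for a 100 req/s workload
```

Under these assumptions the four cycles request 6, 65, 1 053, and 22 745 replicas — the batch sizes in the table — because each cycle multiplies a still-growing queue by a replica count inflated with Pending pods.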
“cooldown window” isn’t an HPA term and could be read as a separate mechanism. Consider referencing the concrete knobs that delay scale-down (e.g., behavior.scaleDown.stabilizationWindowSeconds and/or scale-down policies) so the operational implication is unambiguous.
```suggestion
Unlike the `AverageValue` case, the first batch (5 pods) is too small to match demand — capacity at t=75 (60 req/s) is still below the incoming rate (100 req/s), so the queue continues to grow. Only at t=90, when batch 2 arrives, does capacity finally exceed demand and the queue begins to drain. Meanwhile, 22 680 more pods are still spinning up with nothing to do. The system ends with **22 745 running pods serving 100 req/s** — 22 735 of them idle — until HPA scale-down behavior permits a reduction in replicas (for example, after `behavior.scaleDown.stabilizationWindowSeconds` and any scale-down policies), which introduces yet another delay.
```
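For reference, the knobs named in this suggestion live under the HPA's `spec.behavior` in `autoscaling/v2`; an illustrative fragment (values are examples, not recommendations):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # default; scale-down uses the highest
                                      # recommendation seen over this window
    policies:
    - type: Percent
      value: 50           # remove at most 50% of current replicas...
      periodSeconds: 60   # ...per 60-second period
```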