
Commit ea3fc17

elaborate how to implement TriggeredScaleUp in CA
1 parent 5318bd5 commit ea3fc17

File tree

  • keps/sig-scheduling/3990-pod-topology-spread-fallback-mode

1 file changed: +21 -0 lines changed

keps/sig-scheduling/3990-pod-topology-spread-fallback-mode/README.md

@@ -92,6 +92,7 @@ tags, and then generate with `hack/update-toc.sh`.
- [Design Details](#design-details)
- [new API changes](#new-api-changes)
- [ScaleUpFailed](#scaleupfailed)
+ - [How we implement <code>TriggeredScaleUp</code> in the cluster autoscaler](#how-we-implement--in-the-cluster-autoscaler)
- [PreemptionFailed](#preemptionfailed)
- [What if are both specified in <code>FallbackCriterion</code>?](#what-if-are-both-specified-in-)
- [Test Plan](#test-plan)
@@ -378,6 +379,26 @@ which creates new Node for Pod typically by the cluster autoscaler.
3. The cluster autoscaler adds `TriggeredScaleUp: false`.
4. The scheduler notices `TriggeredScaleUp: false` on the Pod and schedules it, falling back to `ScheduleAnyway` on Pod Topology Spread.

#### How we implement `TriggeredScaleUp` in the cluster autoscaler

Basically, we just set `TriggeredScaleUp: false` on the Pods in [status.ScaleUpStatus.PodsRemainUnschedulable](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/processors/status/scale_up_status_processor.go#L37) in every [reconciliation (RunOnce)](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/static_autoscaler.go#L296).
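
To make this concrete, here is a minimal Go sketch of that loop. It is an illustration, not the actual autoscaler code: the types are simplified stand-ins for the ones in the files linked above, `markUnhelpablePods` is an invented name, and exactly how `TriggeredScaleUp: false` is surfaced on the Pod is the API question discussed in [new API changes](#new-api-changes).

```go
package main

import "fmt"

// Simplified stand-ins for the cluster-autoscaler types; the real
// ScaleUpStatus lives in processors/status/scale_up_status_processor.go.
type Pod struct {
	Name       string
	Conditions map[string]bool
}

type ScaleUpStatus struct {
	PodsRemainUnschedulable []*Pod
}

// markUnhelpablePods models the hook that would run in every reconciliation
// (RunOnce): every Pod that no node group can help gets TriggeredScaleUp:
// false, the signal the scheduler watches for before falling back.
func markUnhelpablePods(status *ScaleUpStatus) {
	for _, pod := range status.PodsRemainUnschedulable {
		if pod.Conditions == nil {
			pod.Conditions = map[string]bool{}
		}
		// In the real autoscaler this would be an update sent to the API
		// server, not a local mutation.
		pod.Conditions["TriggeredScaleUp"] = false
	}
}

func main() {
	status := &ScaleUpStatus{PodsRemainUnschedulable: []*Pod{{Name: "big-pod"}}}
	markUnhelpablePods(status)
	fmt.Println(status.PodsRemainUnschedulable[0].Conditions)
}
```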

`status.ScaleUpStatus.PodsRemainUnschedulable` contains the Pods for which the cluster autoscaler [simulates](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go#L536) the scheduling process and determines that they wouldn't be schedulable in any node group.

As a simple example, if a Pod requests 64 CPUs but no node group can satisfy a 64-CPU requirement, the Pod ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.
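
This boils down to a fit check across node groups. The sketch below is a toy version of that simulation, under the assumption that CPU is the only resource; the real autoscaler runs the scheduler's own predicates against a template node for each node group, and `NodeGroup`, `CPUPerNode`, and `remainsUnschedulable` are names invented for illustration.

```go
package main

import "fmt"

// NodeGroup is a toy model: one node shape per group.
type NodeGroup struct {
	Name       string
	CPUPerNode int64 // allocatable CPUs of a single node in this group
}

// remainsUnschedulable reports whether no node group could ever fit the
// Pod's CPU request, i.e. whether scaling up is pointless for this Pod.
func remainsUnschedulable(cpuRequest int64, groups []NodeGroup) bool {
	for _, g := range groups {
		if g.CPUPerNode >= cpuRequest {
			return false // scaling this group up could help the Pod
		}
	}
	return true // Pod would end up in PodsRemainUnschedulable
}

func main() {
	groups := []NodeGroup{
		{Name: "m-16cpu", CPUPerNode: 16},
		{Name: "m-32cpu", CPUPerNode: 32},
	}
	// A Pod requesting 64 CPUs fits nowhere, so it would get
	// TriggeredScaleUp: false.
	fmt.Println(remainsUnschedulable(64, groups)) // true
}
```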

A more complicated scenario is also covered this way: suppose a Pod requests 64 CPUs and only one node group can satisfy that requirement, but the node group is running out of instances at the moment. In that case, the first reconciliation selects the node group to make the Pod schedulable, but the cloud provider rejects the node group size increase because of the stockout. The node group is then considered non-safe for a while, and the next reconciliation runs without taking the failed node group into account. As said, no other node group can satisfy the 64-CPU requirement, so the Pod finally ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.
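
The sketch below extends the toy model to the stockout case; `BackoffUntil` and `usableGroups` are invented for illustration, while the real autoscaler keeps equivalent per-node-group health and backoff state internally. The point is that once the only fitting group is backed off, the fit check above fails for every remaining group.

```go
package main

import (
	"fmt"
	"time"
)

// NodeGroup extends the toy model with backoff state: a group whose size
// increase was rejected by the cloud provider is ignored until BackoffUntil.
type NodeGroup struct {
	Name         string
	CPUPerNode   int64
	BackoffUntil time.Time // zero value means the group is healthy
}

// usableGroups filters out node groups that are currently backed off, which
// is what the next reconciliation effectively does after a failed scale-up.
func usableGroups(groups []NodeGroup, now time.Time) []NodeGroup {
	var ok []NodeGroup
	for _, g := range groups {
		if now.After(g.BackoffUntil) {
			ok = append(ok, g)
		}
	}
	return ok
}

func main() {
	now := time.Now()
	groups := []NodeGroup{
		// The only group that fits a 64-CPU Pod, backed off after a stockout.
		{Name: "m-64cpu", CPUPerNode: 64, BackoffUntil: now.Add(5 * time.Minute)},
		{Name: "m-16cpu", CPUPerNode: 16},
	}
	// Only m-16cpu survives the filter, so a 64-CPU Pod remains
	// unschedulable and finally gets TriggeredScaleUp: false.
	for _, g := range usableGroups(groups, now) {
		fmt.Println(g.Name)
	}
}
```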

### PreemptionFailed

`PreemptionFailed` is used to fall back when preemption fails.
