Commit 77ba939 (1 parent: 5318bd5)

elaborate how to implement TriggeredScaleUp in CA

File tree

  • keps/sig-scheduling/3990-pod-topology-spread-fallback-mode

1 file changed (+20 −0 lines changed)

keps/sig-scheduling/3990-pod-topology-spread-fallback-mode/README.md
@@ -378,6 +378,26 @@ which creates new Node for Pod typically by the cluster autoscaler.
3. The cluster autoscaler adds `TriggeredScaleUp: false`.
4. The scheduler notices `TriggeredScaleUp: false` on the Pod and schedules it, falling back to `ScheduleAnyway` on Pod Topology Spread.

#### How we implement `TriggeredScaleUp` in the cluster autoscaler

Basically, we just set `TriggeredScaleUp: false` on Pods in [status.ScaleUpStatus.PodsRemainUnschedulable](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/processors/status/scale_up_status_processor.go#L37) in every [reconciliation (RunOnce)](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/static_autoscaler.go#L296).

`status.ScaleUpStatus.PodsRemainUnschedulable` contains the Pods for which the cluster autoscaler [simulates](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go#L536) the scheduling process and determines that they wouldn't be schedulable in any node group.
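The per-reconciliation stamping could look like the following minimal Go sketch; the types and function names here are illustrative stand-ins, not the actual cluster autoscaler API:

```go
package main

import "fmt"

// Pod is a simplified stand-in for a Kubernetes Pod, holding only the
// condition-like flags relevant to this sketch.
type Pod struct {
	Name       string
	Conditions map[string]bool
}

// ScaleUpStatus is a stand-in for the autoscaler's scale-up status, holding
// the Pods that the scale-up simulation left unschedulable.
type ScaleUpStatus struct {
	PodsRemainUnschedulable []*Pod
}

// markTriggeredScaleUpFalse mimics what a status processor could do at the
// end of every reconciliation (RunOnce): stamp TriggeredScaleUp: false on
// every Pod that no node group could accommodate.
func markTriggeredScaleUpFalse(status *ScaleUpStatus) {
	for _, pod := range status.PodsRemainUnschedulable {
		if pod.Conditions == nil {
			pod.Conditions = map[string]bool{}
		}
		pod.Conditions["TriggeredScaleUp"] = false
	}
}

func main() {
	status := &ScaleUpStatus{
		PodsRemainUnschedulable: []*Pod{{Name: "huge-pod"}},
	}
	markTriggeredScaleUpFalse(status)
	fmt.Println(status.PodsRemainUnschedulable[0].Conditions["TriggeredScaleUp"]) // false
}
```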
For a simple example:
if a Pod requests 64 CPUs, but no node group can satisfy a 64 CPU requirement,
the Pod ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.
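That simple example boils down to a feasibility check like the sketch below; the node group names and CPU capacities are made up for illustration:

```go
package main

import "fmt"

// Hypothetical per-node-group CPU capacity of the largest node each group
// can provision.
var nodeGroupCPUs = map[string]int64{
	"ng-small": 8,
	"ng-large": 32,
}

// fitsAnyNodeGroup reports whether any node group could provision a node
// big enough for the Pod's CPU request.
func fitsAnyNodeGroup(requestCPUs int64) bool {
	for _, capacity := range nodeGroupCPUs {
		if requestCPUs <= capacity {
			return true
		}
	}
	return false
}

func main() {
	// A 64-CPU Pod fits nowhere, so it would remain unschedulable and
	// receive TriggeredScaleUp: false.
	fmt.Println(fitsAnyNodeGroup(64)) // false
	fmt.Println(fitsAnyNodeGroup(16)) // true (ng-large)
}
```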
A more complicated scenario is also covered in this way:
suppose a Pod requests 64 CPUs, and only one node group can satisfy that requirement,
but that node group is out of instances at the moment.
In this case, the first reconciliation selects the node group to make the Pod schedulable,
but the node group size increase request is rejected by the cloud provider because of the stockout.
The node group is then considered unsafe for a while,
and the next reconciliation runs without taking the failed node group into account.
As said, no other node group can satisfy the 64 CPU requirement,
so the Pod finally ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.
### PreemptionFailed

`PreemptionFailed` is used to fall back when preemption fails.