keps/sig-scheduling/3990-pod-topology-spread-fallback-mode/README.md
@@ -378,6 +378,26 @@ which creates new Node for Pod typically by the cluster autoscaler.
3. The cluster autoscaler adds `TriggeredScaleUp: false`.
4. The scheduler notices `TriggeredScaleUp: false` on the Pod and schedules it, falling back to `ScheduleAnyway` on Pod Topology Spread (see the sketch below).
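
For illustration only, the scheduler-side check could look roughly like the sketch below. It assumes `TriggeredScaleUp` is surfaced as a Pod condition; the package and function names are ours, not part of the KEP or the existing plugin code.

```go
// Sketch: detect whether the cluster autoscaler has marked a Pod with
// TriggeredScaleUp=False, meaning no scale-up can make it schedulable.
// The PodTopologySpread plugin could then treat DoNotSchedule constraints
// as ScheduleAnyway for this Pod. All names here are illustrative.
package sketch

import v1 "k8s.io/api/core/v1"

// scaleUpFailed returns true when the Pod carries the (assumed)
// TriggeredScaleUp condition with status False.
func scaleUpFailed(pod *v1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == "TriggeredScaleUp" && cond.Status == v1.ConditionFalse {
			return true
		}
	}
	return false
}
```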
#### How we implement `TriggeredScaleUp` in the cluster autoscaler
Basically, we just set `TriggeredScaleUp: false` on the Pods in [status.ScaleUpStatus.PodsRemainUnschedulable](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/processors/status/scale_up_status_processor.go#L37) during every [reconciliation (RunOnce)](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/static_autoscaler.go#L296).
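
As a rough sketch (not the actual cluster-autoscaler code), the marking step could look like the following. It assumes `TriggeredScaleUp` is written as a Pod condition and would presumably be wired through the ScaleUpStatus processors linked above; the function name and reason string are ours.

```go
// Sketch of the marking step: every reconciliation (RunOnce), set the
// (assumed) TriggeredScaleUp=False Pod condition on each Pod that remained
// unschedulable after the scale-up attempt. Names here are illustrative.
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markTriggeredScaleUpFalse receives the Pods from
// status.ScaleUpStatus.PodsRemainUnschedulable and records the condition.
func markTriggeredScaleUpFalse(ctx context.Context, client kubernetes.Interface, pods []*v1.Pod) error {
	for _, pod := range pods {
		updated := pod.DeepCopy()
		// A real implementation would update an existing condition in place
		// instead of appending a duplicate on every reconciliation.
		updated.Status.Conditions = append(updated.Status.Conditions, v1.PodCondition{
			Type:               "TriggeredScaleUp",
			Status:             v1.ConditionFalse,
			LastTransitionTime: metav1.Now(),
			Reason:             "ScaleUpFailed",
			Message:            "no node group can make this Pod schedulable",
		})
		if _, err := client.CoreV1().Pods(updated.Namespace).UpdateStatus(ctx, updated, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```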
`status.ScaleUpStatus.PodsRemainUnschedulable` contains the Pods for which the cluster autoscaler [simulates](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go#L536) the scheduling process and determines that they wouldn't be schedulable in any node group.
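
That selection can be pictured with the simplified sketch below. The real orchestrator code linked above runs the scheduler framework's filters; this illustration only compares CPU requests against a node group's template node, and `NodeGroup`, `simulateFits`, and `podsRemainUnschedulable` are illustrative names, not the autoscaler's API.

```go
// Simplified sketch of how Pods end up in PodsRemainUnschedulable: a Pod that
// the scheduling simulation rejects for every node group remains unschedulable
// and will later receive TriggeredScaleUp=False.
package sketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// NodeGroup is a stand-in for the autoscaler's node group abstraction.
type NodeGroup interface {
	// TemplateNode returns a representative Node this group would create.
	TemplateNode() *v1.Node
}

// simulateFits is a drastically simplified stand-in for the scheduling
// simulation: it only checks whether the template node has enough allocatable
// CPU for the Pod's requests (the real simulation runs the scheduler
// framework's filter plugins, covering taints, topology spread, and so on).
func simulateFits(pod *v1.Pod, template *v1.Node) bool {
	requested := resource.NewQuantity(0, resource.DecimalSI)
	for _, c := range pod.Spec.Containers {
		if cpu, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
			requested.Add(cpu)
		}
	}
	allocatable := template.Status.Allocatable[v1.ResourceCPU]
	return allocatable.Cmp(*requested) >= 0
}

// podsRemainUnschedulable returns the Pods that no node group can help,
// i.e. the Pods that should get TriggeredScaleUp=False.
func podsRemainUnschedulable(pods []*v1.Pod, groups []NodeGroup) []*v1.Pod {
	var remaining []*v1.Pod
	for _, pod := range pods {
		fits := false
		for _, group := range groups {
			if simulateFits(pod, group.TemplateNode()) {
				fits = true
				break
			}
		}
		if !fits {
			remaining = append(remaining, pod)
		}
	}
	return remaining
}
```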
For a simple example, if a Pod requests 64 CPU but no node group can satisfy that requirement, the Pod ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.

A more complicated scenario is also covered this way: suppose a Pod requests 64 CPU and only one node group can satisfy that requirement, but that node group is out of instances at the moment. In this case, the first reconciliation selects the node group to make the Pod schedulable, but the node group size increase request is rejected by the cloud provider because of the stockout. The node group is then considered unsafe for a while, and the next reconciliation happens without taking the failed node group into account. Since no other node group can satisfy the 64 CPU requirement, the Pod eventually ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.
### PreemptionFailed

`PreemptionFailed` is used to fall back when preemption fails.