keps/sig-scheduling/3990-pod-topology-spread-fallback-mode/README.md
@@ -378,6 +378,26 @@ which creates new Node for Pod typically by the cluster autoscaler.
3. The cluster autoscaler adds `TriggeredScaleUp: false`.
4. The scheduler notices `TriggeredScaleUp: false` on the Pod and schedules it, falling back to `ScheduleAnyway` on Pod Topology Spread (see the sketch below).
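
For illustration only, the scheduler-side check could look roughly like the sketch below. It assumes `TriggeredScaleUp` is surfaced as a Pod condition; the package and function names are ours, not part of the KEP or the existing plugin code.

```go
// Sketch: detect whether the cluster autoscaler has marked a Pod with
// TriggeredScaleUp=False, meaning no scale-up can make it schedulable.
// The PodTopologySpread plugin could then treat DoNotSchedule constraints
// as ScheduleAnyway for this Pod. All names here are illustrative.
package sketch

import v1 "k8s.io/api/core/v1"

// scaleUpFailed returns true when the Pod carries the (assumed)
// TriggeredScaleUp condition with status False.
func scaleUpFailed(pod *v1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == "TriggeredScaleUp" && cond.Status == v1.ConditionFalse {
			return true
		}
	}
	return false
}
```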
#### How we implement `TriggeredScaleUp` in the cluster autoscaler
Basically, we just set `TriggeredScaleUp: false` on the Pods in [status.ScaleUpStatus.PodsRemainUnschedulable](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/processors/status/scale_up_status_processor.go#L37) during every [reconciliation (RunOnce)](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/static_autoscaler.go#L296).
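
As a rough sketch (not the actual cluster-autoscaler code), the marking step could look like the following. It assumes `TriggeredScaleUp` is written as a Pod condition and would presumably be wired through the ScaleUpStatus processors linked above; the function name and reason string are ours.

```go
// Sketch of the marking step: every reconciliation (RunOnce), set the
// (assumed) TriggeredScaleUp=False Pod condition on each Pod that remained
// unschedulable after the scale-up attempt. Names here are illustrative.
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markTriggeredScaleUpFalse receives the Pods from
// status.ScaleUpStatus.PodsRemainUnschedulable and records the condition.
func markTriggeredScaleUpFalse(ctx context.Context, client kubernetes.Interface, pods []*v1.Pod) error {
	for _, pod := range pods {
		updated := pod.DeepCopy()
		// A real implementation would update an existing condition in place
		// instead of appending a duplicate on every reconciliation.
		updated.Status.Conditions = append(updated.Status.Conditions, v1.PodCondition{
			Type:               "TriggeredScaleUp",
			Status:             v1.ConditionFalse,
			LastTransitionTime: metav1.Now(),
			Reason:             "ScaleUpFailed",
			Message:            "no node group can make this Pod schedulable",
		})
		if _, err := client.CoreV1().Pods(updated.Namespace).UpdateStatus(ctx, updated, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```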
`status.ScaleUpStatus.PodsRemainUnschedulable` contains the Pods for which the cluster autoscaler [simulates](https://github.com/kubernetes/autoscaler/blob/109998dbf30e6a6ef84fc37ebaccca23d7dee2f3/cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go#L536) the scheduling process and determines that they wouldn't be schedulable in any node group.
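
That selection can be pictured with the simplified sketch below. The real orchestrator code linked above runs the scheduler framework's filters; this illustration only compares CPU requests against a node group's template node, and `NodeGroup`, `simulateFits`, and `podsRemainUnschedulable` are illustrative names, not the autoscaler's API.

```go
// Simplified sketch of how Pods end up in PodsRemainUnschedulable: a Pod that
// the scheduling simulation rejects for every node group remains unschedulable
// and will later receive TriggeredScaleUp=False.
package sketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// NodeGroup is a stand-in for the autoscaler's node group abstraction.
type NodeGroup interface {
	// TemplateNode returns a representative Node this group would create.
	TemplateNode() *v1.Node
}

// simulateFits is a drastically simplified stand-in for the scheduling
// simulation: it only checks whether the template node has enough allocatable
// CPU for the Pod's requests (the real simulation runs the scheduler
// framework's filter plugins, covering taints, topology spread, and so on).
func simulateFits(pod *v1.Pod, template *v1.Node) bool {
	requested := resource.NewQuantity(0, resource.DecimalSI)
	for _, c := range pod.Spec.Containers {
		if cpu, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
			requested.Add(cpu)
		}
	}
	allocatable := template.Status.Allocatable[v1.ResourceCPU]
	return allocatable.Cmp(*requested) >= 0
}

// podsRemainUnschedulable returns the Pods that no node group can help,
// i.e. the Pods that should get TriggeredScaleUp=False.
func podsRemainUnschedulable(pods []*v1.Pod, groups []NodeGroup) []*v1.Pod {
	var remaining []*v1.Pod
	for _, pod := range pods {
		fits := false
		for _, group := range groups {
			if simulateFits(pod, group.TemplateNode()) {
				fits = true
				break
			}
		}
		if !fits {
			remaining = append(remaining, pod)
		}
	}
	return remaining
}
```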
For a simple example, if a Pod requests 64 CPU but no node group can satisfy that requirement, the Pod ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.

A more complicated scenario is also covered this way: suppose a Pod requests 64 CPU and only one node group can satisfy that requirement, but that node group is out of instances at the moment. In this case, the first reconciliation selects the node group to make the Pod schedulable, but the node group size increase request is rejected by the cloud provider because of the stockout. The node group is then considered unsafe for a while, and the next reconciliation happens without taking the failed node group into account. Since no other node group can satisfy the 64 CPU requirement, the Pod eventually ends up in `status.ScaleUpStatus.PodsRemainUnschedulable` and gets `TriggeredScaleUp: false`.
### PreemptionFailed

`PreemptionFailed` is used to fall back when preemption fails.