**Which component are you using?**:
/area cluster-autoscaler
/area core-autoscaler
/wg device-management
**Is your feature request designed to solve a problem? If so describe the problem this feature should solve.**:
`ClusterSnapshot` was extended significantly for the DRA autoscaling MVP. In addition to tracking just Nodes and scheduled Pods, it now tracks the state of all DRA objects in the cluster. Some of these DRA objects are owned by unschedulable Pods. At the same time, the unschedulable Pods themselves are still tracked and processed outside `ClusterSnapshot`.
So we basically have the state for unschedulable Pods in two places:
- The unschedulable Pods themselves are just a slice variable in `StaticAutoscaler.RunOnce()` that gets processed by `PodListProcessor` and then passed to `ScaleUp`.
- The ResourceClaims owned by the unschedulable Pods are tracked and modified in `dynamicresources.Snapshot` inside `ClusterSnapshot`.
As pointed out by @MaciekPytel during the MVP review, this leaves us with a risk of the two data sources diverging quite easily. For example, a `PodListProcessor` could inject a "fake" unschedulable Pod into the list, but not inject the Pod's ResourceClaims into the `ClusterSnapshot`.
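To make the risk concrete, here is a deliberately simplified, hypothetical sketch (the types and functions below are illustrative stand-ins, not the real cluster-autoscaler API): a processor appends a "fake" unschedulable Pod to the slice, but the snapshot's DRA state never learns about the Pod's ResourceClaims, so the two stores silently disagree.

```go
package main

import "fmt"

// Hypothetical, heavily simplified stand-ins for the real types.
type pod struct {
	name       string
	claimNames []string // ResourceClaims this Pod owns
}

// snapshot mimics ClusterSnapshot's DRA tracking: here, just a set
// of ResourceClaim names it knows about.
type snapshot struct {
	claims map[string]bool
}

// injectFakePod mimics a PodListProcessor that adds a "fake"
// unschedulable Pod to the slice, but forgets to add the Pod's
// ResourceClaims to the snapshot.
func injectFakePod(unschedulable []pod, _ *snapshot) []pod {
	return append(unschedulable, pod{name: "fake-pod", claimNames: []string{"fake-claim"}})
}

// missingClaims reports ResourceClaims referenced by the listed Pods
// that the snapshot does not know about, i.e. the divergence.
func missingClaims(unschedulable []pod, snap *snapshot) []string {
	var missing []string
	for _, p := range unschedulable {
		for _, c := range p.claimNames {
			if !snap.claims[c] {
				missing = append(missing, c)
			}
		}
	}
	return missing
}

func main() {
	snap := &snapshot{claims: map[string]bool{"real-claim": true}}
	pods := []pod{{name: "real-pod", claimNames: []string{"real-claim"}}}
	pods = injectFakePod(pods, snap)
	fmt.Println(missingClaims(pods, snap)) // the snapshot never saw "fake-claim"
}
```

Nothing in the current split-state design forces the two updates to happen together, which is exactly the gap this issue proposes to close.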
**Describe the solution you'd like.**:
- Move unschedulable Pods inside `ClusterSnapshot`.
- Make `ClusterSnapshot.SetClusterState()` take all Pods in the cluster and divide them into scheduled and unschedulable internally.
- We can probably implement the tracking (including correctly handling `Fork()`/`Commit()`/`Revert()`) pretty easily by putting the unschedulable Pods on a special meta-NodeInfo in the existing `ClusterSnapshotStore` implementations.
- Pods move between the scheduled and unschedulable state during `SchedulePod()`/`UnschedulePod()` calls.
- Add methods for obtaining and processing unschedulable Pods to `ClusterSnapshot`. We need at least `ListUnschedulablePods()`, `AddUnschedulablePod(pod *framework.PodInfo)`, and `RemoveUnschedulablePod(name, namespace string)`.
- We also need a way to mark some unschedulable Pods to be ignored during scale-up, but not actually remove them from the `ClusterSnapshot`. This is because their ResourceClaims could technically be partially allocated (so the Pod can't schedule yet, but it does reserve some Devices already), and removing them from the `ClusterSnapshot` would mean simulating some allocated Devices as free. This could be implemented via `ClusterSnapshot.IgnoreUnschedulablePod(name, namespace string)`, but it might also fit better in the `ScaleUp` code itself.
- Refactor `ScaleUp` to take the unschedulable Pods from `ClusterSnapshot.ListUnschedulablePods()` instead of a method parameter.
- Refactor `PodListProcessor` and its implementations to modify the unschedulable Pods via the new `ClusterSnapshot` methods instead of modifying and returning a method parameter.
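To illustrate the intended semantics, here is a minimal, self-contained sketch. The types are simplified stand-ins (the real `AddUnschedulablePod` would take a `*framework.PodInfo`, and the state would live in the `ClusterSnapshotStore` implementations); the point is that unschedulable Pods live inside the snapshot, so `Fork()`/`Commit()`/`Revert()` cover them together with the rest of the cluster state, and ignored Pods stay stored but drop out of the list `ScaleUp` would consume.

```go
package main

import "fmt"

type podRef struct{ namespace, name string }

// clusterSnapshot is a toy model of the proposed additions, tracking
// only unschedulable Pods and an ignored-for-scale-up flag per Pod.
type clusterSnapshot struct {
	unschedulable map[podRef]bool // value: ignored during scale-up
	forks         []map[podRef]bool
}

func newClusterSnapshot() *clusterSnapshot {
	return &clusterSnapshot{unschedulable: map[podRef]bool{}}
}

// AddUnschedulablePod is simplified here to (namespace, name).
func (s *clusterSnapshot) AddUnschedulablePod(namespace, name string) {
	s.unschedulable[podRef{namespace, name}] = false
}

func (s *clusterSnapshot) RemoveUnschedulablePod(name, namespace string) {
	delete(s.unschedulable, podRef{namespace, name})
}

// IgnoreUnschedulablePod keeps the Pod (and hence its partially
// allocated ResourceClaims) in the snapshot, but excludes it from
// the list that ScaleUp sees.
func (s *clusterSnapshot) IgnoreUnschedulablePod(name, namespace string) {
	if _, ok := s.unschedulable[podRef{namespace, name}]; ok {
		s.unschedulable[podRef{namespace, name}] = true
	}
}

func (s *clusterSnapshot) ListUnschedulablePods() []podRef {
	var out []podRef
	for ref, ignored := range s.unschedulable {
		if !ignored {
			out = append(out, ref)
		}
	}
	return out
}

// Fork saves a copy of the unschedulable-Pod state; Revert restores
// the last saved copy; Commit discards it.
func (s *clusterSnapshot) Fork() {
	saved := make(map[podRef]bool, len(s.unschedulable))
	for k, v := range s.unschedulable {
		saved[k] = v
	}
	s.forks = append(s.forks, saved)
}

func (s *clusterSnapshot) Revert() {
	s.unschedulable = s.forks[len(s.forks)-1]
	s.forks = s.forks[:len(s.forks)-1]
}

func (s *clusterSnapshot) Commit() {
	s.forks = s.forks[:len(s.forks)-1]
}

func main() {
	snap := newClusterSnapshot()
	snap.AddUnschedulablePod("default", "pod-a")
	snap.Fork()
	snap.AddUnschedulablePod("default", "pod-b")
	snap.Revert() // pod-b is gone again, together with any other forked state
	snap.IgnoreUnschedulablePod("pod-a", "default")
	fmt.Println(len(snap.ListUnschedulablePods())) // pod-a still stored, but ignored
}
```

In the real implementation the per-Pod state would be richer (the ResourceClaims in `dynamicresources.Snapshot`), but the copy-on-`Fork()` discipline shown above is what keeps the unschedulable Pods and their claims from diverging.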
**Additional context.**:
This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. An MVP of the support was implemented in #7530 (with the whole implementation tracked in kubernetes/kubernetes#118612). There are a number of post-MVP follow-ups to be addressed before DRA autoscaling is ready for production use - this is one of them.