Description
Today, when Cluster API deletes a Machine, it drains the corresponding Node to ensure all Pods running on the Node have
been gracefully terminated before the corresponding infrastructure is deleted. The current drain implementation has
hard-coded rules to decide which Pods should be evicted. This implementation is aligned with kubectl drain (see
Machine deletion process for more details).
With recent changes in Cluster API, we can now have finer control over the drain process, and thus we propose a new
MachineDrainRule CRD to make the drain rules configurable per Pod. Additionally, we're proposing annotations that
workload cluster admins can add to individual Pods to control their drain behavior.
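To make the idea more concrete, below is a rough sketch of what a MachineDrainRule object could look like, based on the
API shape discussed in the proposal PR. The field names, behavior values, and example labels are illustrative and may
differ from the merged API:

```yaml
# Illustrative sketch only: follows the API shape from the MachineDrainRules proposal,
# exact field names and allowed values may differ in the final implementation.
# This rule tells Cluster API to skip eviction of matching Pods when draining
# Nodes that belong to matching Machines.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDrainRule
metadata:
  name: skip-example-pods
  namespace: default            # lives in the management cluster, next to the Machines
spec:
  drain:
    behavior: Skip              # e.g. Drain, Skip, WaitCompleted
    # order: 100                # only relevant for behavior: Drain (controls drain ordering)
  machines:
    # Select Machines via their labels and/or the labels of their Cluster.
    - selector:
        matchLabels:
          cluster.x-k8s.io/deployment-name: md-0
  pods:
    # Select Pods in the workload cluster via Pod and/or Namespace labels.
    - selector:
        matchLabels:
          app: example
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: example-namespace
```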
This would be a huge improvement over the “standard” kubectl drain-aligned implementation we have today and would help
solve a family of issues identified when running Cluster API in production.
More details can be found in the proposal PR.
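As a point of reference for the per-Pod control mentioned above, Cluster API already allows skipping individual Pods
during drain via the `cluster.x-k8s.io/drain: skip` label; the per-Pod annotations proposed in this issue would extend
this kind of marker. A minimal sketch (Pod name, namespace, and image are just for illustration):

```yaml
# Illustrative Pod snippet: the cluster.x-k8s.io/drain: skip label is the existing
# per-Pod escape hatch; the annotation-based control mentioned in this issue is
# still part of the proposal discussion.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  namespace: example-namespace
  labels:
    cluster.x-k8s.io/drain: skip   # Cluster API drain will not evict this Pod
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```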
Prior related discussions:
- Provide a better solution than standard kubectl drain #11024
- Support for DaemonSet eviction when draining nodes #6158
Tasks:
- Proposal: 📖 Proposal: MachineDrainRules #11241
- ⚠️ Machine: ignore attached Volumes referred by pods ignored during drain #11246
- ✨ Implement MachineDrainRules #11353
- 🌱 Extend Node drain e2e test to cover MachineDrainRules #11362
- 🌱 Add feature gate to consider VolumeAttachments when waiting for volume detach #11386
- Documentation (book; can probably be taken mostly from the proposal)
Follow-ups:
- Consider adding some sort of timeout for individual Pods (e.g. "GracePeriodSeconds")
- Standardize skip label for the Kubernetes ecosystem, xrefs: