Description
Today, when Cluster API deletes a Machine, it drains the corresponding Node to ensure all Pods running on the Node have
been gracefully terminated before the corresponding infrastructure is deleted. The current drain implementation has
hard-coded rules to decide which Pods should be evicted. This implementation is aligned with kubectl drain (see
Machine deletion process for more details).
With recent changes in Cluster API, we can now have finer control over the drain process, and thus we propose a new
MachineDrainRule CRD to make the drain rules configurable per Pod. Additionally, we're proposing annotations that
workload cluster admins can add to individual Pods to control their drain behavior.
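To make the idea more concrete, below is a rough sketch of what a MachineDrainRule object could look like, based on the
API shape discussed in the proposal PR. The field names, behavior values, and example labels are illustrative and may
differ from the merged API:

```yaml
# Illustrative sketch only: follows the API shape from the MachineDrainRules proposal,
# exact field names and allowed values may differ in the final implementation.
# This rule tells Cluster API to skip eviction of matching Pods when draining
# Nodes that belong to matching Machines.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDrainRule
metadata:
  name: skip-example-pods
  namespace: default            # lives in the management cluster, next to the Machines
spec:
  drain:
    behavior: Skip              # e.g. Drain, Skip, WaitCompleted
    # order: 100                # only relevant for behavior: Drain (controls drain ordering)
  machines:
    # Select Machines via their labels and/or the labels of their Cluster.
    - selector:
        matchLabels:
          cluster.x-k8s.io/deployment-name: md-0
  pods:
    # Select Pods in the workload cluster via Pod and/or Namespace labels.
    - selector:
        matchLabels:
          app: example
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: example-namespace
```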
This would be a huge improvement over the “standard” kubectl drain-aligned implementation we have today and would help
solve a family of issues identified when running Cluster API in production.
More details can be found in the proposal PR.
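As a point of reference for the per-Pod control mentioned above, Cluster API already allows skipping individual Pods
during drain via the `cluster.x-k8s.io/drain: skip` label; the per-Pod annotations proposed in this issue would extend
this kind of marker. A minimal sketch (Pod name, namespace, and image are just for illustration):

```yaml
# Illustrative Pod snippet: the cluster.x-k8s.io/drain: skip label is the existing
# per-Pod escape hatch; the annotation-based control mentioned in this issue is
# still part of the proposal discussion.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  namespace: example-namespace
  labels:
    cluster.x-k8s.io/drain: skip   # Cluster API drain will not evict this Pod
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```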
Prior related discussions:
- Provide a better solution than standard kubectl drain #11024
- Support for DaemonSet eviction when draining nodes #6158
Tasks:
- Proposal: 📖 Proposal: MachineDrainRules #11241
- ⚠️ Machine: ignore attached Volumes referred by pods ignored during drain #11246
- ✨ Implement MachineDrainRules #11353
- 🌱 Extend Node drain e2e test to cover MachineDrainRules #11362
- 🌱 Add feature gate to consider VolumeAttachments when waiting for volume detach #11386
- Documentation (book; can probably be taken mostly from the proposal)
Follow-ups:
- Consider adding some sort of timeout for individual Pods (e.g. "GracePeriodSeconds")
- Standardize skip label for the Kubernetes ecosystem, xrefs: