Description
What would you like to be added?
A way to drain nodes by adding more pods elsewhere to meet PodDisruptionBudgets.
Why is this needed?
Currently, a Deployment can be configured with a maxSurge to avoid dropping below the number of replicas it requires while a new release is rolled out. This parameter allows extra pods to be added before the old ones are removed, so that the required "replicas" count is always met as a minimum.
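To make the surge-first ordering concrete, here is a minimal Python sketch of a rollout in which new pods are created before old ones are removed. It is only an illustration, not how the Deployment controller is implemented: the function name `rolling_update` is hypothetical, it assumes maxUnavailable=0, and it treats every new pod as ready the moment it is created.

```python
def rolling_update(replicas: int, max_surge: int = 1) -> list[tuple[int, int]]:
    """Simulate a surge-first rollout (assumes maxUnavailable=0).

    Returns the (old_pods, new_pods) counts after every step.
    Simplification: a new pod counts as ready as soon as it exists.
    """
    if max_surge < 1:
        raise ValueError("with maxUnavailable=0 the rollout needs max_surge >= 1")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0:
        # Surge: add new pods first, up to max_surge above the desired count.
        new += min(max_surge, replicas - new)
        history.append((old, new))
        # Only then remove old pods, never going below `replicas` in total.
        old -= min(old, old + new - replicas)
        history.append((old, new))
    return history


if __name__ == "__main__":
    for old, new in rolling_update(replicas=3, max_surge=1):
        print(f"old={old} new={new} total={old + new}")
```

In this model the total pod count moves between `replicas` and `replicas + max_surge` and never drops below `replicas`, which is the property the request would like to have during a node drain as well.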
This feature (to my knowledge) is only available when releasing new versions of an application; however, it would be extremely useful when draining nodes.
Usual cluster maintenance is done by adding new nodes before removing old ones. This means all the pods on the old node need to be evicted, and there is usually space on the new node for one more of each of the old node's pods. Current solutions such as the PodDisruptionBudget or the Eviction API try to make sure that subtracting pods from the current amount doesn't break anything; however, the possibility of temporarily having one extra pod of each deployment is not contemplated at the moment.
This request is asking for the ability to use a surplus of pods to meet all constraints for safe eviction.
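The requested behaviour could be sketched roughly as follows. This is a toy model in Python, not an existing Kubernetes API: `Workload`, `can_evict` and `evict_with_surge` are hypothetical names, and the budget check is simplified to a minAvailable comparison.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    healthy: int        # healthy pods currently running
    min_available: int  # the PDB's minAvailable value


def can_evict(w: Workload) -> bool:
    # An eviction is allowed only if the budget still holds afterwards.
    return w.healthy - 1 >= w.min_available


def evict_with_surge(w: Workload) -> list[str]:
    """Hypothetical surge-aware eviction: add a pod elsewhere when the
    budget would otherwise block the eviction, then evict."""
    steps = []
    if not can_evict(w):
        # Temporary surplus: start a replacement on another node first.
        w.healthy += 1
        steps.append("surge: start replacement pod on another node")
    w.healthy -= 1
    steps.append("evict: remove pod from the draining node")
    return steps


if __name__ == "__main__":
    single = Workload(healthy=1, min_available=1)
    print(can_evict(single))         # blocked without a surge
    print(evict_with_surge(single))  # surge first, then evict
```

For a single-replica workload with minAvailable=1, a plain eviction is refused, while the surge path ends with the same number of healthy pods it started with and never dips below the budget.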
Some side notes to stress the importance: when operating evictions on large workloads, having no PDBs, or PDBs with minAvailable/maxUnavailable settings, works fine. However, when moving deployments with 1 replica, or HPA-controlled deployments that are currently scaled down far enough, the problem is aggravated and can only be solved through a few inefficient means. This is exacerbated if node maintenance is done automatically (as with GKE and other cloud services).
Just in case: this is a limitation that should only be counted against Deployments with strategy.type=RollingUpdate.
Ways to deal with this situation currently:
- Set minReplicas/replicas to >1, and have a PDB with maxUnavailable=1, when it's known that the autoscaler, if in use, is usually scaled on the lower end. Pros: There is no downtime. Cons: Waste of resources.
- Do nothing, and deal with eventual downtime. Pros: No waste of resources. Cons: There is downtime in the deployment.
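The trade-off between the two options can be checked with a little arithmetic. The Python sketch below is a simplified model of how a PDB with maxUnavailable limits evictions, assuming all pods are healthy before the drain; the function names are hypothetical and the real disruption controller handles more cases (percentages, unhealthy pods, and so on).

```python
def disruptions_allowed(healthy: int, expected: int, max_unavailable: int) -> int:
    # Simplified budget: desired healthy = expected - maxUnavailable,
    # allowed disruptions = healthy - desired healthy, floored at zero.
    desired_healthy = max(expected - max_unavailable, 0)
    return max(healthy - desired_healthy, 0)


def pods_left_after_one_eviction(replicas: int, max_unavailable: int) -> int:
    """Pods still serving after the drain tries to evict one pod."""
    allowed = disruptions_allowed(replicas, replicas, max_unavailable)
    evicted = min(1, allowed)  # the drain removes at most one pod here
    return replicas - evicted


if __name__ == "__main__":
    # Option 1: replicas=2 with maxUnavailable=1, one pod keeps serving.
    print(pods_left_after_one_eviction(replicas=2, max_unavailable=1))
    # Option 2: replicas=1 with maxUnavailable=1, nothing is left (downtime).
    print(pods_left_after_one_eviction(replicas=1, max_unavailable=1))
```

With two replicas the eviction leaves one pod running at the cost of a permanently idle extra pod; with one replica the eviction is allowed but leaves zero pods until the replacement starts.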