
maxSurge for node draining or how to meet availability requirements when draining nodes by adding pods #114877

Open
@txomon

Description

What would you like to be added?

A way to drain nodes by adding more pods elsewhere to meet PodDisruptionBudgets.

Why is this needed?

Currently, a Deployment can be configured with a maxSurge so that a new release can be rolled out without dropping below the number of replicas the Deployment requires. This parameter allows extra pods to be added before the old ones are removed, so that the required "replicas" count is always met as a minimum.
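For reference, a minimal sketch of where maxSurge lives today (the name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # illustrative name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # up to 1 pod above 'replicas' may exist during a rollout
      maxUnavailable: 0    # never drop below 'replicas' available pods during a rollout
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25    # illustrative image
```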

To my knowledge, this feature is only available when releasing new versions of an application; however, it would be extremely useful when draining nodes as well.

Usual cluster maintenance is done by adding new nodes before removing old ones. This means all the pods on the old node need to be evicted, and there is usually room on the new node for one more pod of each workload from the old node. Current solutions such as the PodDisruptionBudget or the Eviction API try to make sure that subtracting pods from the current count doesn't break anything; the possibility of temporarily having one extra pod of each Deployment is not contemplated at the moment.

This request is asking for the ability to use a surplus of pods to meet all constraints for safe eviction.
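Purely to illustrate the shape of the request (this is hypothetical; no such field exists in the Kubernetes API today), the drain-time surge could be imagined as a knob next to the existing disruption constraints:

```yaml
# Hypothetical sketch only: the 'evictionSurge' field below does not exist.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 1
  evictionSurge: 1         # hypothetical: allow one replacement pod to be created
                           # before the evicted pod is removed, mirroring maxSurge
  selector:
    matchLabels:
      app: web
```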

Some side notes to stress the importance: when operating evictions on large workloads, either the absence of PDBs or PDBs with minAvailable/maxUnavailable settings work fine. However, when moving Deployments with 1 replica, or HPA-controlled Deployments that are currently scaled down far enough, the problem is aggravated and can only be solved through a few inefficient means; this is exacerbated when node maintenance is done automatically (as on GKE and other cloud services).
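As a concrete sketch of that worst case (names are illustrative), a single-replica Deployment whose PDB requires one available pod cannot be drained at all: the Eviction API keeps rejecting the eviction because it would violate minAvailable, yet nothing is allowed to start a replacement pod first.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: singleton          # illustrative name
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: singleton
  template:
    metadata:
      labels:
        app: singleton
    spec:
      containers:
        - name: app
          image: nginx:1.25
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: singleton-pdb
spec:
  minAvailable: 1          # evicting the only pod would violate this, so the drain blocks
  selector:
    matchLabels:
      app: singleton
```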

To be clear, this limitation should only be counted against Deployments with strategy.type=RollingUpdate.

Ways to deal with this situation currently:

  1. Set minReplicas/replicas to >1 and add a PDB with maxUnavailable=1, even when it's known that the autoscaler, if in use, usually sits at the lower end of its range (see the sketch after this list). Pros: there is no downtime. Cons: waste of resources.
  2. Do nothing and deal with the eventual downtime. Pros: no waste of resources. Cons: there is downtime in the Deployment.
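A sketch of workaround 1 under illustrative names: keep the workload at two or more replicas at all times and cap disruptions at one, so an eviction always leaves at least one pod running.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2           # kept above 1 purely so that draining can proceed
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1        # at most one pod may be disrupted at a time
  selector:
    matchLabels:
      app: web
```

The cost is exactly the "waste of resources" noted above: one replica more than the workload needs, kept around only for maintenance windows.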


Labels

kind/feature: Categorizes issue or PR as related to a new feature.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
