Add easing to balancer #6119

Open
@abursavich

Description

Which component are you using?:

Balancer

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

When the balancer distributes replicas, it doesn't ease into the new distribution. It assigns replicas immediately.

This applies to proportional distribution, but is more obvious with priority distribution so I'll use that as an example.

Let's say you have the cluster autoscaler and a workload with 100 replicas that runs on two deployments, A and B, each using a different node type, with A prioritized over B. Assume there are no capacity issues, so A has all 100 replicas and B has 0. For some reason you decide to switch your priorities to B over A. When the balancer notices this change, it will set the replicas of A to 0 and B to 100 without any controlled transition. All of the A replicas will be deleted without waiting for any B replicas to become available, which may require waiting for the cluster autoscaler to kick in and acquire B nodes. If a FallbackPolicy is configured, it might kick in before the B replicas are available and assign A replicas, but the previous A nodes may already be gone by then, and you'll still have had an outage in the meantime.

Describe the solution you'd like.:

I haven't fully thought this through, but I think a mechanism similar to a rolling deployment update, with a maxSurge and a maxUnavailable, would be appropriate, with the caveat that the targets may have their own things going on that affect their available replicas beyond the balancer's control (e.g. a deployment rollout).
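To make the idea concrete, here is a rough sketch of what one eased reconcile step could look like. It is not the balancer's actual code: the `Target` type, the `easeStep` function, and the way the two budgets are computed are all assumptions made for illustration. It also assumes that when a target is scaled down, its unavailable replicas are removed first.

```go
package main

// Target is a hypothetical view of one balancer target (e.g. a Deployment).
type Target struct {
	Name      string
	Current   int32 // replicas currently assigned to the target
	Available int32 // replicas currently reported as available
	Desired   int32 // replicas the placement policy ultimately wants
}

func min32(a, b int32) int32 {
	if a < b {
		return a
	}
	return b
}

func max32(a, b int32) int32 {
	if a > b {
		return a
	}
	return b
}

// easeStep computes the replica counts to assign in the next step.
// Total assigned replicas may exceed the desired total by at most maxSurge,
// and scale-downs may not push total available replicas below the desired
// total minus maxUnavailable.
func easeStep(targets []Target, maxSurge, maxUnavailable int32) map[string]int32 {
	var total, assigned, available int32
	for _, t := range targets {
		total += t.Desired
		assigned += t.Current
		available += t.Available
	}
	surgeBudget := max32(total+maxSurge-assigned, 0)
	downBudget := max32(available-(total-maxUnavailable), 0)

	next := make(map[string]int32, len(targets))
	for _, t := range targets {
		n := t.Current
		switch {
		case t.Desired > t.Current:
			// Scale up only as far as the surge budget allows.
			up := min32(t.Desired-t.Current, surgeBudget)
			surgeBudget -= up
			n += up
		case t.Desired < t.Current:
			excess := t.Current - t.Desired
			// Dropping unavailable replicas costs no availability.
			free := min32(excess, max32(t.Current-t.Available, 0))
			// Dropping available replicas consumes the down budget.
			paid := min32(excess-free, downBudget)
			downBudget -= paid
			n -= free + paid
		}
		next[t.Name] = n
	}
	return next
}
```

With the example above (A at 100/100 wanting 0, B at 0/0 wanting 100, maxSurge of 25, maxUnavailable of 0), the first step leaves A at 100 and raises B to 25. Once those 25 B replicas are available, the next step lowers A to 75, and so on. It does nothing special about targets changing their available counts on their own, which is the caveat mentioned above.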

Limiting the scale down is an easier problem than limiting scale up. You might want to have at least one pod pending in each target that's under its desired available replicas as a probe (assuming the problem is something like nodes out of quota/stock). But maybe there's some problem with that specific pod/node and if you tried to schedule more, the others would come up.
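The probe idea could be sketched as a floor on each target's assigned replicas. The `withProbe` function and its parameters are hypothetical names, and it treats "pending" as simply assigned minus available, which is an assumption. It does not address the concern that a single stuck pod may not be representative.

```go
package main

// withProbe returns the replica count to assign so that a target which is
// still below its desired available replicas always has at least one
// replica that is not yet available, acting as a probe for capacity.
func withProbe(assigned, available, desired int32) int32 {
	if available >= desired {
		return assigned // target is satisfied; no probe needed
	}
	if assigned <= available {
		return available + 1 // nothing pending; add a single probe replica
	}
	return assigned // at least one replica is already pending
}
```

For example, a target with 25 assigned and 25 available replicas that wants 100 would be given 26, while a target that already has 30 assigned and 25 available would be left at 30.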

Metadata

Labels

area/balancer
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.