
CAPI provider: scale-down blocked when current replicas exceed maxSize annotation #9395

@MaxFedotov

Description


Which component are you using?:
/area cluster-autoscaler

What version of the component are you using?:

Component version: build from latest main

What k8s version are you using (kubectl version)?:

$ kubectl version
Server Version: v1.32.3+rke2r1

What happened?:

When using the ClusterAPI provider, lowering the cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size annotation below the current replica count creates a deadlock where the autoscaler cannot scale down.

DeleteNodes calls SetSize(replicas - 1) for each node being removed. SetSize unconditionally rejects any value above maxSize:

case nreplicas > r.maxSize:
   return fmt.Errorf("size increase too large - desired:%d max:%d", nreplicas, r.maxSize)

For example, with 4 running replicas and the max-size annotation lowered from 4 to 2, the autoscaler attempts SetSize(3), which fails because 3 > 2. The group can never converge to the desired size.

E0320 12:03:47.710228  1 delete_in_batch.go:206] Scale-down: couldn't delete empty node "node-007", status error: failed to delete nodes from group MachineDeployment/.../cpu-worker: size increase too large - desired:3 max:2

The maxSize check in SetSize should only apply when the replica count is actually increasing, not when it's decreasing toward the valid range.
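A minimal sketch of that proposed change (the nodeGroup type and setSize method here are illustrative stand-ins, not the actual provider types): the maxSize guard fires only when the requested count is above both the current replica count and maxSize, so a group already over a lowered maximum can still step down.

```go
package main

import "fmt"

// nodeGroup mirrors the fields relevant to the check; names are
// illustrative, not the real CAPI provider structs.
type nodeGroup struct {
	minSize, maxSize, replicas int
}

// setSize rejects a value above maxSize only when it would increase
// the replica count, allowing a decrease toward the valid range.
func (g *nodeGroup) setSize(nreplicas int) error {
	switch {
	case nreplicas > g.replicas && nreplicas > g.maxSize:
		return fmt.Errorf("size increase too large - desired:%d max:%d", nreplicas, g.maxSize)
	case nreplicas < g.minSize:
		return fmt.Errorf("size decrease too small - desired:%d min:%d", nreplicas, g.minSize)
	}
	g.replicas = nreplicas
	return nil
}

func main() {
	g := &nodeGroup{minSize: 1, maxSize: 2, replicas: 4}
	// 4 -> 3 is a decrease toward the valid range: allowed even though 3 > maxSize.
	fmt.Println(g.setSize(3)) // <nil>
	// 3 -> 4 would be an increase past maxSize: still rejected.
	fmt.Println(g.setSize(4))
}
```

With this shape, repeated DeleteNodes calls walk the group from 4 down to 2, at which point the normal bounds apply again.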
