
CAPI provider: scale-down blocked when current replicas exceed maxSize annotation #9395

@MaxFedotov

Description


Which component are you using?:
/area cluster-autoscaler

What version of the component are you using?:

Component version: build from latest main

What k8s version are you using (kubectl version)?:

$ kubectl version
Server Version: v1.32.3+rke2r1

What happened?:

When using the ClusterAPI provider, lowering the cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size annotation below the current replica count creates a deadlock where the autoscaler cannot scale down.

DeleteNodes calls SetSize(replicas - 1) for each node being removed. SetSize unconditionally rejects any value above maxSize:

case nreplicas > r.maxSize:
   return fmt.Errorf("size increase too large - desired:%d max:%d", nreplicas, r.maxSize)

For example, with 4 running replicas and the max-size annotation lowered from 4 to 2, the autoscaler attempts SetSize(3), which fails because 3 > 2. The group can never converge to the desired size.

E0320 12:03:47.710228  1 delete_in_batch.go:206] Scale-down: couldn't delete empty node "node-007", status error: failed to delete nodes from group MachineDeployment/.../cpu-worker: size increase too large - desired:3 max:2

The maxSize check in SetSize should only apply when the replica count is actually increasing, not when it's decreasing toward the valid range.
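A minimal sketch of that proposed change (the nodeGroup type and setSize method here are illustrative stand-ins, not the actual provider types): the maxSize guard fires only when the requested count is above both the current replica count and maxSize, so a group already over a lowered maximum can still step down.

```go
package main

import "fmt"

// nodeGroup mirrors the fields relevant to the check; names are
// illustrative, not the real CAPI provider structs.
type nodeGroup struct {
	minSize, maxSize, replicas int
}

// setSize rejects a value above maxSize only when it would increase
// the replica count, allowing a decrease toward the valid range.
func (g *nodeGroup) setSize(nreplicas int) error {
	switch {
	case nreplicas > g.replicas && nreplicas > g.maxSize:
		return fmt.Errorf("size increase too large - desired:%d max:%d", nreplicas, g.maxSize)
	case nreplicas < g.minSize:
		return fmt.Errorf("size decrease too small - desired:%d min:%d", nreplicas, g.minSize)
	}
	g.replicas = nreplicas
	return nil
}

func main() {
	g := &nodeGroup{minSize: 1, maxSize: 2, replicas: 4}
	// 4 -> 3 is a decrease toward the valid range: allowed even though 3 > maxSize.
	fmt.Println(g.setSize(3)) // <nil>
	// 3 -> 4 would be an increase past maxSize: still rejected.
	fmt.Println(g.setSize(4))
}
```

With this shape, repeated DeleteNodes calls walk the group from 4 down to 2, at which point the normal bounds apply again.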
