Scheduler causes uneven CPU overcommit on heterogeneous Proxmox clusters

## Problem

On Proxmox clusters where nodes have different CPU core counts, the current round-robin scheduler distributes VMs evenly by count, ignoring CPU capacity. This causes severe CPU overcommit on nodes with fewer cores while nodes with more cores remain underutilized.

The scheduler only considers memory (`schedulerHints.memoryAdjustment`). CPU is never evaluated.

### Real-world example

A 20-node production Proxmox cluster hosts multiple CAPI-managed Kubernetes clusters (229 VMs in total across those clusters) and has 3 node classes:

| Class | Nodes | Cores per node | RAM per node |
|---|---|---|---|
| A | 10 | 64 | 504 GB |
| B | 2 | 32 | 661 GB |
| C | 8 | 32 | 504 GB |

Current state:

| Node class | Avg CPU ratio | Max CPU ratio |
|---|---|---|
| A (64 cores) | 2.67:1 | 3.25:1 |
| B (32 cores, 661 GB RAM) | 2.28:1 | 2.81:1 |
| C (32 cores, 504 GB RAM) | 1.41:1 | 1.97:1 |

After a rollout restart (all 229 VMs re-scheduled from zero):

| Metric | Result |
|---|---|
| CPU ratio max | **4.44:1** |
| CPU ratio min | 1.38:1 |
| Spread (max-min) | **3.06** |
| Nodes above 3:1 | **9 / 20** |

Class B nodes (32 cores, highest RAM) are hit the hardest: the scheduler prefers them because they have the most available memory, even though they have only half as many CPU cores as class A nodes. The imbalance persists until VMs are rebalanced manually via Proxmox live migration.
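For reference, the metrics in the table above can be derived from per-node data roughly as follows. This is a minimal sketch that assumes "CPU ratio" means allocated vCPUs divided by physical cores; the `nodeCPU` and `spreadStats` types and the `overcommitStats` function are hypothetical names used only for illustration.

```go
package main

// nodeCPU is a hypothetical per-node summary used only for this sketch.
type nodeCPU struct {
	Cores          int // physical cores on the node
	AllocatedVCPUs int // sum of vCPUs of all VMs placed on the node
}

// spreadStats holds the same metrics as the table above.
type spreadStats struct {
	Max, Min, Spread float64
	AboveThreshold   int
}

// overcommitStats computes the max, min, and spread of the per-node
// vCPU:core ratio, plus the number of nodes whose ratio exceeds threshold.
// Assumption: ratio = AllocatedVCPUs / Cores.
func overcommitStats(nodes []nodeCPU, threshold float64) spreadStats {
	var s spreadStats
	seen := false
	for _, n := range nodes {
		if n.Cores <= 0 {
			continue // skip nodes without usable CPU information
		}
		ratio := float64(n.AllocatedVCPUs) / float64(n.Cores)
		if !seen || ratio > s.Max {
			s.Max = ratio
		}
		if !seen || ratio < s.Min {
			s.Min = ratio
		}
		if ratio > threshold {
			s.AboveThreshold++
		}
		seen = true
	}
	s.Spread = s.Max - s.Min
	return s
}
```

Calling `overcommitStats(nodes, 3.0)` on the cluster's node list would yield the max, min, spread, and "nodes above 3:1" figures in one pass.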

### Root cause

In `vmscheduler.go`, `selectNode` does the following:

1. Sorts nodes by available memory (descending)
2. Counts existing VMs per node
3. Picks the node with the fewest VMs that has enough memory

CPU capacity (`node.CPUInfo.CPUs`) and CPU allocation per VM (`vm.CPUs`) are available from the Proxmox API but never queried by the scheduler.
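The selection steps listed under Root cause can be sketched as below. This is a simplified illustration of the described behavior, not the actual code from `vmscheduler.go`; the `Node` type, its fields, and `selectNodeByCount` are hypothetical names.

```go
package main

import "sort"

// Node is a hypothetical, simplified view of a Proxmox node.
type Node struct {
	Name            string
	AvailableMemory uint64 // memory still free for new VMs
	VMCount         int    // VMs already placed on the node
}

// selectNodeByCount mirrors the three steps described above:
// sort by available memory (descending), then pick the node with the
// fewest VMs that still has enough memory. CPU is never consulted.
func selectNodeByCount(nodes []Node, requestedMemory uint64) (string, bool) {
	// Work on a copy so the caller's slice order is left untouched.
	sorted := append([]Node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].AvailableMemory > sorted[j].AvailableMemory
	})

	best := -1
	for i, n := range sorted {
		if n.AvailableMemory < requestedMemory {
			continue // not enough memory for this VM
		}
		// Strictly fewer VMs wins; ties go to the node with more free memory
		// because it appears earlier in the sorted slice.
		if best == -1 || n.VMCount < sorted[best].VMCount {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return sorted[best].Name, true
}
```

Because ties on VM count are broken by available memory in this sketch, high-RAM nodes win every tie regardless of how many cores they have, which matches the pattern seen on class B.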

### Suggestion

Introduce a `schedulerHints.cpuAdjustment` field (analogous to `memoryAdjustment`) that, when set, enables a scheduling mode that considers both CPU and memory saturation when placing VMs. When disabled (default `0`), the current round-robin behavior is preserved.
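One possible shape for such a mode is sketched below, purely as a starting point for discussion. It assumes `cpuAdjustment` is a percentage of physical cores that may be allocated as vCPUs (for example, `400` would allow 4:1 overcommit); that interpretation, the `NodeLoad` type, and both function names are hypothetical and not part of the existing code. The existing `memoryAdjustment` handling is left out for brevity.

```go
package main

// NodeLoad is a hypothetical per-node summary; field names are illustrative.
type NodeLoad struct {
	Name           string
	Cores          int    // physical cores
	AllocatedVCPUs int    // vCPUs already assigned to VMs
	TotalMemory    uint64 // same unit as the memory request
	UsedMemory     uint64 // memory already assigned to VMs
}

// saturationAfter returns the higher of CPU and memory saturation the node
// would reach after placing the VM, and whether the VM fits at all.
// Assumption: cpuAdjustment is a percentage of physical cores.
func saturationAfter(n NodeLoad, vcpus int, mem uint64, cpuAdjustment uint64) (float64, bool) {
	if n.Cores <= 0 || n.TotalMemory == 0 || cpuAdjustment == 0 {
		return 0, false
	}
	cpuBudget := float64(n.Cores) * float64(cpuAdjustment) / 100
	cpuSat := float64(n.AllocatedVCPUs+vcpus) / cpuBudget
	memSat := float64(n.UsedMemory+mem) / float64(n.TotalMemory)
	if cpuSat > 1 || memSat > 1 {
		return 0, false // placing the VM would exceed a budget
	}
	if cpuSat > memSat {
		return cpuSat, true
	}
	return memSat, true
}

// selectNodeBySaturation picks the node with the lowest resulting saturation.
// It returns false when cpuAdjustment is 0 (mode disabled; the caller would
// keep the current behavior) or when no node fits.
func selectNodeBySaturation(nodes []NodeLoad, vcpus int, mem uint64, cpuAdjustment uint64) (string, bool) {
	bestName, bestScore, found := "", 0.0, false
	for _, n := range nodes {
		score, fits := saturationAfter(n, vcpus, mem, cpuAdjustment)
		if !fits {
			continue
		}
		if !found || score < bestScore {
			bestName, bestScore, found = n.Name, score, true
		}
	}
	return bestName, found
}
```

Scoring by the more saturated of the two resources means a 32-core node with plenty of free RAM no longer beats a 64-core node that has more CPU headroom. Other scoring functions (weighted sums, for instance) would work too; the choice is open for discussion.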

We'd be happy to contribute a PR for this. Before starting, we'd like to know if this kind of scheduler improvement aligns with the project's roadmap and if there are any design preferences we should follow.
