## Problem
On Proxmox clusters where nodes have different CPU core counts, the current round-robin scheduler distributes VMs evenly by count, ignoring CPU capacity. This causes severe CPU overcommit on nodes with fewer cores while nodes with more cores remain underutilized.
The scheduler only considers memory (`schedulerHints.memoryAdjustment`); CPU is never evaluated.
## Real-world example
A 20-node production Proxmox cluster hosting multiple CAPI-managed Kubernetes clusters (229 VMs total across different clusters) with 3 node classes:
| Class | Nodes | Cores | RAM |
|-------|-------|-------|-----|
| A | 10 | 64 | 504 GB |
| B | 2 | 32 | 661 GB |
| C | 8 | 32 | 504 GB |
Current state:
| Node class | Avg CPU ratio | Max CPU ratio |
|------------|---------------|---------------|
| A (64 cores) | 2.67:1 | 3.25:1 |
| B (32 cores, 661 GB RAM) | 2.28:1 | 2.81:1 |
| C (32 cores, 504 GB RAM) | 1.41:1 | 1.97:1 |
After a rollout restart (all 229 VMs re-scheduled from zero):
| Metric | Result |
|--------|--------|
| CPU ratio max | 4.44:1 |
| CPU ratio min | 1.38:1 |
| Spread (max - min) | 3.06 |
| Nodes above 3:1 | 9 / 20 |
Class B nodes (32 cores, highest RAM) are hit the hardest: the scheduler prefers them because they have the most available memory, but they have half the CPU cores. The imbalance persists until manual rebalancing via Proxmox live migration.
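For clarity, the CPU ratio used throughout this issue is total allocated vCPUs divided by physical cores. A minimal sketch (the vCPU counts here are illustrative, chosen to reproduce the 4.44:1 post-restart maximum on a 32-core node):

```go
package main

import "fmt"

// overcommitRatio returns allocated vCPUs divided by physical cores,
// i.e. the "CPU ratio" reported in the tables above.
func overcommitRatio(allocatedVCPUs, physicalCores int) float64 {
	return float64(allocatedVCPUs) / float64(physicalCores)
}

func main() {
	// A 32-core node carrying 142 allocated vCPUs (illustrative) matches
	// the worst post-restart ratio observed in the cluster.
	fmt.Printf("%.2f:1\n", overcommitRatio(142, 32)) // prints "4.44:1"
}
```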
## Root cause
In `vmscheduler.go`, `selectNode`:
- Sorts nodes by available memory (descending)
- Counts existing VMs per node
- Picks the node with the fewest VMs that has enough memory

CPU capacity (`node.CPUInfo.CPUs`) and CPU allocation per VM (`vm.CPUs`) are available from the Proxmox API but never queried by the scheduler.
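A simplified sketch of that behavior (types and helper names are hypothetical, not the provider's actual code) shows why high-memory, low-core nodes win ties:

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified view of a Proxmox host as the
// current scheduler sees it: memory and VM count only, no CPU data.
type node struct {
	Name     string
	AvailMem int64 // free memory in bytes
	VMCount  int   // existing VMs on the node
}

// selectNode mirrors the strategy described above: sort by available
// memory (descending), then pick the node with the fewest VMs that
// still fits the request. CPU capacity is never consulted.
func selectNode(nodes []node, requiredMem int64) *node {
	sort.Slice(nodes, func(i, j int) bool {
		return nodes[i].AvailMem > nodes[j].AvailMem
	})
	var best *node
	for i := range nodes {
		n := &nodes[i]
		if n.AvailMem < requiredMem {
			continue // not enough memory
		}
		if best == nil || n.VMCount < best.VMCount {
			best = n
		}
	}
	return best
}

func main() {
	nodes := []node{
		// Both nodes host the same number of VMs; the scheduler cannot
		// see that "bigmem" has half the cores, so memory breaks the tie.
		{Name: "manycores", AvailMem: 64 << 30, VMCount: 2},
		{Name: "bigmem", AvailMem: 512 << 30, VMCount: 2},
	}
	fmt.Println(selectNode(nodes, 16<<30).Name) // prints "bigmem"
}
```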
## Suggestion
Introduce a `schedulerHints.cpuAdjustment` field (analogous to `memoryAdjustment`) that, when set, enables a scheduling mode that considers both CPU and memory saturation when placing VMs. When disabled (default `0`), the current round-robin behavior is preserved.
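One possible shape for the new mode, sketched under the assumption that `cpuAdjustment` is a percentage-style overcommit limit on physical cores (e.g. `300` allows 3:1); all names and semantics here are illustrative, not an agreed design:

```go
package main

import "fmt"

// hostState is a hypothetical per-node view combining the memory data the
// scheduler already has with the CPU fields available from the Proxmox API.
type hostState struct {
	Cores     int   // physical cores (node.CPUInfo.CPUs)
	AllocCPUs int   // sum of vm.CPUs already placed on the node
	TotalMem  int64 // bytes
	AllocMem  int64 // bytes already allocated
}

// saturation returns the worse of CPU and memory saturation after placing a
// VM with the given request, so the scheduler can prefer the node whose
// most-loaded resource stays lowest. cpuAdjustment == 0 keeps the current
// memory-only behavior.
func saturation(h hostState, vmCPUs int, vmMem int64, cpuAdjustment uint64) float64 {
	memSat := float64(h.AllocMem+vmMem) / float64(h.TotalMem)
	if cpuAdjustment == 0 {
		return memSat // feature disabled: memory only, as today
	}
	// Treat cpuAdjustment as a percentage of physical cores that may be
	// allocated, e.g. 300 -> up to 3 vCPUs per core.
	allowedCPUs := float64(h.Cores) * float64(cpuAdjustment) / 100
	cpuSat := float64(h.AllocCPUs+vmCPUs) / allowedCPUs
	if cpuSat > memSat {
		return cpuSat
	}
	return memSat
}

func main() {
	// A 32-core Class-B-like node: memory looks comfortable, CPU does not.
	h := hostState{Cores: 32, AllocCPUs: 64, TotalMem: 504 << 30, AllocMem: 200 << 30}
	fmt.Printf("mem-only:  %.2f\n", saturation(h, 8, 16<<30, 0))   // prints "mem-only:  0.43"
	fmt.Printf("cpu-aware: %.2f\n", saturation(h, 8, 16<<30, 300)) // prints "cpu-aware: 0.75"
}
```

With this shape, a node that is memory-rich but core-poor scores worse than a memory-tighter node with spare cores, which is exactly the Class B case above.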
We'd be happy to contribute a PR for this. Before starting, we'd like to know if this kind of scheduler improvement aligns with the project's roadmap and if there are any design preferences we should follow.