Scheduler causes uneven CPU overcommit on heterogeneous Proxmox clusters #724

@emanuelebosetti

Problem

On Proxmox clusters where nodes have different CPU core counts, the current round-robin scheduler distributes VMs evenly by count, ignoring CPU capacity. This causes severe CPU overcommit on nodes with fewer cores while nodes with more cores remain underutilized.

The scheduler only considers memory (schedulerHints.memoryAdjustment). CPU is never evaluated.

Real-world example

A 20-node production Proxmox cluster hosting multiple CAPI-managed Kubernetes clusters (229 VMs total across different clusters) with 3 node classes:

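| Class | Nodes | Cores per node | RAM per node |
|-------|-------|----------------|--------------|
| A     | 10    | 64             | 504 GB       |
| B     | 2     | 32             | 661 GB       |
| C     | 8     | 32             | 504 GB       |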

Current state:

| Node class                | Avg CPU ratio | Max CPU ratio |
|---------------------------|---------------|---------------|
| A (64 cores)              | 2.67:1        | 3.25:1        |
| B (32 cores, 661 GB RAM)  | 2.28:1        | 2.81:1        |
| C (32 cores, 504 GB RAM)  | 1.41:1        | 1.97:1        |

After a rollout restart (all 229 VMs re-scheduled from zero):

| Metric             | Result |
|--------------------|--------|
| CPU ratio max      | 4.44:1 |
| CPU ratio min      | 1.38:1 |
| Spread (max - min) | 3.06   |
| Nodes above 3:1    | 9 / 20 |

Class B nodes (32 cores, highest RAM) are hit the hardest: the scheduler prefers them because they have the most available memory, yet they have only half the CPU cores of class A nodes. The imbalance persists until VMs are rebalanced manually via Proxmox live migration.

Root cause

In vmscheduler.go, selectNode:

  1. Sorts nodes by available memory (descending)
  2. Counts existing VMs per node
  3. Picks the node with the fewest VMs that has enough memory
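To make the three steps above concrete, here is a minimal sketch of the selection logic as described, not the actual code from vmscheduler.go. The `Node` type, its fields, and `selectNodeByCount` are hypothetical names chosen for illustration, and breaking ties in favor of the node with more available memory is an assumption. Note that no CPU field appears anywhere in the decision.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, simplified view of a Proxmox node as the
// scheduler sees it. It is not the provider's real type.
type Node struct {
	Name            string
	AvailableMemory uint64 // bytes still schedulable on this node
	VMCount         int    // VMs already placed on this node
}

// selectNodeByCount mirrors the three steps listed above. It returns the
// chosen node name and false when no node has enough memory.
func selectNodeByCount(nodes []Node, requiredMemory uint64) (string, bool) {
	// Step 1: sort a copy of the nodes by available memory, descending.
	sorted := make([]Node, len(nodes))
	copy(sorted, nodes)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].AvailableMemory > sorted[j].AvailableMemory
	})

	// Steps 2 and 3: among nodes with enough memory, pick the one with the
	// fewest VMs. On a tie the earlier node (more memory) wins; this
	// tie-break is an assumption of the sketch.
	best := -1
	for i, n := range sorted {
		if n.AvailableMemory < requiredMemory {
			continue
		}
		if best == -1 || n.VMCount < sorted[best].VMCount {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return sorted[best].Name, true
}

func main() {
	const GiB = uint64(1) << 30
	nodes := []Node{
		{Name: "a1", AvailableMemory: 200 * GiB, VMCount: 10},
		{Name: "b1", AvailableMemory: 400 * GiB, VMCount: 10},
		{Name: "c1", AvailableMemory: 200 * GiB, VMCount: 12},
	}
	name, ok := selectNodeByCount(nodes, 16*GiB)
	fmt.Println(name, ok)
}
```

With equal VM counts, the high-memory node wins regardless of how many cores it has, which matches the behavior we see on class B.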

CPU capacity (node.CPUInfo.CPUs) and CPU allocation per VM (vm.CPUs) are available from the Proxmox API but never queried by the scheduler.

Suggestion

Introduce a schedulerHints.cpuAdjustment field (analogous to memoryAdjustment) that, when set, enables a scheduling mode that considers both CPU and memory saturation when placing VMs. When disabled (default 0), the current round-robin behavior is preserved.
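As a starting point for discussion, here is a rough sketch of what such a mode could look like. Everything in it is an assumption rather than a settled design: `cpuAdjustment` is read as a percentage of physical cores that may be allocated as vCPUs (so 400 means 4:1 overcommit), `memoryAdjustment` is read as a percentage greater than zero, and the scoring rule (pick the node whose higher saturation, CPU or memory, is lowest after placement) is just one possible choice. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

const GiB = uint64(1) << 30

// NodeCapacity is a hypothetical view of a node's capacity and allocation.
type NodeCapacity struct {
	Name            string
	Cores           int    // physical CPU cores
	AllocatedCPUs   int    // sum of vCPUs of VMs already on the node
	TotalMemory     uint64 // bytes
	AllocatedMemory uint64 // bytes
}

// VMRequest is the resource request of the VM being placed.
type VMRequest struct {
	CPUs   int
	Memory uint64
}

// selectNodeBySaturation returns the node with the lowest
// max(CPU saturation, memory saturation) after placing vm. It returns
// false when no node fits or when either adjustment is 0; in this sketch
// the caller is expected to use the existing round-robin path when
// cpuAdjustment is 0.
func selectNodeBySaturation(nodes []NodeCapacity, vm VMRequest, cpuAdjustment, memoryAdjustment uint64) (string, bool) {
	if cpuAdjustment == 0 || memoryAdjustment == 0 {
		return "", false
	}
	bestName, bestScore, found := "", 0.0, false
	for _, n := range nodes {
		cpuLimit := float64(n.Cores) * float64(cpuAdjustment) / 100
		memLimit := float64(n.TotalMemory) * float64(memoryAdjustment) / 100
		if cpuLimit == 0 || memLimit == 0 {
			continue
		}
		cpuSat := float64(n.AllocatedCPUs+vm.CPUs) / cpuLimit
		memSat := float64(n.AllocatedMemory+vm.Memory) / memLimit
		if cpuSat > 1 || memSat > 1 {
			continue // placing the VM would exceed a limit
		}
		score := math.Max(cpuSat, memSat)
		if !found || score < bestScore {
			bestName, bestScore, found = n.Name, score, true
		}
	}
	return bestName, found
}

func main() {
	nodes := []NodeCapacity{
		{Name: "A", Cores: 64, AllocatedCPUs: 128, TotalMemory: 504 * GiB, AllocatedMemory: 300 * GiB},
		{Name: "B", Cores: 32, AllocatedCPUs: 80, TotalMemory: 661 * GiB, AllocatedMemory: 200 * GiB},
	}
	vm := VMRequest{CPUs: 8, Memory: 32 * GiB}
	name, ok := selectNodeBySaturation(nodes, vm, 400, 100)
	fmt.Println(name, ok)
}
```

In this made-up example node B has far more free memory, so a memory-only ordering would favor it, but its CPU saturation after placement would be higher than anything on node A, so the sketch places the VM on A instead.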

We'd be happy to contribute a PR for this. Before starting, we'd like to know if this kind of scheduler improvement aligns with the project's roadmap and if there are any design preferences we should follow.
