---
title: AKS Configurable Scheduler Profiles (preview)
description: Improve GPU and CPU utilization, align pod placement to critical workloads, and reduce node costs at scale with Configurable Scheduler Profiles on AKS.
date: 2026-04-06
authors:
tags:
---
Your clusters are likely running well below capacity, and underutilized resources materially contribute to infrastructure cost. In 2025, Datadog found that most Kubernetes containers use less than 25% of their requested CPU, and in 2023, Weights & Biases found that nearly a third of GPU users averaged less than 15% utilization. Many factors affect node utilization, but as a core component of the Kubernetes control plane, the kube-scheduler plays a critical role in how well nodes are utilized.
Configurable Scheduler Profiles on Azure Kubernetes Service (AKS) let you configure your own scheduling logic: enable specific plugins, adjust plugin priorities, and tune parameter weights. The result: higher node density, better GPU utilization, and lower infrastructure costs.
You'll learn how the default Kubernetes scheduler places pods, where the defaults fall short, and how to increase node utilization with Configurable Scheduler Profiles on AKS.
- How does kube-scheduler work?
- Use Configurable Scheduler Profiles to increase node utilization and operator control
- Increase AKS cluster CPU utilization up to 85% with Configurable Scheduler
- Increase AKS cluster GPU or CPU utilization while balancing memory with Configurable Scheduler
- FAQ: How do Configurable Scheduler Profiles interact with autoscalers?
The Kubernetes scheduler operates in two cycles: a synchronous scheduling cycle and an asynchronous binding cycle. The scheduling cycle has two sub-phases, filtering and scoring, and only manages one pod at a time.
- The filtering phase removes nodes that cannot run the pod, based on hard constraints.
- The scoring phase calculates a score for each remaining node; the node with the highest score is the most suitable.
Once a node is selected, the binding cycle can process multiple pods in parallel. During this phase, the scheduler attempts to bind the pod to the chosen node; if binding fails, the pod is returned to the scheduling queue and retried. When filtering and scoring nodes, the default scheduler considers several hard and soft constraints with predefined weights, including (but not limited to) the following; a sketch pod spec combining several of them appears after the list:
- Resource requirements (CPU, memory)
- Node affinity/anti-affinity
- Pod affinity/anti-affinity
- Taints and tolerations
- TopologySpreadConstraints
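To make these concrete, here is a hedged sketch of a single pod spec that carries several of these constraints at once. All names, labels, and values are hypothetical:

```yaml
# Hedged sketch: a pod spec combining several scheduling constraints.
# All names, labels, and values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: constraint-demo
  labels:
    app: constraint-demo
spec:
  containers:
    - name: app
      image: nginx:1.25                  # illustrative image
      resources:
        requests:                        # resource requirements (hard constraint)
          cpu: 500m
          memory: 256Mi
  tolerations:                           # taints and tolerations (hard constraint)
    - key: workload
      operator: Equal
      value: batch
      effect: NoSchedule
  affinity:
    nodeAffinity:                        # preferred node affinity (soft constraint)
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["example-zone-1"]
  topologySpreadConstraints:             # soft constraint when whenUnsatisfiable is ScheduleAnyway
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: constraint-demo
```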
For more on how the Kubernetes scheduler works, see the technical blog post from SIG Scheduling contributor and AKS upstream engineer Heba Elayoty, Deep Dive into the Kubernetes Scheduler Framework.
The default scheduler is designed primarily for general-purpose workloads. It prioritizes nodes with the most available resources using the LeastAllocated scoring strategy, which spreads pods across nodes even when they could safely be packed more densely. While this works well for many services, the default scheduling criteria, and their fixed priority order, are a poor fit for workloads that demand high GPU and CPU utilization. In these scenarios, spreading pods across nodes can lead to fragmented resources, underutilized GPUs, and increased infrastructure cost.
Today, the default scheduler on AKS is not accessible to users, so there is no flexibility to change which criteria are prioritized, or ignored, in the scheduling cycle. This rigidity often forces users to either accept suboptimal placement or manage a separate custom scheduler, both of which increase operational complexity. Starting with Kubernetes v1.33, AKS introduces Configurable Scheduler Profiles, an AKS-managed CRD that exposes the upstream scheduling framework without requiring you to maintain a separate scheduler. Now users can adjust the NodeResourcesFit plugin from its default configuration to favor nodes with higher utilization, achieving more efficient bin-packing and reducing infrastructure cost.
Configurable Scheduler Profiles on AKS allow users to change the default scheduler behavior using the extensibility of the scheduling framework, without the operational overhead of adopting a second scheduler or defining a custom scheduler. You express the intent declaratively, while AKS safely applies and manages the resulting scheduler behavior using a Custom Resource Definition (CRD).
- AKS begins with the default scheduling configuration.
- You declare your desired scheduling behavior using a Configurable Scheduler Profile applied as a CRD.
- The controller overlays the user-defined configuration on top of the base scheduler configuration.
- The controller produces and maintains a reconciled scheduler ConfigMap representing the combined configuration.
- The kube-scheduler deployment consumes the controller-managed ConfigMap and applies the updated scheduling behavior.
- If the scheduler becomes unhealthy, the controller automatically rolls back to the last known good configuration.
A profile is a set of one or more in-tree scheduling plugins and configurations that dictate how to schedule a pod. AKS supports 18 in-tree Kubernetes scheduling plugins.
In a simple scale-out scenario, where identical CPU-bound replicas are manually increased from 8 to 30, the default scheduler distributes pods evenly across nodes.
Configurable Scheduler Profiles configured for bin-packing show a visible consolidation pattern that differs from the default scheduler and improves capacity for new pods. This shift occurs without changing the workload, node size, or autoscaling behavior; only the scheduler's scoring logic changes.
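For concreteness, here is a minimal sketch of the kind of CPU-bound Deployment used in this scenario; it opts into the bin-packing profile defined later in this post via `schedulerName`. The name, image, and request values are illustrative stand-ins:

```yaml
# Hedged sketch: a CPU-bound Deployment for the scale-out experiment.
# The name, image, and request values are illustrative stand-ins.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-scaleout-demo
spec:
  replicas: 30              # scaled manually from 8 to 30 in the experiment
  selector:
    matchLabels:
      app: cpu-scaleout-demo
  template:
    metadata:
      labels:
        app: cpu-scaleout-demo
    spec:
      schedulerName: cpu-binpacking-scheduler  # profile defined later in this post
      containers:
        - name: worker
          image: registry.k8s.io/pause:3.9     # stand-in for a CPU-bound container
          resources:
            requests:
              cpu: 500m                        # illustrative CPU request
```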
While this experiment uses intentionally simple, CPU-bound containers to isolate scheduling behavior, the placement patterns observed here also apply to GPU-bound workloads, where capacity constraints elevate the need for utilization improvements. In a constrained GPU scenario, the bin-packing scheduler plays a critical role in maximizing the usable capacity of 4 GPUs spread across 2 nodes. By consolidating single-GPU workloads onto the same node, the scheduler avoids idle GPUs and enables all four GPUs to be actively used before new nodes are required.
This change in distribution shape enables downstream efficiencies: improved control for platform engineers, efficient resource usage, and cost optimization that are difficult to achieve when pods are evenly spread. Configurable Scheduler has many use cases, and each profile expresses a distinct scheduling intent. The next two sections detail how the scoring strategies, MostAllocated and RequestedToCapacityRatio, are configured to achieve increased utilization.
| Scheduler | Scoring strategy | Scheduling intent | Operator benefits |
|---|---|---|---|
| Default scheduler | NodeResourcesFit: LeastAllocated | Balance and hotspot reduction | No tuning |
| Configurable Scheduler Profile | NodeResourcesFit: MostAllocated | Maximize consolidation / bin‑packing | Maximum node utilization, highest cost reduction potential |
| Configurable Scheduler Profile | NodeResourcesFit: RequestedToCapacityRatio | Targeted utilization with headroom | ✅ Recommended strategy. Increased utilization with stronger control over consolidation and burst headroom than MostAllocated |
Using `RequestedToCapacityRatio`, this bin-packing profile is configured to favor nodes within a utilization band of 50-85%, avoid empty nodes, and strongly deprioritize nodes at 90% utilization or more to limit oversaturation. Configure node bin-packing using the `RequestedToCapacityRatio` strategy to improve utilization and reduce infrastructure costs.
`RequestedToCapacityRatio` scores nodes based on the ratio of requested resources to total node capacity after the pod is hypothetically placed. This strategy enables more fine-grained bin-packing by allowing operators to define an ideal utilization curve for their nodes rather than simply preferring the most or least utilized nodes. `PodTopologySpread` is disabled in this profile because bin-packing and zone-spreading are opposing goals, and the spreading logic may otherwise take priority over packing. If you need both high utilization and zone resilience, define a new profile that balances both goals.
By shaping the scoring curve to target a range of 50-85% CPU utilization, users can increase pod density on provisioned nodes while preserving headroom for bursts, background processes, and system components in CPU-bound workloads.
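As a worked example, assuming the upstream plugin's linear interpolation between shape points and weight-averaged combination across resources: with the shape defined in the configuration below, if placing a pod would bring a node to 40% CPU and 70% memory utilization, the CPU curve scores 8 + (40 − 30) / (50 − 30) × (10 − 8) = 9, the memory curve scores 10 (the curve is flat between the 50% and 85% points), and the combined result is (8 × 9 + 1 × 10) / (8 + 1) ≈ 9.1 on the 0-10 shape scale, before the scheduler normalizes it into a final node score.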
Given this level of configuration detail, `RequestedToCapacityRatio` is the recommended scoring strategy for node bin-packing on production AKS clusters.
:::note
This scoring strategy can also be applied to GPUs by changing the target resource. Adjust resources, resource weights, utilization thresholds, and plugin parameters to match your VM SKUs, workload patterns, and cluster topology.
:::
```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: cpu-binpacking-scheduler
        plugins:
          multiPoint:
            enabled:
              - name: NodeResourcesFit
            disabled:
              - name: PodTopologySpread
        pluginConfig:
          - name: NodeResourcesFit
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeResourcesFitArgs
              scoringStrategy:
                type: RequestedToCapacityRatio
                resources:
                  - name: cpu
                    weight: 8
                  - name: memory
                    weight: 1
                requestedToCapacityRatio:
                  shape:
                    - utilization: 0
                      score: 0
                    - utilization: 30
                      score: 8
                    - utilization: 50
                      score: 10
                    - utilization: 85
                      score: 10
                    - utilization: 90
                      score: 3
                    - utilization: 100
                      score: 0
```

When `MostAllocated` and `NodeResourcesBalancedAllocation` are combined, the scheduler favors GPU-bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This approach reduces fragmented GPU capacity and results in fewer underutilized secondary resources. Configure node bin-packing using the `MostAllocated` strategy to improve utilization and reduce infrastructure costs.
`MostAllocated` scores nodes based on their current resource utilization, favoring nodes that are already heavily used for the specified resources. `RequestedToCapacityRatio`, by contrast, lets you define a scoring curve so you can explicitly control preferred utilization ranges, and it scores nodes based on resource requests relative to node capacity. This makes `MostAllocated` more aggressive for consolidation but gives you less explicit control over headroom. `PodTopologySpread` is disabled in this profile because bin-packing and zone-spreading are opposing goals. Enabling both can result in spreading winning over consolidation, weakening the intended packing behavior.
NodeResourcesBalancedAllocation complements MostAllocated because it prefers nodes whose CPU and memory utilization stay proportionally balanced, helping reduce bottlenecks caused by asymmetric resource pressure.
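For intuition: under this balancing behavior, a node that would sit at 60% CPU and 55% memory after placement outscores one at 85% CPU and 20% memory, because the first node's requested fractions deviate far less from each other, even though the second node has more free memory in absolute terms.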
This scheduler configuration maximizes GPU utilization by consolidating smaller jobs onto fewer nodes, reducing idle accelerator capacity while maintaining reasonable CPU and memory balance.
:::note
Adjust resources, resource weights, utilization thresholds, and plugin parameters to match your VM SKUs, workload patterns, and cluster topology.
:::
```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: gpu-node-binpacking-scheduler
        plugins:
          multiPoint:
            enabled:
              - name: NodeResourcesFit
              - name: NodeResourcesBalancedAllocation
            disabled:
              - name: PodTopologySpread
        pluginConfig:
          - name: NodeResourcesFit
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeResourcesFitArgs
              scoringStrategy:
                type: MostAllocated
                resources:
                  - name: nvidia.com/gpu
                    weight: 8
                  - name: cpu
                    weight: 1
          - name: NodeResourcesBalancedAllocation
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeResourcesBalancedAllocationArgs
              resources:
                - name: cpu
                  weight: 1
                - name: memory
                  weight: 1
```
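To route a workload through this profile, set `schedulerName` in the pod spec to the profile's name. A minimal sketch, in which the pod name and image are illustrative:

```yaml
# Hedged sketch: routing a single-GPU pod through the profile above.
# The pod name and image are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job-demo
spec:
  schedulerName: gpu-node-binpacking-scheduler  # must match the profile's schedulerName
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.08-py3   # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1                     # single-GPU jobs get consolidated
```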
- Which bin-packing strategy does AKS recommend to increase node utilization?

  AKS recommends the `RequestedToCapacityRatio` scoring strategy because it provides a more granular scoring approach, allowing users to define an ideal utilization curve for their nodes. For example, this bin-packing strategy lets users configure a target utilization of 85%.

- How do Configurable Scheduler Profiles interact with autoscalers such as Node Auto Provisioning (NAP), Cluster Autoscaler (CA), and Vertical Pod Autoscaler (VPA)?

  These components are complementary. Configurable Scheduler Profiles influence how pods are placed on nodes, while autoscalers make scaling decisions based on resource utilization and pending pods. Look out for an upcoming blog post that details how scheduling constraints affect Node Auto Provisioning.

  - Node Auto Provisioning (NAP) is triggered when pods are unschedulable. If a suitable node already exists, the pod is scheduled with the defined Configurable Scheduler Profile. If no suitable node exists, NAP provisions new capacity and schedules the pod.
  - Cluster Autoscaler (CA) manages node scale-up and scale-down. On scale-up, CA is triggered when no suitable node is available for a pending pod; using Configurable Scheduler Profiles ensures nodes are scaled out only when provisioned resources are no longer sufficient. On scale-down, CA is triggered when nodes fall below a utilization threshold (the default is 50%). As active nodes are packed more efficiently, underutilized nodes become easier candidates for removal.
  - Vertical Pod Autoscaler (VPA) optimizes resource utilization patterns in pods. As pods are recreated with updated CPU and memory requests, they are scheduled using the configured scheduler profile, allowing placement decisions to reflect the new resource requirements.

- What if a resource, like `memory`, is omitted in the `scoringStrategy`?

  If a resource is omitted in the `scoringStrategy`, that resource is not considered in the filter or scoring cycles of the defined Configurable Scheduler Profile. If the resource should still be considered, but with reduced influence on the final score, include it with a lower weight.

- Can multiple Configurable Scheduler Profiles be used for different workloads on the same cluster?

  Yes, multiple scheduling profiles can coexist in a single cluster. This allows different placement strategies (for example, cost-optimized vs. latency-sensitive workloads) to run side by side. Visit the documentation for a multiple scheduler profiles example.

  - Multiple profiles can be defined centrally in a single scheduler configuration.
  - Individual workloads select a profile via `schedulerName` in the pod spec.

- How do I monitor whether my scheduler profile is improving utilization?

  Monitor these signals to confirm that the scheduler is behaving as intended. Over time, you should see higher average node utilization, reduced variance between nodes, and fewer lightly utilized nodes.

  - Track node-level utilization metrics, including CPU and memory utilization per node and the distribution of pods across nodes, using Azure Monitor Container Insights, the AKS node viewer tool, or `kubectl top nodes` for quick validation.
  - Review autoscaler outcomes, looking for fewer scale-ups during normal load and more decisive scale-downs after demand drops.
  - Measure cost metrics, such as reduced idle costs, using the Cost Analysis add-on.
Configurable Scheduler Profiles give you direct control over pod placement. With these scheduling plugins, your workloads make full use of available GPU capacity, reduce idle costs, and avoid costly overprovisioning.
- For additional guidance and best practices, see kube-scheduler best practices.
- Increase node utilization using Configurable Scheduler Profiles.
- To schedule and queue batch workloads, install and configure Kueue on AKS.




