---
title: AKS Configurable Scheduler Profiles (preview)
description: Improve GPU and CPU utilization, align pod placement to critical workloads, and reduce node costs at scale with Configurable Scheduler Profiles on AKS.
date: 2026-04-06
authors:
tags:
---
Your clusters are likely running well below capacity, and underutilized resources materially contribute to infrastructure cost. In 2025, Datadog found that most Kubernetes containers use less than 25% of their requested CPU, and in 2023, Weights & Biases found that nearly a third of GPU users averaged less than 15% utilization. Many factors affect node utilization, but as a core component of the Kubernetes control plane, the kube-scheduler plays a critical role in how well nodes are utilized.
Configurable Scheduler Profiles on Azure Kubernetes Service (AKS) let you configure your own scheduling logic: enable specific plugins, adjust plugin priorities, and tune parameter weights. The result: higher node density, better GPU utilization, and lower infrastructure costs.
You'll learn how the default Kubernetes scheduler places pods, where the defaults fall short, and how to increase node utilization with Configurable Scheduler Profiles on AKS.
- How does kube-scheduler work?
- Use Configurable Scheduler Profiles to increase node utilization and operator control
- Increase AKS cluster CPU utilization up to 85% with Configurable Scheduler
- Increase AKS cluster GPU or CPU utilization while balancing memory with Configurable Scheduler
- FAQ: How do Configurable Scheduler Profiles interact with autoscalers?
The Kubernetes scheduler operates in two cycles: a synchronous scheduling cycle and an asynchronous binding cycle. The scheduling cycle has two sub-phases, filtering and scoring, and only manages one pod at a time.
- The filtering phase removes nodes that cannot run the pod, based on hard constraints.
- The scoring phase calculates a score for each remaining node; the node with the highest score is the most suitable.
Once a node is selected, the binding cycle can process multiple pods in parallel. During this phase, the scheduler attempts to bind the pod to the chosen node; if binding fails, the pod is returned to the scheduling queue and retried. When filtering and scoring nodes, the default scheduler considers several hard and soft constraints with predefined weights, including (but not limited to) the following; a sketch pod spec combining several of them appears after the list:
- Resource requirements (CPU, memory)
- Node affinity/anti-affinity
- Pod affinity/anti-affinity
- Taints and tolerations
- TopologySpreadConstraints
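To make these concrete, here is a hedged sketch of a single pod spec that carries several of these constraints at once. All names, labels, and values are hypothetical:

```yaml
# Hedged sketch: a pod spec combining several scheduling constraints.
# All names, labels, and values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: constraint-demo
  labels:
    app: constraint-demo
spec:
  containers:
    - name: app
      image: nginx:1.25                  # illustrative image
      resources:
        requests:                        # resource requirements (hard constraint)
          cpu: 500m
          memory: 256Mi
  tolerations:                           # taints and tolerations (hard constraint)
    - key: workload
      operator: Equal
      value: batch
      effect: NoSchedule
  affinity:
    nodeAffinity:                        # preferred node affinity (soft constraint)
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["example-zone-1"]
  topologySpreadConstraints:             # soft constraint when whenUnsatisfiable is ScheduleAnyway
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: constraint-demo
```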
For more on how the Kubernetes scheduler works, see the technical blog post from SIG Scheduling contributor and AKS upstream engineer Heba Elayoty, Deep Dive into the Kubernetes Scheduler Framework.
The default scheduler is designed primarily for general-purpose workloads. It prioritizes nodes with the most available resources using the LeastAllocated scoring strategy, which spreads pods across nodes even when they could safely be packed more densely. While this works well for many services, the default scheduling criteria, and their fixed priority order, are a poor fit for workloads that demand high GPU and CPU utilization. In these scenarios, spreading pods across nodes can lead to fragmented resources, underutilized GPUs, and increased infrastructure cost.
Today, the default scheduler on AKS is not accessible to users, so there is no flexibility to change which criteria are prioritized, or ignored, in the scheduling cycle. This rigidity often forces users to either accept suboptimal placement or manage a separate custom scheduler, both of which increase operational complexity. Starting with Kubernetes v1.33, AKS introduces Configurable Scheduler Profiles, an AKS-managed CRD that exposes the upstream scheduling framework without requiring you to maintain a separate scheduler. Now users can adjust the NodeResourcesFit plugin from its default configuration to favor nodes with higher utilization, achieving more efficient bin-packing and reducing infrastructure cost.
Configurable Scheduler Profiles on AKS allow users to change the default scheduler behavior using the extensibility of the scheduling framework, without the operational overhead of adopting a second scheduler or defining a custom scheduler. You express the intent declaratively, while AKS safely applies and manages the resulting scheduler behavior using a Custom Resource Definition (CRD).
- AKS begins with the default scheduling configuration.
- You declare your desired scheduling behavior using a Configurable Scheduler Profile applied as a CRD.
- The controller overlays the user-defined configuration on top of the base scheduler configuration.
- The controller produces and maintains a reconciled scheduler ConfigMap representing the combined configuration.
- The kube-scheduler deployment consumes the controller-managed ConfigMap and applies the updated scheduling behavior.
- If the scheduler becomes unhealthy, the controller automatically rolls back to the last known good configuration.
A profile is a set of one or more in-tree scheduling plugins and configurations that dictate how to schedule a pod. AKS supports 18 in-tree Kubernetes scheduling plugins.
In a simple scale-out scenario, where identical CPU-bound replicas are manually increased from 8 to 30, the default scheduler distributes pods evenly across nodes.
Configurable Scheduler Profiles configured for bin-packing show a visible consolidation pattern that differs from the default scheduler and improves capacity for new pods. This shift occurs without changing the workload, node size, or autoscaling behavior; only the scheduler's scoring logic changes.
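For concreteness, here is a minimal sketch of the kind of CPU-bound Deployment used in this scenario; it opts into the bin-packing profile defined later in this post via `schedulerName`. The name, image, and request values are illustrative stand-ins:

```yaml
# Hedged sketch: a CPU-bound Deployment for the scale-out experiment.
# The name, image, and request values are illustrative stand-ins.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-scaleout-demo
spec:
  replicas: 30              # scaled manually from 8 to 30 in the experiment
  selector:
    matchLabels:
      app: cpu-scaleout-demo
  template:
    metadata:
      labels:
        app: cpu-scaleout-demo
    spec:
      schedulerName: cpu-binpacking-scheduler  # profile defined later in this post
      containers:
        - name: worker
          image: registry.k8s.io/pause:3.9     # stand-in for a CPU-bound container
          resources:
            requests:
              cpu: 500m                        # illustrative CPU request
```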
While this experiment uses intentionally simple, CPU-bound containers to isolate scheduling behavior, the placement patterns observed here also apply to GPU-bound workloads, where capacity constraints elevate the need for utilization improvements. In a constrained GPU scenario, the bin-packing scheduler plays a critical role in maximizing the usable capacity of 4 GPUs spread across 2 nodes. By consolidating single-GPU workloads onto the same node, the scheduler avoids idle GPUs and enables all four GPUs to be actively used before new nodes are required.
This change in distribution shape enables downstream efficiencies: improved control for platform engineers, efficient resource usage, and cost optimization that are difficult to achieve when pods are evenly spread. Configurable Scheduler has many use cases, and each profile expresses a distinct scheduling intent. The next two sections detail how the scoring strategies, MostAllocated and RequestedToCapacityRatio, are configured to achieve increased utilization.
| Scheduler | Scoring strategy | Scheduling intent | Operator benefits |
|---|---|---|---|
| Default scheduler | NodeResourcesFit: LeastAllocated | Balance and hotspot reduction | No tuning |
| Configurable Scheduler Profile | NodeResourcesFit: MostAllocated | Maximize consolidation / bin‑packing | Maximum node utilization, highest cost reduction potential |
| Configurable Scheduler Profile | NodeResourcesFit: RequestedToCapacityRatio | Targeted utilization with headroom | ✅ Recommended strategy. Increased utilization with stronger control over consolidation and burst headroom than MostAllocated |
Using `RequestedToCapacityRatio`, this bin-packing profile is configured to favor nodes within a utilization band of 50-85%, avoid empty nodes, and strongly deprioritize nodes at 90% utilization or more to limit oversaturation. Configure node bin-packing using the `RequestedToCapacityRatio` strategy to improve utilization and reduce infrastructure costs.
`RequestedToCapacityRatio` scores nodes based on the ratio of requested resources to total node capacity after the pod is hypothetically placed. This strategy enables more fine-grained bin-packing by allowing operators to define an ideal utilization curve for their nodes rather than simply preferring the most or least utilized nodes. `PodTopologySpread` is disabled in this profile because bin-packing and zone-spreading are opposing goals, and the spreading logic may otherwise take priority over packing. If you need both high utilization and zone resilience, define a new profile that balances both goals.
By shaping the scoring curve to target a range of 50-85% CPU utilization, users can increase pod density on provisioned nodes while preserving headroom for bursts, background processes, and system components in CPU-bound workloads.
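As a worked example, assuming the upstream plugin's linear interpolation between shape points and weight-averaged combination across resources: with the shape defined in the configuration below, if placing a pod would bring a node to 40% CPU and 70% memory utilization, the CPU curve scores 8 + (40 − 30) / (50 − 30) × (10 − 8) = 9, the memory curve scores 10 (the curve is flat between the 50% and 85% points), and the combined result is (8 × 9 + 1 × 10) / (8 + 1) ≈ 9.1 on the 0-10 shape scale, before the scheduler normalizes it into a final node score.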
Given this level of configuration detail, `RequestedToCapacityRatio` is the recommended scoring strategy for node bin-packing on production AKS clusters.
:::note
This scoring strategy can also be applied to GPUs by changing the target resource. Adjust resources, resource weights, utilization thresholds, and plugin parameters to match your VM SKUs, workload patterns, and cluster topology.
:::
```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: cpu-binpacking-scheduler
        plugins:
          multiPoint:
            enabled:
              - name: NodeResourcesFit
            disabled:
              - name: PodTopologySpread
        pluginConfig:
          - name: NodeResourcesFit
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeResourcesFitArgs
              scoringStrategy:
                type: RequestedToCapacityRatio
                resources:
                  - name: cpu
                    weight: 8
                  - name: memory
                    weight: 1
                requestedToCapacityRatio:
                  shape:
                    - utilization: 0
                      score: 0
                    - utilization: 30
                      score: 8
                    - utilization: 50
                      score: 10
                    - utilization: 85
                      score: 10
                    - utilization: 90
                      score: 3
                    - utilization: 100
                      score: 0
```

When `MostAllocated` and `NodeResourcesBalancedAllocation` are combined, the scheduler favors GPU-bound nodes with balanced CPU and memory usage over nodes with large amounts of unused memory or fragmented resources. This approach reduces fragmented GPU capacity and results in fewer underutilized secondary resources. Configure node bin-packing using the `MostAllocated` strategy to improve utilization and reduce infrastructure costs.
`MostAllocated` scores nodes based on their current resource utilization, favoring nodes that are already heavily used for the specified resources. `RequestedToCapacityRatio`, by contrast, lets you define a scoring curve so you can explicitly control preferred utilization ranges, and it scores nodes based on resource requests relative to node capacity. This makes `MostAllocated` more aggressive for consolidation but gives you less explicit control over headroom. `PodTopologySpread` is disabled in this profile because bin-packing and zone-spreading are opposing goals. Enabling both can result in spreading winning over consolidation, weakening the intended packing behavior.
NodeResourcesBalancedAllocation complements MostAllocated because it prefers nodes whose CPU and memory utilization stay proportionally balanced, helping reduce bottlenecks caused by asymmetric resource pressure.
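For intuition: under this balancing behavior, a node that would sit at 60% CPU and 55% memory after placement outscores one at 85% CPU and 20% memory, because the first node's requested fractions deviate far less from each other, even though the second node has more free memory in absolute terms.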
This scheduler configuration maximizes GPU utilization by consolidating smaller jobs onto fewer nodes, reducing idle accelerator capacity while maintaining reasonable CPU and memory balance.
:::note
Adjust resources, resource weights, utilization thresholds, and plugin parameters to match your VM SKUs, workload patterns, and cluster topology.
:::
```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: gpu-node-binpacking-scheduler
        plugins:
          multiPoint:
            enabled:
              - name: NodeResourcesFit
              - name: NodeResourcesBalancedAllocation
            disabled:
              - name: PodTopologySpread
        pluginConfig:
          - name: NodeResourcesFit
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeResourcesFitArgs
              scoringStrategy:
                type: MostAllocated
                resources:
                  - name: nvidia.com/gpu
                    weight: 8
                  - name: cpu
                    weight: 1
          - name: NodeResourcesBalancedAllocation
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeResourcesBalancedAllocationArgs
              resources:
                - name: cpu
                  weight: 1
                - name: memory
                  weight: 1
```
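To route a workload through this profile, set `schedulerName` in the pod spec to the profile's name. A minimal sketch, in which the pod name and image are illustrative:

```yaml
# Hedged sketch: routing a single-GPU pod through the profile above.
# The pod name and image are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job-demo
spec:
  schedulerName: gpu-node-binpacking-scheduler  # must match the profile's schedulerName
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.08-py3   # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1                     # single-GPU jobs get consolidated
```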
- Which bin-packing strategy does AKS recommend to increase node utilization?

  AKS recommends the `RequestedToCapacityRatio` scoring strategy because it provides a more granular scoring approach, allowing users to define an ideal utilization curve for their nodes. For example, this bin-packing strategy lets users configure a target utilization of 85%.

- How do Configurable Scheduler Profiles interact with autoscalers such as Node Auto Provisioning (NAP), Cluster Autoscaler (CA), and Vertical Pod Autoscaler (VPA)?

  These components are complementary. Configurable Scheduler Profiles influence how pods are placed on nodes, while autoscalers make scaling decisions based on resource utilization and pending pods. Look out for an upcoming blog post that details how scheduling constraints affect Node Auto Provisioning.

  - Node Auto Provisioning (NAP) is triggered when pods are unschedulable. If a suitable node already exists, the pod is scheduled with the defined Configurable Scheduler Profile. If no suitable node exists, NAP provisions new capacity and schedules the pod.
  - Cluster Autoscaler (CA) manages node scale-up and scale-down. On scale-up, CA is triggered when no suitable node is available for a pending pod; using Configurable Scheduler Profiles ensures nodes are scaled out only when provisioned resources are no longer sufficient. On scale-down, CA is triggered when nodes fall below a utilization threshold (the default is 50%). As active nodes are packed more efficiently, underutilized nodes become easier candidates for removal.
  - Vertical Pod Autoscaler (VPA) optimizes resource utilization patterns in pods. As pods are recreated with updated CPU and memory requests, they are scheduled using the configured scheduler profile, allowing placement decisions to reflect the new resource requirements.

- What if a resource, like `memory`, is omitted in the `scoringStrategy`?

  If a resource is omitted in the `scoringStrategy`, that resource is not considered in the filter or scoring cycles of the defined Configurable Scheduler Profile. If the resource should still be considered, but with reduced influence on the final score, include it with a lower weight.

- Can multiple Configurable Scheduler Profiles be used for different workloads on the same cluster?

  Yes, multiple scheduling profiles can coexist in a single cluster. This allows different placement strategies (for example, cost-optimized vs. latency-sensitive workloads) to run side by side. Visit the documentation for a multiple scheduler profiles example.

  - Multiple profiles can be defined centrally in a single scheduler configuration.
  - Individual workloads select a profile via `schedulerName` in the pod spec.

- How do I monitor whether my scheduler profile is improving utilization?

  Monitor these signals to confirm that the scheduler is behaving as intended. Over time, you should see higher average node utilization, reduced variance between nodes, and fewer lightly utilized nodes.

  - Track node-level utilization metrics, including CPU and memory utilization per node and the distribution of pods across nodes, using Azure Monitor Container Insights, the AKS node viewer tool, or `kubectl top nodes` for quick validation.
  - Review autoscaler outcomes, looking for fewer scale-ups during normal load and more decisive scale-downs after demand drops.
  - Measure cost metrics, such as reduced idle costs, using the Cost Analysis add-on.
Configurable Scheduler Profiles give you direct control over pod placement. With these scheduling plugins, your workloads make full use of available GPU capacity, reduce idle costs, and avoid costly overprovisioning.
- For additional guidance and best practices, see kube-scheduler best practices.
- Increase node utilization using Configurable Scheduler Profiles.
- To schedule and queue batch workloads, install and configure Kueue on AKS.




