---
title: "Managing Disruption with AKS Node Auto-Provisioning (NAP): PDBs, Consolidation, and Disruption Budgets"
description: "Learn AKS best practices to control voluntary disruption from Node Auto-Provisioning (NAP): how Pod Disruption Budgets interact with Karpenter consolidation/drift/expiration, and how to use NodePool disruption budgets and maintenance windows to keep workloads stable."
date: 2026-04-12
authors: ["wilson-darko"]
tags:
  - node-auto-provisioning
---

## Background

AKS users want their workloads to scale when needed and to be disrupted only when (and where) they choose.

AKS Node Auto-Provisioning (NAP) is designed to keep clusters efficient: it provisions nodes for pending pods, and it also continuously *removes* nodes when it’s safe to do so (for example, when nodes are empty or underutilized). That second half, **disruption**, is where many production surprises happen.

When managing Kubernetes, operators commonly ask:

- How do I control when scale-downs happen, and where they shouldn't?
- How do I control workload disruption so it happens predictably (and not in the middle of business hours)?
- Why won’t NAP scale down, even though I have lots of underused capacity?
- Why do upgrades get “stuck” on certain nodes?

This post focuses on **NAP disruption best practices**, not workload scheduling (topology spread constraints, node affinity, taints, and so on). For more on scheduling best practices, check out our [blog post](<will edit once part 1 blog is published>).

If you’re new to these NAP features, this post gives you “good defaults” as a starting point. If you’re already deep into NAP disruption settings, treat it as a checklist for the behaviors AKS users most commonly ask about.

---

<!-- truncate -->

:::info

Learn more about how to [configure disruption policies for NAP](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption).

:::

---

## Part 1 — The mental model: two layers of disruption control

When NAP decides a node (virtual machine) *could* be removed, there are two layers of controls that determine whether removal actually happens:

### Workload layer: Pod Disruption Budgets (PDBs)

PDBs are Kubernetes-native guardrails that limit **voluntary evictions** of pods. They are how you tell Kubernetes:

“During voluntary disruptions, keep at least N replicas available (or limit how many can be unavailable).”

:::note
Pod disruption budgets protect against **voluntary evictions**, not involuntary failures, forced migrations, or spot node eviction.
:::

### Infrastructure layer: node-level disruption settings

NAP is built on Karpenter concepts and exposes disruption controls on the **NodePool**:

- **Consolidation policy**: when NAP is allowed to consolidate
- **Disruption budgets**: how many nodes can be disrupted at once, and when
- **Expire-after**: node lifetime
- **Drift**: replacing nodes whose configuration no longer matches the NodePool or AKSNodeClass

A good operational posture is: **use PDBs to protect *applications*** and **use NAP disruption tools to control *the cluster’s disruption rate***.
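Put together, these infrastructure-layer knobs all live on the NodePool. Here is a minimal sketch combining them, using the `default` NodePool and node class names from the examples later in this post:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    # When: only consolidate nodes that are empty or underutilized
    consolidationPolicy: WhenEmptyOrUnderutilized
    # How fast: disrupt at most one node at a time
    budgets:
    - nodes: "1"
  template:
    spec:
      nodeClassRef:
        name: default
      # Node lifetime: do not replace nodes purely due to age
      expireAfter: Never
```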

---

## Part 2 — NAP overview

Node Auto-Provisioning (NAP) provisions, scales, and manages nodes. NAP bases its scheduling and disruption logic on settings from three sources:

- Workload deployment file: for disruption, NAP honors the pod disruption budgets the user defines here
- [NodePool CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools): lists the range of allowed virtual machine options (size, zones, architecture) and disruption settings
- [AKSNodeClass CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-aksnodeclass): defines Azure-specific settings
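For orientation, a minimal AKSNodeClass might look like the sketch below. Treat the `apiVersion` and fields as assumptions that vary by NAP release; verify with `kubectl api-resources` and `kubectl explain aksnodeclass` on your cluster:

```yaml
apiVersion: karpenter.azure.com/v1beta1  # API version differs across NAP releases
kind: AKSNodeClass
metadata:
  name: default
spec:
  # Azure-specific settings, for example the node OS image family
  imageFamily: Ubuntu2204
```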

### How NAP handles disruption

NAP honors Kubernetes-native concepts such as Pod Disruption Budgets when making disruption decisions. NAP also uses Karpenter-based concepts such as Consolidation, Drift, and Node Disruption Budgets.

#### What “disruption” means in NAP (and what it doesn’t)

In NAP, “disruption” typically refers to **voluntary** actions that delete nodes after draining them, such as:

- **Consolidation**: deleting or replacing nodes (with better-fitting VM sizes) to increase compute efficiency (and reduce cost).
- **Drift**: replacing existing nodes that no longer match the desired configuration (for example, updated settings in your NodePool and AKSNodeClass CRDs).
- **Expiration**: replacing nodes after a configured lifetime.

These are different from **involuntary** disruptions such as:

- Spot eviction events
- Hardware failures
- Host reboots outside your control

PDBs and Karpenter disruption budgets mainly help with **voluntary** disruptions. They do not regulate involuntary disruption (for example, spot VM evictions, node termination events, or node stopping events).

---

## Part 3 — Pod Disruption Budgets (PDBs): controlling voluntary disruption

The most common NAP disruption problems come from PDBs that are either:

- **Too strict**, blocking drains indefinitely, or
- **Missing**, allowing too much disruption at once.

### A good default PDB

Kubernetes documentation describes `minAvailable` / `maxUnavailable` as the two key availability knobs for PDBs, and notes you can specify only one per PDB.

Here's an example of a PDB that regulates disruption without blocking scale-downs, upgrades, and consolidation:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web
```

Why it works well in practice:

- Consolidation/drift/expiration can still proceed.
- You avoid large brownouts caused by draining too many replicas at once.
- You reduce the chance of NAP “thrashing” a service by repeatedly moving too many pods.

### The common PDB pitfall: “zero voluntary evictions”

If you effectively allow zero voluntary evictions (`maxUnavailable: 0` or `minAvailable: 100%`), Kubernetes warns that this can block node drains indefinitely for any node running one of those pods.

This common misconfiguration can cause scenarios such as:

- Node or cluster upgrades failing because nodes won't voluntarily drain
- Workload migrations stalling
- NAP consolidation never happening

This can be intentional for extremely sensitive workloads, but it has a cost: if a node hosts one of these pods, draining that node can become impossible without changing the PDB (or taking an outage). We recommend setting some tolerance in these two settings, and also using disruption budgets or maintenance windows to control disruption.

**Practical guidance:**

- For critical workloads that you do not want disrupted at all, “zero eviction” strictness may be intentional — but be deliberate. When you're ready to allow disruption to these workloads, you may have to change the PDBs in the workload deployment file.
- For general workloads that can tolerate minor disruption, prefer a small `maxUnavailable` (like 1) rather than “zero evictions.”
- Be clear on the tradeoff: zero tolerance blocks upgrades, NAP consolidation, and scale-down.
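
For illustration, here is the zero-eviction anti-pattern in PDB form, using a hypothetical `payments` app label. Any node running a pod matched by this PDB becomes effectively undrainable:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-pdb    # hypothetical name for illustration
spec:
  maxUnavailable: 0     # zero voluntary evictions: blocks node drains indefinitely
  selector:
    matchLabels:
      app: payments     # hypothetical label
```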

## Part 4 — Controlling consolidation: “when” vs. “how fast”

There are two different operator intents that often get conflated:

- **When** consolidation is allowed to happen (consolidation policy)
- **How much** disruption can happen concurrently (budgets / rate limiting)

### Consolidation policy (when)

Use the NodePool’s consolidation policy to express your comfort level with cost-optimization moves. For many clusters, a safe baseline is “only consolidate when empty or underutilized,” then use budgets to keep the pace controlled.

Example NodePool consolidation settings:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      nodeClassRef:
        name: default
      expireAfter: Never
```

### Node disruption budgets (how fast)

NAP exposes Karpenter-style disruption budgets on the NodePool. If you don’t set them, a default budget of `nodes: 10%` is used. Use budgets to regulate how many nodes can be disrupted at a time.

The following example limits disruption to one node at a time:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
    - nodes: "1"
```

This is often the simplest way to prevent “NAP moved too many nodes at once.”

---

## Part 5 — Maintenance windows

A good practice for managing disruption is to **allow some consolidation, but only during a specific time window**.

NAP node disruption budgets support `schedule` and `duration` so you can create time-based rules (cron syntax). You define these budgets in the `spec.disruption.budgets` field of the [NodePool CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools).

For example, block disruptions during business hours:

```yaml
budgets:
- nodes: "0"
  schedule: "0 9 * * 1-5" # 9 AM Monday-Friday
  duration: 8h
```

Or allow higher disruption on weekends, and block otherwise:

```yaml
budgets:
- nodes: "50%"
  schedule: "0 0 * * 6" # Saturday midnight
  duration: 48h
- nodes: "0"
```

**Why this matters:** it aligns cost optimization (consolidation/drift/expiration) and updates with a regulated timeline that works for your workloads.

To learn more about node disruption budgets, visit our [NAP disruption documentation](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption#disruption-budgets).
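
In context, these `budgets` fragments sit under `spec.disruption` on the NodePool, alongside the consolidation policy. A sketch using the business-hours example:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
    # No voluntary disruption during business hours (9 AM + 8h, Mon-Fri)
    - nodes: "0"
      schedule: "0 9 * * 1-5"
      duration: 8h
```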

---

## Part 6 — Don’t forget node image updates (drift) and the “90-day” reality

NAP nodes are regularly updated as images change. The node image updates documentation calls out a key behavior: **if a node image version is older than 90 days, NAP forces pickup of the latest image version, bypassing any existing maintenance window**.

Operational takeaways:

- Set up maintenance windows and budgets, but also ensure you’re not drifting so long that you hit a forced-update scenario.
- Treat “keep nodes reasonably fresh” as part of disruption planning, not an afterthought.
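
One way to keep nodes reasonably fresh is a bounded `expireAfter` on the NodePool template instead of `Never` (the 720h value below is an illustrative assumption, not a recommendation):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      # Replace nodes after ~30 days so they never approach the 90-day forced update
      expireAfter: 720h
```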

---

## Part 7 — Observability: verify disruption decisions with events/logs

Before changing policies, confirm what NAP *thinks* it’s doing:

- View events: `kubectl get events --field-selector source=karpenter-events`
- Or use AKS control plane logs in Log Analytics (filter for `karpenter-events`)

This helps distinguish:

- “NAP wants to disrupt but is blocked by PDBs / budgets”
- “NAP isn’t trying to disrupt because the consolidation policy doesn’t allow it”
- “NAP can’t replace nodes because provisioning is failing”

---

## Common disruption pitfalls

### Symptom: NAP won’t consolidate / drains hang forever

**Likely cause**
- PDBs effectively allow zero voluntary evictions (`maxUnavailable: 0` / `minAvailable: 100%`), or
- Too few replicas to satisfy the PDB during drain.

**Fix**
- Relax PDBs (for example, `maxUnavailable: 1`) or increase replicas.
- If a workload truly must be undisruptable, accept that nodes running it won’t be good consolidation targets.

### Symptom: NAP disrupts too often or too much at once

**Likely cause**
- No guardrails set on node disruption behavior.

**Fix**
- Add PDBs that regulate disruption pace.
- Add NodePool [disruption budgets](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption#disruption-budgets) (start with `nodes: "1"` or a small percentage).
- Add time-based budgets (maintenance windows) so disruption happens when you want it.

### Symptom: disruption happens at the wrong time

**Likely cause**
- No time-based budgets / maintenance window.

**Fix**
- Add `schedule` + `duration` budgets to block disruption during business hours.
- Combine a “block window” with a “small allowed disruption” budget outside the window.
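
The combination in the last fix can be sketched as a single `budgets` list. This relies on Karpenter's behavior that when multiple budgets are active at once, the most restrictive one applies:

```yaml
budgets:
# During business hours: block all voluntary disruption
- nodes: "0"
  schedule: "0 9 * * 1-5"
  duration: 8h
# At all other times: allow at most one node to be disrupted at a time
- nodes: "1"
```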

For maintenance windows on node image updates specifically, you can also use the [AKS Node OS Maintenance Schedule](https://learn.microsoft.com/azure/aks/node-auto-provisioning-upgrade-image#node-os-upgrade-maintenance-windows-for-nap).

_**Note:**_ This section describes voluntary disruption, not to be confused with involuntary eviction (for example, spot VM evictions, node termination events, or node stopping events).

---

## Next steps

1. **Try NAP today:** Check out the [Enable Node Auto Provisioning steps](https://learn.microsoft.com/azure/aks/use-node-auto-provisioning).
1. **Learn more:** Visit our AKS [operator best-practices guidance](https://learn.microsoft.com/azure/aks/operator-best-practices-advanced-scheduler).
1. **Share feedback:** Open issues or ideas in [AKS GitHub Issues](https://github.com/Azure/AKS/issues).
1. **Join the community:** Subscribe to the [AKS Community YouTube](https://www.youtube.com/@theakscommunity) and follow [@theakscommunity](https://x.com/theakscommunity) on X.