Commit 0b0b088: Create blog post on AKS NAP disruption management

Added a blog post on managing disruption with AKS Node Auto-Provisioning, covering best practices for Pod Disruption Budgets and consolidation.

1 parent e359164. 1 file changed, 310 additions and 0 deletions: website/blog/2026-04-12-nap-disruption
---
title: "Managing Disruption with AKS Node Auto-Provisioning (NAP): PDBs, Consolidation, and Disruption Budgets"
description: "Learn AKS best practices to control voluntary disruption from Node Auto-Provisioning (NAP): how Pod Disruption Budgets interact with Karpenter consolidation/drift/expiration, and how to use NodePool disruption budgets and maintenance windows to keep workloads stable."
date: 2026-04-12
authors: ["wilson-darko"]
tags:
  - node-auto-provisioning
---
## Background

AKS users want to ensure that their workloads scale when needed and are disrupted only when (or where) desired.

AKS Node Auto-Provisioning (NAP) is designed to keep clusters efficient: it provisions nodes for pending pods, and it also continuously *removes* nodes when it's safe to do so (for example, when nodes are empty or underutilized). That second half, **disruption**, is where many production surprises happen.
When managing Kubernetes, operational questions that users might have are:

- How do I control when scale-downs happen, or where they shouldn't?
- How do I control workload disruption so it happens predictably (and not in the middle of business hours)?
- Why won't NAP scale down, even though I have lots of underused capacity?
- Why do upgrades get "stuck" on certain nodes?

This post focuses on **NAP disruption best practices**, not workload scheduling (topology spread constraints, node affinity, taints, and so on). For more on scheduling best practices, check out our [blog post](<will edit once part 1 blog is published>).
If you're new to these NAP features, this post will give you "good defaults" as a starting point. If you're already deep into NAP disruption settings, treat it as a checklist for the behaviors AKS users most commonly ask about.

---

<!-- truncate -->

:::info

Learn more about how to [configure disruption policies for NAP](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption).

:::

---

## Part 1 — The mental model: two layers of disruption control
When NAP decides a node (virtual machine) *could* be removed, there are two layers of controls that determine whether it actually happens:

### Workload layer: Pod Disruption Budgets (PDBs)
PDBs are Kubernetes-native guardrails that limit **voluntary evictions** of pods. PDBs are how you tell Kubernetes:

"During voluntary disruptions, keep at least N replicas available (or limit the maximum unavailable)."

:::note
Pod Disruption Budgets protect against **voluntary evictions**, not involuntary failures, forced migrations, or spot node eviction.
:::
### Infrastructure layer: node-level disruption settings

NAP lets you set disruption controls at the node level. NAP is built on Karpenter concepts and exposes these controls on the **NodePool**:

- **Consolidation policy** (when NAP is allowed to consolidate)
- **Disruption budgets** (how many nodes can be disrupted at once, and when)
- **Expire-after** (node lifetime)
- **Drift** (replace nodes that no longer match the desired configuration)
A good operational posture is: **use PDBs to protect *applications*** and **use NAP disruption tools to control *the cluster’s disruption rate***.
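To make the layering concrete, here is a minimal NodePool sketch showing where these controls live. The values are illustrative assumptions, not recommendations; in the `karpenter.sh/v1` API, `expireAfter` sits under the node template:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized # when consolidation may run
    budgets:
      - nodes: "1" # how many nodes may be disrupted at once
  template:
    spec:
      nodeClassRef:
        name: default
      expireAfter: Never # node lifetime: never force-expire nodes
```

Later sections in this post cover each of these fields in more detail.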

---

## Part 2 — NAP overview

Node Auto-Provisioning (NAP) provisions, scales, and manages nodes. NAP bases its scheduling and disruption logic on settings from three sources:

- Workload deployment file - Used to define Pod Disruption Budgets, which NAP honors when making disruption decisions
- [NodePool CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools) - Used to list the range of allowed virtual machine options (size, zones, architecture) and also disruption settings
- [AKSNodeClass CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-aksnodeclass) - Used to define Azure-specific settings
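For completeness, an AKSNodeClass sketch might look like the following. Treat the `apiVersion` and field values here as assumptions; check the linked AKSNodeClass documentation for the exact schema in your NAP release:

```yaml
apiVersion: karpenter.azure.com/v1beta1 # assumed API version; verify against the AKSNodeClass docs
kind: AKSNodeClass
metadata:
  name: default
spec:
  imageFamily: Ubuntu2204 # Azure-specific setting: which node OS image family to use
```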
### How NAP handles disruption

NAP honors Kubernetes-native concepts such as Pod Disruption Budgets when making disruption decisions. NAP also applies Karpenter-based concepts such as consolidation, drift, and node disruption budgets.

#### What "disruption" means in NAP (and what it doesn't)

In NAP, "disruption" typically refers to **voluntary** actions that delete nodes after draining them, such as:

- **Consolidation**: deleting or replacing nodes (with better VM sizes) to increase compute efficiency (and reduce cost).
- **Drift**: replacing existing nodes that no longer match the desired configuration (for example, updated settings in your NodePool and AKSNodeClass CRDs).
- **Expiration**: replacing nodes after a configured lifetime.

These are different from **involuntary** disruptions such as:

- Spot eviction events
- Hardware failures
- Host reboots outside your control

PDBs and Karpenter disruption budgets mainly help with **voluntary** disruptions. These features do not regulate involuntary disruption (for example, spot VM evictions, node termination events, node stopping events).

---
## Part 3 — Pod Disruption Budgets (PDBs): controlling voluntary disruption

The most common NAP disruption problems come from PDBs that are either:

- **Too strict**, blocking drains indefinitely, or
- **Missing**, allowing too much disruption at once.

### A good default PDB

The Kubernetes documentation describes `minAvailable` / `maxUnavailable` as the two key availability knobs for PDBs, and notes that you can specify only one of them per PDB.

Here's an example of a PDB that regulates disruption without blocking scale-downs, upgrades, and consolidation:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web
```

Why it works well in practice:

- Consolidation/drift/expiration can still proceed.
- You avoid large brownouts caused by draining too many replicas at once.
- You reduce the chance of NAP "thrashing" a service by repeatedly moving too many pods.
### The common PDB pitfall: "zero voluntary evictions"

If you effectively allow zero voluntary evictions (`maxUnavailable: 0` or `minAvailable: 100%`), Kubernetes warns that this can block node drains indefinitely for any node running one of those pods.

This common misconfiguration can cause scenarios such as:

- Node or cluster upgrades fail because nodes won't voluntarily scale down
- Migrations fail
- NAP consolidation never happens

This can be intentional for extremely sensitive workloads, but it has a cost: if a node hosts one of these pods, draining that node can become impossible without changing the PDB (or taking an outage). We recommend setting some tolerance in these two settings, and also using disruption budgets or maintenance windows to control disruption.

**Practical guidance:**

- For critical workloads that you do not want disrupted at all, "zero eviction" strictness may be intentional — but be deliberate. When you're ready to allow disruption to these workloads, you may have to change the PDBs in the workload deployment file.
- For general workloads that can tolerate minor disruption, prefer a small `maxUnavailable` (like 1) rather than "zero evictions."
- Be clear on the tradeoff: zero tolerance blocks upgrades, NAP consolidation, and scale-down.
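As a concrete illustration of the pitfall above, a PDB like the following (the `payments-pdb` name and labels are hypothetical) permits zero voluntary evictions and can pin its pods' nodes in place, blocking upgrades and consolidation:

```yaml
# Anti-pattern: blocks all voluntary evictions of matching pods
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-pdb # hypothetical example name
spec:
  maxUnavailable: 0 # zero voluntary evictions; drains of these nodes will hang
  selector:
    matchLabels:
      app: payments
```

Unless this strictness is deliberate, raise `maxUnavailable` to at least 1.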
## Part 4 — Controlling consolidation: "when" vs "how fast"

There are two different operator intents that often get conflated:

- **When** consolidation is allowed to happen (consolidation policy)
- **How much** disruption can happen concurrently (budgets / rate limiting)

### Consolidation policy (when)

Use the NodePool's consolidation policy to express your comfort level with cost-optimization moves. For many clusters, a safe baseline is "only consolidate when empty or underutilized," and then use budgets to keep the pace controlled.

The following NodePool configures that consolidation policy:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      nodeClassRef:
        name: default
      expireAfter: Never
```
### Node disruption budgets (how fast)

NAP exposes Karpenter-style disruption budgets on the NodePool. If you don't set them, a default budget of `nodes: 10%` is used. Use budgets to regulate how many nodes are consolidated at a time.

The following example sets the node disruption budget to one node at a time:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
      - nodes: "1"
```

This is often the simplest way to prevent "NAP moved too many nodes at once".

---
## Part 5 — Maintenance windows

A good practice for managing disruption is to **allow some consolidation, but only during a specific time window**.

NAP node disruption budgets support `schedule` and `duration`, so you can create time-based rules (cron syntax). Define these budgets in the `spec.disruption.budgets` field of the [NodePool CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools).

For example, block disruptions during business hours:
```yaml
budgets:
  - nodes: "0"
    schedule: "0 9 * * 1-5" # 9 AM Monday-Friday
    duration: 8h
```

Or allow higher disruption on weekends, and block otherwise:
```yaml
budgets:
  - nodes: "50%"
    schedule: "0 0 * * 6" # Saturday midnight
    duration: 48h
  - nodes: "0"
```

**Why this matters:** it aligns cost optimization (consolidation/drift/expiration) and updates with a regulated timeline that works for your workload needs.
To learn more about node disruption budgets, visit our [NAP Disruption documentation](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption#disruption-budgets)
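Putting the fragments above in context, time-based budgets sit alongside the consolidation policy in the NodePool spec. A sketch, with illustrative values:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      # Block all voluntary disruption during business hours (9 AM + 8h, Mon-Fri)
      - nodes: "0"
        schedule: "0 9 * * 1-5"
        duration: 8h
      # Otherwise, allow at most one node to be disrupted at a time
      - nodes: "1"
```

When multiple budgets are active at the same time, the most restrictive one applies.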

---

## Part 6 — Don't forget node image updates (drift) and the "90-day" reality

NAP nodes are regularly updated as images change. The node image updates documentation calls out a key behavior: **if a node image version is older than 90 days, NAP forces pickup of the latest image version, bypassing any existing maintenance window**.

Operational takeaway:

- Set up maintenance windows and budgets, but also ensure you're not drifting so long that you hit a forced-update scenario.
- Treat "keep nodes reasonably fresh" as part of disruption planning, not an afterthought.
---

## Part 7 — Observability: verify disruption decisions with events/logs

Before changing policies, confirm what NAP *thinks* it's doing:

- View events:
  - `kubectl get events --field-selector source=karpenter-events`
  - Or use AKS control plane logs in Log Analytics (filter for `karpenter-events`)

This helps distinguish between:

- "NAP wants to disrupt but is blocked by PDBs / budgets"
- "NAP isn't trying to disrupt because the consolidation policy doesn't allow it"
- "NAP can't replace nodes because provisioning is failing"

---
## Common disruption pitfalls

### Symptom: NAP won't consolidate / drains hang forever

**Likely cause**

- PDBs effectively allow zero voluntary evictions (`maxUnavailable: 0` / `minAvailable: 100%`), or
- Too few replicas to satisfy the PDB during a drain.

**Fix**

- Relax PDBs (for example, `maxUnavailable: 1`) or increase replicas.
- If a workload truly must not be disrupted, accept that nodes running it won't be good consolidation targets.

### Symptom: NAP disrupts too often or too much at once

**Likely cause**

- No guardrails are set on node disruption behavior.

**Fix**

- Add PDBs that regulate disruption pace.
- Add NodePool disruption budgets (start with `nodes: "1"` or a small percentage), and review your [consolidation policy](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption).
- Add time-based budgets (maintenance windows) so disruption happens when you want it, for example via the [AKS node OS maintenance schedule](https://learn.microsoft.com/azure/aks/node-auto-provisioning-upgrade-image#node-os-upgrade-maintenance-windows-for-nap).

### Symptom: disruption happens at the wrong time

**Likely cause**

- No time-based budgets / maintenance window.

**Fix**

- Add `schedule` + `duration` budgets to block disruption during business hours.
- Combine a "block window" with a "small allowed disruption" budget outside the window.

### Symptom: NAP node upgrades fail and/or NAP nodes won't scale down voluntarily

**Likely cause**

- PDBs are set too strictly (for example, `maxUnavailable: 0` or `minAvailable: 100%`).

**Fix**

- Ensure PDBs are not too strict; set `maxUnavailable` to a low (but not 0) number like 1. See [Node Disruption Budgets](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption#disruption-budgets) for additional guardrails.

_**Note:**_ This section describes voluntary disruption, not to be confused with involuntary eviction (for example, spot VM evictions, node termination events, node stopping events).
---

## Next steps

1. **Try NAP today:** Check out the [Enable Node Auto Provisioning steps](https://learn.microsoft.com/azure/aks/use-node-auto-provisioning).
1. **Learn more:** Visit our AKS [operator best-practices guidance](https://learn.microsoft.com/azure/aks/operator-best-practices-advanced-scheduler).
1. **Share feedback:** Open issues or ideas in [AKS GitHub Issues](https://github.com/Azure/AKS/issues).
1. **Join the community:** Subscribe to the [AKS Community YouTube](https://www.youtube.com/@theakscommunity) and follow [@theakscommunity](https://x.com/theakscommunity) on X.
