Commit b96d793

wdarko1 and Copilot authored
Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 0b0b088 commit b96d793

File tree

1 file changed: +10 -11 lines changed
  • website/blog/2026-04-12-nap-disruption

website/blog/2026-04-12-nap-disruption/index.md

Lines changed: 10 additions & 11 deletions
@@ -1,15 +1,15 @@
11
---
2-
title: "Managing Disruption with AKS Node Auto-Provisioning (NAP): PDBs, Consolidation, and Disruption Budgets"
3-
description: "Learn AKS best practices to control voluntary disruption from Node Auto-Provisioning (NAP): how Pod Disruption Budgets interact with Karpenter consolidation/drift/expiration, and how to use NodePool disruption budgets and maintenance windows to keep workloads stable."
2+
title: "Managing Disruption with AKS Node Auto-Provisioning"
3+
description: "Learn AKS best practices to control NAP disruption with Pod Disruption Budgets (PDBs), node pool disruption budgets, consolidation, and maintenance windows."
44
date: 2026-04-12
55
authors: ["wilson-darko"]
66
tags:
77
- node-auto-provisioning
88
---
99

1010
## Background
11-
AKS users want to ensure that their workloads scaling when needed, and are disrupted only when (or where) desired.
12-
AKS Node Auto-Provisioning (NAP) is designed to keep clusters efficient: it provisions nodes for pending pods, and it also continuously *removes* nodes when it’s safe to do so (for example, when nodes are empty or underutilized). That second half **disruption** is where many production surprises happen.
11+
AKS users want to ensure that their workloads scale when needed and are disrupted only when (and where) desired.
12+
AKS Node Auto-Provisioning (NAP) is designed to keep clusters efficient: it provisions nodes for pending pods, and it also continuously *removes* nodes when it’s safe to do so (for example, when nodes are empty or underutilized). That node-removal **disruption** is where many production surprises happen.
1313

1414
When managing Kubernetes, operational questions that users might have are:
1515

@@ -19,7 +19,7 @@ When managing Kubernetes, operational questions that users might have are:
1919
- Why do upgrades get “stuck” on certain nodes?
2020

2121

22-
This post focuses on **NAP disruption best practices**, and not workload scheduling (tools like topology spread constraints, node affinity, taints, etc.). For more on scheduling best practices, check out our [blog post](<will edit once part 1 blog is published>).
22+
This post focuses on **NAP disruption best practices**, and not workload scheduling (tools like topology spread constraints, node affinity, taints, etc.). For more on scheduling best practices, check out our earlier blog post on NAP scheduling fundamentals.
2323

2424
If you’re new to these NAP features, this post will give you “good defaults” as a starting point. If you’re already deep into NAP disruption settings, treat it as a checklist for the behaviors AKS users most commonly ask about.
2525

@@ -58,15 +58,15 @@ NAP is built on Karpenter concepts and exposes disruption controls on the **Node
5858
- **Consolidation policy** (when NAP is allowed to consolidate)
5959
- **Disruption budgets** (how many nodes can be disrupted at once, and when)
6060
- **Expire-after** (node lifetime)
61-
- **Drift**(replace nodes that are out o)
61+
- **Drift** (replace nodes that are out of date with the desired NodePool configuration)
6262

6363
A good operational posture is: **use PDBs to protect *applications*** and **use NAP disruption tools to control *the cluster’s disruption rate***.
6464

6565
---
6666

6767
## Part 2 - NAP Overview
6868

69-
Node auto-provisioning (NAP) provisions, scales, and manages nodes. NAP bases it's scheduling and disruption logic on settings from 3 sources:
69+
Node auto-provisioning (NAP) provisions, scales, and manages nodes. NAP bases its scheduling and disruption logic on settings from 3 sources:
7070

7171
- Workload deployment file - For disruption NAP honors the pod disruption budgets defined by the user here
7272
- [NodePool CRD](https://learn.microsoft.com/azure/aks/node-auto-provisioning-node-pools) - Used to list the range of allowed virtual machine options (size, zones, architecture) and also disruption settings
@@ -119,8 +119,6 @@ spec:
119119
app: web
120120
```
121121
122-
Kubernetes describes minAvailable / maxUnavailable as the two key availability knobs, and notes you can only specify one per PDB.
123-
124122
Why it works well in practice:
125123
- Consolidation/drift/expiration can still proceed.
126124
- You avoid large brownouts caused by draining too many replicas at once.
@@ -149,7 +147,8 @@ This can be intentional for extremely sensitive workloads, but it has a cost: if
149147

150148
There are two different operator intents that often get conflated:
151149

152-
- **When** consolidation is allowed and will happen- **How much** disruption can happen concurrently (budgets / rate limiting)
150+
- **When** consolidation is allowed and will happen
151+
- **How much** disruption can happen concurrently (budgets / rate limiting)
153152

154153
### Consolidation policy (when)
155154

@@ -176,7 +175,7 @@ spec:
176175

177176
### Node Disruption budgets (how fast)
178177

179-
NAP exposes Karpenter-style disruption budgets on the NodePool. If you don’t set them, a default budget of `nodes: 10%` is used. Use budgets to regulate how many nodes are consolidate at a time.
178+
NAP exposes Karpenter-style disruption budgets on the NodePool. If you don’t set them, a default budget of `nodes: 10%` is used. Use budgets to regulate how many nodes are consolidated at a time.
180179

181180
The following example sets the node disruption budget to 1 node at a time.
182181
