website/blog/2026-04-12-nap-disruption/index.md
PDBs are Kubernetes-native guardrails that limit **voluntary evictions** of pods.
“During voluntary disruptions, keep at least N replicas available (or limit max unavailable).”
:::note
Pod disruption budgets protect against **voluntary evictions**, not involuntary failures, forced migrations, or spot node eviction.
:::
PDBs and Karpenter disruption budgets mainly help with **voluntary** disruptions.
The most common NAP disruption problems come from PDBs that are either:
- **Too strict**, where an overly strong guardrail blocks node drains indefinitely, or
- **Missing**, where the lack of a guardrail allows too much disruption at once.
### A good default PDB
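A minimal sketch of such a default PDB, assuming a workload labeled `app: my-app` (the name and label are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb        # hypothetical name
spec:
  maxUnavailable: 1       # allow exactly one voluntary eviction at a time
  selector:
    matchLabels:
      app: my-app         # hypothetical label
```

With `maxUnavailable: 1`, NAP can always make progress draining a node, but only one replica is disrupted at a time.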
Why it works well in practice:
- Consolidation/drift/expiration can still proceed.
- You avoid large brownouts caused by draining too many replicas at once.
- You reduce the chance of NAP “thrashing” a service by repeatedly moving too many pods.
This can be intentional for extremely sensitive workloads, but it has a cost: if no evictions are ever allowed, node drains stall and upgrades, consolidation, and scale-down are blocked.
- For general workloads that can tolerate minor disruption, prefer a small maxUnavailable (like 1) rather than “zero evictions.”
- Be clear about the tradeoff: zero tolerance blocks upgrades, NAP consolidation, and scale-down.
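For contrast, a sketch of a "zero evictions" PDB (the name and label are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb  # hypothetical name
spec:
  maxUnavailable: 0       # no voluntary evictions are ever allowed
  selector:
    matchLabels:
      app: critical-app   # hypothetical label
```

Any node running a matching pod can never be drained voluntarily, so upgrades and consolidation stall until the PDB is relaxed.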
## Part 4 — Controlling consolidation - “when” vs “how fast”
There are two different operator intents that often get conflated:

- **When** consolidation should happen (the NodePool’s consolidation policy and timers), and
- **How fast** it may proceed once triggered (node disruption budgets).

Use the NodePool’s consolidation policy to express your comfort level with cost-driven consolidation.
### Consolidation settings (when)
- `ConsolidationPolicy: WhenEmptyOrUnderutilized` - triggered when NAP identifies that existing nodes are underutilized (or empty). NAP determines this by running cost simulations to find which combination of VM sizes best matches the current configuration; once such a combination is found, consolidation is triggered.
- `ConsolidateAfter: 1d` - a time-based setting that controls the delay before NAP consolidates underutilized nodes, working in conjunction with the `consolidationPolicy` setting.
- `expireAfter: 24hr` - a time-based setting that determines how long nodes defined by this NodePool CRD are allowed to exist. Any older nodes are deleted, regardless of the consolidation policy.
:::note
How NAP defines "underutilized" is not currently user-configurable; it is determined by the cost simulations NAP runs.
:::
The following example shows these disruption tools in action:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                 # name assumed for illustration
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1d
  template:
    spec:
      expireAfter: Never        # nodes never expire; consolidation can still remove them
```
### Node Disruption budgets (how fast)
NAP exposes Karpenter-style disruption budgets on the NodePool. If you don’t set them, a default budget of `nodes: 10%` is used. Use budgets to regulate how many nodes are consolidated at a time.
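A sketch of an explicit budget, assuming the Karpenter v1 `spec.disruption.budgets` schema (the NodePool name is illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default               # hypothetical name
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      - nodes: "10%"          # disrupt at most 10% of this NodePool's nodes at once
```

Budget entries also accept `schedule` and `duration` fields, so you can tighten the limit (for example, `nodes: "1"`) during business hours for slower, more predictable churn.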