Commit da6a66d

Refine PDB explanations and correct typos
Clarified descriptions of Pod Disruption Budgets and their impact on voluntary evictions. Improved wording for clarity and corrected minor typos.
1 parent b96d793 commit da6a66d

1 file changed, +10 -5 lines changed

website/blog/2026-04-12-nap-disruption/index.md

Lines changed: 10 additions & 5 deletions
@@ -45,7 +45,6 @@ PDBs are Kubernetes-native guardrails that limit **voluntary evictions** of pods

 “During voluntary disruptions, keep at least N replicas available (or limit max unavailable).”

-
 :::note
 Pod disruption budgets protect against **voluntary evictions**, not involuntary failures, forced migrations, or spot node eviction.
 :::
@@ -98,8 +97,8 @@ PDBs and Karpenter disruption budgets mainly help with **voluntary** disruptions

 The most common NAP disruption problems come from PDBs that are either:

-- **Too strict**, blocking drains indefinitely, or
-- **Missing**, allowing too much disruption at once.
+- **Too strict**: too strong a guardrail blocks node drains indefinitely, or
+- **Missing**: no guardrail allows too much disruption at once.

 ### A good default PDB

@@ -120,6 +119,7 @@ spec:
 ```

 Why it works well in practice:
+
 - Consolidation/drift/expiration can still proceed.
 - You avoid large brownouts caused by draining too many replicas at once.
 - You reduce the chance of NAP “thrashing” a service by repeatedly moving too many pods.
@@ -142,7 +142,6 @@ This can be intentional for extremely sensitive workloads, but it has a cost: if
 - For general workloads that can tolerate minor disruption, prefer a small maxUnavailable (like 1) rather than “zero evictions.”
 - Be clear on the tradeoff of zero tolerance: it blocks upgrades, NAP consolidation, and scale down.

-
 ## Part 4 — Controlling consolidation - “when” vs “how fast”

 There are two different operator intents that often get conflated:
@@ -156,6 +155,13 @@ Use the NodePool’s consolidation policy to express your comfort level with cos

 Consolidation Settings

+- `consolidationPolicy: WhenEmptyOrUnderutilized` - Consolidation is triggered when NAP identifies existing nodes as underutilized (or empty). NAP determines this by running cost simulations to find which combination of VM sizes best matches the current configuration. Once such a combination is found, consolidation is triggered.
+- `consolidateAfter: 24h` - Time-based setting that controls the delay before NAP consolidates underutilized nodes. It works in conjunction with the `consolidationPolicy` setting.
+- `expireAfter: 24h` - Time-based setting that determines how long nodes defined in this NodePool CRD are allowed to exist. Any older nodes are deleted, regardless of the consolidation policy.
+
+_NOTE:_ How NAP defines "underutilized" is not currently a value that users can set; it is determined by the cost simulations that NAP runs.
+
+The following example shows these disruption tools in action:

 ```yaml
 apiVersion: karpenter.sh/v1
@@ -172,7 +178,6 @@ spec:
 expireAfter: Never
 ```

-
 ### Node Disruption budgets (how fast)

 NAP exposes Karpenter-style disruption budgets on the NodePool. If you don’t set them, a default budget of `nodes: 10%` is used. Use budgets to regulate how many nodes are consolidated at a time.
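
As a rough illustration of the budgets described above, the fragment below sketches what a `disruption` block with explicit budgets might look like. It is not taken from the post: the NodePool name `default`, the schedule, and the duration are hypothetical, and the field names follow the upstream Karpenter v1 NodePool schema, which NAP is assumed to accept here. Only disruption-related fields are shown, so a complete NodePool would also need its `template` section.

```yaml
# Sketch only: values are illustrative assumptions, not recommendations.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                  # hypothetical NodePool name
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 24h        # wait before consolidating underutilized nodes
    budgets:
      # Makes the default explicit: at most 10% of nodes disrupted at once.
      - nodes: "10%"
      # Assumed maintenance freeze: no voluntary disruption for 8 hours,
      # starting 09:00 UTC on weekdays.
      - nodes: "0"
        schedule: "0 9 * * mon-fri"
        duration: 8h
```

When several budgets are active at the same time, upstream Karpenter applies the most restrictive one, so the `nodes: "0"` entry would pause consolidation during its window while the 10% budget governs the rest of the time.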
