README.md: 46 additions & 0 deletions
@@ -53,13 +53,15 @@ The following options can be used to customize the k8s-shredder controller:
| ToBeDeletedTaint | "ToBeDeletedByClusterAutoscaler" | Node taint used for skipping a subset of parked nodes that are already handled by cluster-autoscaler |
| ArgoRolloutsAPIVersion | "v1alpha1" | API version from `argoproj.io` API group to be used while handling Argo Rollouts objects |
| EnableKarpenterDriftDetection | false | Controls whether to scan for drifted Karpenter NodeClaims and automatically label their nodes |
+| EnableKarpenterDisruptionDetection | false | Controls whether to scan for disrupted Karpenter NodeClaims and automatically label their nodes |
| ParkedByLabel | "shredder.ethos.adobe.net/parked-by" | Label used to identify which component parked the node |
| ParkedNodeTaint | "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule" | Taint to apply to parked nodes in format key=value:effect |
| EnableNodeLabelDetection | false | Controls whether to scan for nodes with specific labels and automatically park them |
| NodeLabelsToDetect | [] | List of node labels to detect. Supports both key-only and key=value formats |
| MaxParkedNodes | 0 | Maximum number of nodes that can be parked simultaneously. Set to 0 (default) for no limit. |
| ExtraParkingLabels | {} | (Optional) Map of extra labels to apply to nodes and pods during parking. Example: `{ "example.com/owner": "infrastructure" }` |
| EvictionSafetyCheck | true | Controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels. |
+| ParkingReasonLabel | "shredder.ethos.adobe.net/parked-reason" | Label used to track why a node or pod was parked (values: node-label, karpenter-drifted, karpenter-disrupted) |
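
Two of the options above take small string formats: `ParkedNodeTaint` uses `key=value:effect`, and `NodeLabelsToDetect` accepts both key-only and `key=value` entries. The following is a minimal Python sketch of how such strings could be interpreted. It is not taken from the k8s-shredder source (which is a separate codebase), and the names `Taint`, `parse_taint`, and `node_matches` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List, NamedTuple


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split a 'key=value:effect' string into its three parts."""
    # The effect is whatever follows the last ':'.
    key_value, colon, effect = spec.rpartition(":")
    if not colon or not effect:
        raise ValueError(f"taint {spec!r} is missing ':effect'")
    # The key is whatever precedes the first '='; the value may be empty.
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"taint {spec!r} is missing a key")
    return Taint(key, value, effect)


def node_matches(node_labels: Dict[str, str], selectors: List[str]) -> bool:
    """Return True if any selector matches the node's labels.

    A key-only selector matches any value for that key;
    a 'key=value' selector requires an exact value match.
    """
    for selector in selectors:
        key, equals, value = selector.partition("=")
        if key in node_labels and (not equals or node_labels[key] == value):
            return True
    return False
```

For example, `parse_taint("shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule")` yields the key `shredder.ethos.adobe.net/upgrade-status`, the value `parked`, and the effect `NoSchedule`.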

### How it works
@@ -89,6 +91,24 @@ k8s-shredder includes an optional feature for automatic detection of drifted Kar
This integration allows k8s-shredder to automatically handle node lifecycle management for clusters using Karpenter, ensuring that drifted nodes are properly marked for eviction and eventual replacement.

+#### Karpenter Disruption Detection
+
+k8s-shredder includes an optional feature for automatic detection of disrupted Karpenter NodeClaims. This feature is disabled by default, but can be enabled by setting `EnableKarpenterDisruptionDetection` to `true`. When enabled, at the beginning of each eviction loop, the controller will:
+
+1. Scan the Kubernetes cluster for Karpenter NodeClaims that are marked as disrupted (e.g., "Disrupting", "Terminating", "Empty", "Underutilized")
+2. Identify the nodes associated with these disrupted NodeClaims
+3. Automatically process these nodes by:
+
+   - **Labeling** nodes and their non-DaemonSet pods with:
+     - `UpgradeStatusLabel` (set to "parked")
+     - `ExpiresOnLabel` (set to current time + `ParkedNodeTTL`)
+     - `ParkedByLabel` (set to "k8s-shredder")
+     - Any labels specified in `ExtraParkingLabels`
+   - **Cordoning** the nodes to prevent new pod scheduling
+   - **Tainting** the nodes with the configured `ParkedNodeTaint`
+
+This integration ensures that nodes undergoing disruption as part of bin-packing operations have all pods evicted in a reasonable amount of time, preventing them from getting stuck due to blocking Pod Disruption Budgets (PDBs). It complements the drift detection feature by handling nodes that are actively being disrupted by Karpenter's consolidation and optimization processes.
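
The labeling step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the controller's actual code: the label keys for `UpgradeStatusLabel` and `ExpiresOnLabel` are assumed values (the first is inferred from the default `ParkedNodeTaint` key), the expiry is assumed to be stored as a Unix timestamp in seconds, and `parking_labels` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumed label keys; only ParkedByLabel and ParkingReasonLabel defaults
# appear in the options table, the other two are illustrative guesses.
UPGRADE_STATUS_LABEL = "shredder.ethos.adobe.net/upgrade-status"
EXPIRES_ON_LABEL = "shredder.ethos.adobe.net/parked-node-expires-on"
PARKED_BY_LABEL = "shredder.ethos.adobe.net/parked-by"
PARKING_REASON_LABEL = "shredder.ethos.adobe.net/parked-reason"


def parking_labels(
    now: datetime,
    parked_node_ttl: timedelta,
    reason: str,
    extra_labels: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build the label set applied to a node and its non-DaemonSet pods."""
    expires_on = now + parked_node_ttl
    # Extra labels go in first so they can never override the core labels.
    labels = dict(extra_labels or {})
    labels.update(
        {
            UPGRADE_STATUS_LABEL: "parked",
            # Assumption: expiry is encoded as Unix seconds.
            EXPIRES_ON_LABEL: str(int(expires_on.timestamp())),
            PARKED_BY_LABEL: "k8s-shredder",
            PARKING_REASON_LABEL: reason,
        }
    )
    return labels


labels = parking_labels(
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
    parked_node_ttl=timedelta(hours=168),
    reason="karpenter-disrupted",
    extra_labels={"example.com/owner": "infrastructure"},
)
```

Letting the core labels win over `ExtraParkingLabels` is a design choice of this sketch; the source does not say how a key collision is resolved.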
+
#### Labeled Node Detection

k8s-shredder includes optional automatic detection of nodes with specific labels. This feature is disabled by default but can be enabled by setting `EnableNodeLabelDetection` to `true`. When enabled, at the beginning of each eviction loop, the application will:
When safety checks fail, k8s-shredder logs detailed information about which pods are missing required labels, helping operators understand why the node was unparked instead of force evicted.

+#### Parking Reason Tracking
+
+k8s-shredder automatically tracks why nodes and pods were parked by applying a configurable parking reason label. This feature helps operators understand the source of parking actions and enables better monitoring and debugging.
+
+**Configuration:**
+```yaml
+ParkingReasonLabel: "shredder.ethos.adobe.net/parked-reason" # Default label name
+```
+
+**Parking Reason Values:**
+- `node-label`: Node was parked due to node label detection
+- `karpenter-drifted`: Node was parked due to Karpenter drift detection
+- `karpenter-disrupted`: Node was parked due to Karpenter disruption detection
+
+**Behavior:**
+- The parking reason label is applied to both nodes and their non-DaemonSet pods during parking
+- The label is automatically removed during the unparking process (e.g., when safety checks fail)
+- The label value corresponds to the detection method that triggered the parking action
+- This label works alongside other parking labels and doesn't interfere with existing functionality
+
+**Use cases:**
+- **Monitoring**: Track which detection method is most active in your cluster
+- **Debugging**: Understand why specific nodes were parked
+- **Automation**: Trigger different workflows based on parking reason
+- **Compliance**: Audit parking actions and their sources
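
As an illustration of the monitoring use case, the sketch below counts parked nodes per parking reason. It assumes node metadata has already been fetched and is represented as plain dictionaries with a `labels` mapping; that shape and the `count_parking_reasons` helper are hypothetical and not part of k8s-shredder.

```python
from collections import Counter
from typing import Dict, Iterable

PARKING_REASON_LABEL = "shredder.ethos.adobe.net/parked-reason"  # default from the config


def count_parking_reasons(nodes: Iterable[Dict]) -> Counter:
    """Count nodes per parking reason, skipping nodes without the label."""
    counts: Counter = Counter()
    for node in nodes:
        reason = node.get("labels", {}).get(PARKING_REASON_LABEL)
        if reason is not None:
            counts[reason] += 1
    return counts


# Hypothetical node metadata, shaped like a trimmed-down API response.
nodes = [
    {"name": "node-a", "labels": {PARKING_REASON_LABEL: "karpenter-drifted"}},
    {"name": "node-b", "labels": {PARKING_REASON_LABEL: "karpenter-disrupted"}},
    {"name": "node-c", "labels": {PARKING_REASON_LABEL: "karpenter-drifted"}},
    {"name": "node-d", "labels": {}},  # not parked
]
reason_counts = count_parking_reasons(nodes)
```

With the sample data, `reason_counts` holds two `karpenter-drifted` nodes and one `karpenter-disrupted` node; the unlabeled node is ignored.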
+
## Metrics

k8s-shredder exposes comprehensive metrics for monitoring its operation. You can find detailed information about all available metrics in the [metrics documentation](docs/metrics.md).
| shredder.EnableNodeLabelDetection | bool | `false` | Enable detection of nodes based on specific labels |
| shredder.EvictionLoopInterval | string | `"1h"` | How often to run the main eviction loop |
@@ -80,6 +81,7 @@ a novel way of dealing with kubernetes nodes blocked from draining
| shredder.ParkedByValue | string | `"k8s-shredder"` | Value set in the ParkedByLabel to identify k8s-shredder as the parking agent |
| shredder.ParkedNodeTTL | string | `"168h"` | How long parked nodes should remain before being eligible for deletion (7 days default) |
| shredder.ParkedNodeTaint | string | `"shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"` | Taint applied to parked nodes to prevent new pod scheduling |
+| shredder.ParkingReasonLabel | string | `"shredder.ethos.adobe.net/parked-reason"` | Label used to track why a node or pod was parked |
| shredder.RestartedAtAnnotation | string | `"shredder.ethos.adobe.net/restartedAt"` | Annotation to track when a workload was last restarted |
| shredder.RollingRestartThreshold | float | `0.1` | Maximum percentage of nodes that can be restarted simultaneously during rolling restarts |
| shredder.ToBeDeletedTaint | string | `"ToBeDeletedByClusterAutoscaler"` | Taint indicating nodes scheduled for deletion by cluster autoscaler |
config.yaml: 4 additions & 0 deletions
@@ -22,6 +22,7 @@ ParkedNodeTaint: shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule # Ta
ArgoRolloutsAPIVersion: v1alpha1 # API version from argoproj.io API group to be used while handling Argo Rollouts objects
# Karpenter integration
EnableKarpenterDriftDetection: false # Controls whether to scan for drifted Karpenter NodeClaims and automatically label their nodes
+EnableKarpenterDisruptionDetection: false # Controls whether to scan for disrupted Karpenter NodeClaims and automatically label their nodes
# Node label detection
EnableNodeLabelDetection: false # Controls whether to scan for nodes with specific labels and automatically park them
NodeLabelsToDetect: [] # List of node labels to detect. Supports both key-only and key=value formats
@@ -40,3 +41,6 @@ MaxParkedNodes: 0 # Maximum number of nodes that can be parked simultaneously.
# Safety settings
EvictionSafetyCheck: true # Controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels.
+
+# Parking reason tracking
+ParkingReasonLabel: shredder.ethos.adobe.net/parked-reason # Label used to track why a node or pod was parked
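
The `EvictionSafetyCheck` setting above describes a decision: before force eviction, verify that pods carry the required parking labels, and unpark the node if any do not. A minimal Python sketch of that decision follows. The set of required labels is passed in as a parameter because the source does not list it, and the pod dictionary shape plus the helper names `pods_missing_labels` and `decide_action` are hypothetical.

```python
from typing import Dict, Iterable, List


def pods_missing_labels(pods: Iterable[Dict], required_labels: Iterable[str]) -> List[str]:
    """Return the names of pods that lack at least one required label key."""
    required = set(required_labels)
    missing = []
    for pod in pods:
        if not required.issubset(pod.get("labels", {})):
            missing.append(pod["name"])
    return missing


def decide_action(pods: List[Dict], required_labels: List[str], safety_check: bool = True) -> str:
    """Return 'unpark' if the safety check fails, otherwise 'force-evict'."""
    if safety_check and pods_missing_labels(pods, required_labels):
        return "unpark"
    return "force-evict"


# Hypothetical label keys and pods, for illustration only.
required = ["example.com/status", "example.com/expires-on"]
pods = [
    {"name": "web-1", "labels": {"example.com/status": "parked", "example.com/expires-on": "1"}},
    {"name": "web-2", "labels": {"example.com/status": "parked"}},
]
action = decide_action(pods, required)
```

Here `web-2` lacks one required label, so `action` is `"unpark"`; with `safety_check=False` the same input would yield `"force-evict"`.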