Commit 54b2501

Max parked nodes by using percentages (#403)
* Allow MaxParkedNodes to work with percentages
* Park nodes by age, oldest first
1 parent 5008a7b commit 54b2501

15 files changed: +499 -87 lines changed

.github/workflows/ci-chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v6
         with:
-          python-version: '3.14'
+          python-version: '3.12'
           check-latest: true
       - name: Set up chart-testing
         uses: helm/[email protected]

README.md

Lines changed: 35 additions & 11 deletions
@@ -58,7 +58,7 @@ The following options can be used to customize the k8s-shredder controller:
 | ParkedNodeTaint | "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule" | Taint to apply to parked nodes in format key=value:effect |
 | EnableNodeLabelDetection | false | Controls whether to scan for nodes with specific labels and automatically park them |
 | NodeLabelsToDetect | [] | List of node labels to detect. Supports both key-only and key=value formats |
-| MaxParkedNodes | 0 | Maximum number of nodes that can be parked simultaneously. Set to 0 (default) for no limit. |
+| MaxParkedNodes | "0" | Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" (default) for no limit. |
 | ExtraParkingLabels | {} | (Optional) Map of extra labels to apply to nodes and pods during parking. Example: `{ "example.com/owner": "infrastructure" }` |
 | EvictionSafetyCheck | true | Controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels. |
 | ParkingReasonLabel | "shredder.ethos.adobe.net/parked-reason" | Label used to track why a node or pod was parked (values: node-label, karpenter-drifted, karpenter-disrupted) |
@@ -131,25 +131,49 @@ This integration allows k8s-shredder to automatically handle node lifecycle mana

 k8s-shredder supports limiting the maximum number of nodes that can be parked simultaneously using the `MaxParkedNodes` configuration option. This feature helps prevent overwhelming the cluster with too many parked nodes at once, which could impact application availability.

-When `MaxParkedNodes` is set to a positive integer:
+`MaxParkedNodes` can be specified as either:
+- **Integer value** (e.g., `"5"`): Absolute maximum number of nodes that can be parked
+- **Percentage value** (e.g., `"20%"`): Maximum percentage of total cluster nodes that can be parked (calculated dynamically each cycle)

-1. **Before parking nodes**: k8s-shredder counts the number of currently parked nodes
-2. **Calculate available slots**: `availableSlots = MaxParkedNodes - currentlyParked`
-3. **Limit parking**: If the number of eligible nodes exceeds available slots, only the first `availableSlots` nodes are parked
-4. **Skip if full**: If no slots are available (currentlyParked >= MaxParkedNodes), parking is skipped for that eviction interval
+When `MaxParkedNodes` is set to a non-zero value:

-**Examples:**
-- `MaxParkedNodes: 0` (default): No limit, all eligible nodes are parked
-- `MaxParkedNodes: 5`: Maximum 5 nodes can be parked at any time
-- `MaxParkedNodes: -1`: Invalid value, treated as 0 (no limit) with a warning logged
+1. **Parse the limit**: The configuration is parsed to determine the actual limit
+   - For percentages: `limit = (percentage / 100) * totalNodes` (rounded down)
+   - For integers: `limit = configured value`
+2. **Count parked nodes**: k8s-shredder counts the number of currently parked nodes
+3. **Calculate available slots**: `availableSlots = limit - currentlyParked`
+4. **Sort by age**: Eligible nodes are sorted by creation timestamp (oldest first) to ensure predictable parking order
+5. **Limit parking**: If the number of eligible nodes exceeds available slots, only the oldest `availableSlots` nodes are parked
+6. **Skip if full**: If no slots are available (currentlyParked >= limit), parking is skipped for that eviction interval

-This limit applies to both Karpenter drift detection and node label detection features. When multiple nodes are eligible for parking but the limit would be exceeded, k8s-shredder will park the nodes in the order they are discovered and skip the remaining nodes until the next eviction interval.
+**Examples:**
+- `MaxParkedNodes: "0"` (default): No limit, all eligible nodes are parked
+- `MaxParkedNodes: "5"`: Maximum 5 nodes can be parked at any time
+- `MaxParkedNodes: "20%"`: Maximum 20% of total cluster nodes can be parked (e.g., 2 nodes in a 10-node cluster)
+- Invalid values (e.g., `"-1"`, `"invalid"`): Treated as 0 (no limit) with a warning logged
+
+**Percentage Benefits:**
+- **Dynamic scaling**: Limit automatically adjusts as cluster size changes
+- **Proportional safety**: Maintains a consistent percentage of available capacity regardless of cluster size
+- **Auto-scaling friendly**: Works well with cluster auto-scaling by recalculating limits each cycle
+
+**Predictable Parking Order:**
+Eligible nodes are **always sorted by creation timestamp (oldest first)**, regardless of whether `MaxParkedNodes` is set. This ensures:
+- **Consistent behavior**: The same nodes will be parked first across multiple eviction cycles
+- **Fair rotation**: Oldest nodes are prioritized for replacement during rolling upgrades
+- **Predictable capacity**: You can anticipate which nodes will be parked next when slots become available
+- **Deterministic ordering**: Even when parking all eligible nodes (no limit), they are processed in a predictable order
+
+This sorting behavior applies to both Karpenter drift detection and node label detection features. When multiple nodes are eligible for parking:
+- **With no limit** (`MaxParkedNodes: "0"`): All nodes are parked in order from oldest to newest
+- **With a limit**: Only the oldest nodes up to the limit are parked; newer nodes wait for the next eviction interval

 **Use cases:**
 - **Gradual node replacement**: Control the pace of node cycling during cluster upgrades
 - **Resource management**: Prevent excessive resource pressure from too many parked nodes
 - **Application stability**: Ensure applications have sufficient capacity during node transitions
 - **Cost optimization**: Balance between node replacement speed and cluster stability
+- **Auto-scaling clusters**: Use percentage-based limits to maintain consistent safety margins as cluster size changes

 #### ExtraParkingLabels

charts/k8s-shredder/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -12,5 +12,5 @@ maintainers:
   - name: sfotony

     url: https://adobe.com
-version: 0.2.6
-appVersion: v0.3.6
+version: 0.2.7
+appVersion: v0.3.7

charts/k8s-shredder/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # k8s-shredder

-![Version: 0.2.6](https://img.shields.io/badge/Version-0.2.6-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.3.6](https://img.shields.io/badge/AppVersion-v0.3.6-informational?style=flat-square)
+![Version: 0.2.7](https://img.shields.io/badge/Version-0.2.7-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.3.7](https://img.shields.io/badge/AppVersion-v0.3.7-informational?style=flat-square)

 a novel way of dealing with kubernetes nodes blocked from draining

@@ -64,7 +64,7 @@ a novel way of dealing with kubernetes nodes blocked from draining
 | serviceAccount.annotations | object | `{}` | Additional annotations for the service account (useful for IAM roles, etc.) |
 | serviceAccount.create | bool | `true` | Create a service account for k8s-shredder |
 | serviceAccount.name | string | `"k8s-shredder"` | Name of the service account |
-| shredder | object | `{"AllowEvictionLabel":"shredder.ethos.adobe.net/allow-eviction","ArgoRolloutsAPIVersion":"v1alpha1","EnableKarpenterDisruptionDetection":false,"EnableKarpenterDriftDetection":false,"EnableNodeLabelDetection":false,"EvictionLoopInterval":"1h","EvictionSafetyCheck":true,"ExpiresOnLabel":"shredder.ethos.adobe.net/parked-node-expires-on","ExtraParkingLabels":{},"MaxParkedNodes":0,"NamespacePrefixSkipInitialEviction":"ns-ethos-","NodeLabelsToDetect":[],"ParkedByLabel":"shredder.ethos.adobe.net/parked-by","ParkedByValue":"k8s-shredder","ParkedNodeTTL":"168h","ParkedNodeTaint":"shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule","ParkingReasonLabel":"shredder.ethos.adobe.net/parked-reason","RestartedAtAnnotation":"shredder.ethos.adobe.net/restartedAt","RollingRestartThreshold":0.1,"ToBeDeletedTaint":"ToBeDeletedByClusterAutoscaler","UpgradeStatusLabel":"shredder.ethos.adobe.net/upgrade-status"}` | Core k8s-shredder configuration |
+| shredder | object | `{"AllowEvictionLabel":"shredder.ethos.adobe.net/allow-eviction","ArgoRolloutsAPIVersion":"v1alpha1","EnableKarpenterDisruptionDetection":false,"EnableKarpenterDriftDetection":false,"EnableNodeLabelDetection":false,"EvictionLoopInterval":"1h","EvictionSafetyCheck":true,"ExpiresOnLabel":"shredder.ethos.adobe.net/parked-node-expires-on","ExtraParkingLabels":{},"MaxParkedNodes":"0","NamespacePrefixSkipInitialEviction":"ns-ethos-","NodeLabelsToDetect":[],"ParkedByLabel":"shredder.ethos.adobe.net/parked-by","ParkedByValue":"k8s-shredder","ParkedNodeTTL":"168h","ParkedNodeTaint":"shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule","ParkingReasonLabel":"shredder.ethos.adobe.net/parked-reason","RestartedAtAnnotation":"shredder.ethos.adobe.net/restartedAt","RollingRestartThreshold":0.1,"ToBeDeletedTaint":"ToBeDeletedByClusterAutoscaler","UpgradeStatusLabel":"shredder.ethos.adobe.net/upgrade-status"}` | Core k8s-shredder configuration |
 | shredder.AllowEvictionLabel | string | `"shredder.ethos.adobe.net/allow-eviction"` | Label to explicitly allow eviction on specific resources |
 | shredder.ArgoRolloutsAPIVersion | string | `"v1alpha1"` | API version for Argo Rollouts integration |
 | shredder.EnableKarpenterDisruptionDetection | bool | `false` | Enable Karpenter disruption detection for node lifecycle management |
@@ -74,7 +74,7 @@ a novel way of dealing with kubernetes nodes blocked from draining
 | shredder.EvictionSafetyCheck | bool | `true` | Controls whether to perform safety checks before force eviction |
 | shredder.ExpiresOnLabel | string | `"shredder.ethos.adobe.net/parked-node-expires-on"` | Label used to track when a parked node expires |
 | shredder.ExtraParkingLabels | object | `{}` | Additional labels to apply to nodes and pods during parking |
-| shredder.MaxParkedNodes | int | `0` | Maximum number of nodes that can be parked simultaneously (0 = no limit) |
+| shredder.MaxParkedNodes | string | `"0"` | Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" for no limit |
 | shredder.NamespacePrefixSkipInitialEviction | string | `"ns-ethos-"` | Namespace prefix to skip during initial eviction (useful for system namespaces) |
 | shredder.NodeLabelsToDetect | list | `[]` | List of node labels to monitor for triggering shredder actions |
 | shredder.ParkedByLabel | string | `"shredder.ethos.adobe.net/parked-by"` | Label to track which component parked a node |

charts/k8s-shredder/values.yaml

Lines changed: 2 additions & 2 deletions
@@ -62,8 +62,8 @@ shredder:
   EnableNodeLabelDetection: false
   # -- List of node labels to monitor for triggering shredder actions
   NodeLabelsToDetect: []
-  # -- Maximum number of nodes that can be parked simultaneously (0 = no limit)
-  MaxParkedNodes: 0
+  # -- Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" for no limit
+  MaxParkedNodes: '0'
   # -- Controls whether to perform safety checks before force eviction
   EvictionSafetyCheck: true
   # -- Label used to track why a node or pod was parked

cmd/root.go

Lines changed: 18 additions & 4 deletions
@@ -12,6 +12,7 @@ governing permissions and limitations under the License.
 package cmd

 import (
+	"strconv"
 	"strings"
 	"time"

@@ -123,7 +124,7 @@ func discoverConfig() {
 	viper.SetDefault("ParkedNodeTaint", "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule")
 	viper.SetDefault("EnableNodeLabelDetection", false)
 	viper.SetDefault("NodeLabelsToDetect", []string{})
-	viper.SetDefault("MaxParkedNodes", 0)
+	viper.SetDefault("MaxParkedNodes", "0")
 	viper.SetDefault("ExtraParkingLabels", map[string]string{})
 	viper.SetDefault("EvictionSafetyCheck", true)
 	viper.SetDefault("ParkingReasonLabel", "shredder.ethos.adobe.net/parked-reason")
@@ -149,9 +150,22 @@ func parseConfig() {
 	}

 	// Validate MaxParkedNodes configuration
-	if cfg.MaxParkedNodes < 0 {
-		log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("MaxParkedNodes is negative, treating as 0 (no limit)")
-		cfg.MaxParkedNodes = 0
+	// Basic validation - detailed parsing happens in LimitNodesToPark
+	if cfg.MaxParkedNodes != "" && cfg.MaxParkedNodes != "0" {
+		// Check if it's a percentage
+		if strings.HasSuffix(cfg.MaxParkedNodes, "%") {
+			percentageStr := strings.TrimSuffix(cfg.MaxParkedNodes, "%")
+			if _, err := strconv.ParseFloat(percentageStr, 64); err != nil {
+				log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("Invalid MaxParkedNodes percentage format, treating as 0 (no limit)")
+				cfg.MaxParkedNodes = "0"
+			}
+		} else {
+			// Check if it's a valid integer
+			if _, err := strconv.Atoi(cfg.MaxParkedNodes); err != nil {
+				log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("Invalid MaxParkedNodes format, treating as 0 (no limit)")
+				cfg.MaxParkedNodes = "0"
+			}
+		}
 	}

 	log.WithFields(log.Fields{
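
The basic validation added to `parseConfig` only checks that the value parses. Note that `strconv.Atoi("-1")` succeeds and `strconv.ParseFloat` accepts inputs such as `"NaN"` and `"Inf"`, so values like `"-1"` or `"NaN%"` pass this first check and are presumably handled by the detailed parsing in `LimitNodesToPark`, which is not part of this excerpt. A minimal sketch of a stricter up-front check, assuming you wanted to reject such values at config time, might look like the following. `normalizeMaxParkedNodes` is a hypothetical helper, not a function from the repository.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// normalizeMaxParkedNodes is a hypothetical helper that mirrors the checks in
// the diff and additionally rejects negative, NaN and infinite values.
// It returns the value to use and whether the input was valid; callers could
// log a warning when valid is false.
func normalizeMaxParkedNodes(raw string) (value string, valid bool) {
	if raw == "" || raw == "0" {
		return raw, true // unset or "0": no limit
	}
	if strings.HasSuffix(raw, "%") {
		pct, err := strconv.ParseFloat(strings.TrimSuffix(raw, "%"), 64)
		if err != nil || math.IsNaN(pct) || math.IsInf(pct, 0) || pct < 0 {
			return "0", false
		}
		return raw, true
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return "0", false
	}
	return raw, true
}

func main() {
	for _, raw := range []string{"0", "5", "20%", "-1", "NaN%", "invalid"} {
		value, valid := normalizeMaxParkedNodes(raw)
		fmt.Printf("%q -> %q (valid=%t)\n", raw, value, valid)
	}
}
```

Returning a validity flag instead of logging inside the helper keeps it free of side effects, so it can be exercised in a unit test without a logger.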

config.yaml

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ NodeLabelsToDetect: [] # List of node labels to detect. Supports both key-only
 # - "node.example.com/park" # Matches any node with the "node.example.com/park" label

 # Parking limits
-MaxParkedNodes: 0 # Maximum number of nodes that can be parked simultaneously. Set to 0 (default) for no limit.
+MaxParkedNodes: '0' # Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" (default) for no limit.

 # Extra labels to apply to parked nodes and pods
 # ExtraParkingLabels: # (optional) Additional labels to apply to nodes and pods during parking

internal/testing/k8s-shredder-karpenter.yaml

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ data:
     ParkedNodeTaint: "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"
     EnableNodeLabelDetection: false
     NodeLabelsToDetect: []
-    MaxParkedNodes: 0
+    MaxParkedNodes: "0"
 ---
 apiVersion: v1
 kind: Service

internal/testing/k8s-shredder-node-labels.yaml

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ data:
     - "test-label"
     - "maintenance=scheduled"
     - "node.test.io/park"
-    MaxParkedNodes: 0
+    MaxParkedNodes: "0"
 ---
 apiVersion: v1
 kind: Service

internal/testing/k8s-shredder.yaml

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ data:
     ParkedNodeTaint: "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"
     EnableNodeLabelDetection: false
     NodeLabelsToDetect: []
-    MaxParkedNodes: 0
+    MaxParkedNodes: "0"
 ---
 apiVersion: v1
 kind: Service
