Commit 1195cd1

Allow MaxParkedNodes to work with percentages
1 parent 95137bf commit 1195cd1

14 files changed: +270 -55 lines changed


README.md

Lines changed: 23 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -58,7 +58,7 @@ The following options can be used to customize the k8s-shredder controller:
5858
| ParkedNodeTaint | "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule" | Taint to apply to parked nodes in format key=value:effect |
5959
| EnableNodeLabelDetection | false | Controls whether to scan for nodes with specific labels and automatically park them |
6060
| NodeLabelsToDetect | [] | List of node labels to detect. Supports both key-only and key=value formats |
61-
| MaxParkedNodes | 0 | Maximum number of nodes that can be parked simultaneously. Set to 0 (default) for no limit. |
61+
| MaxParkedNodes | "0" | Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" (default) for no limit. |
6262
| ExtraParkingLabels | {} | (Optional) Map of extra labels to apply to nodes and pods during parking. Example: `{ "example.com/owner": "infrastructure" }` |
6363
| EvictionSafetyCheck | true | Controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels. |
6464
| ParkingReasonLabel | "shredder.ethos.adobe.net/parked-reason" | Label used to track why a node or pod was parked (values: node-label, karpenter-drifted, karpenter-disrupted) |
@@ -131,17 +131,30 @@ This integration allows k8s-shredder to automatically handle node lifecycle mana
131131

132132
k8s-shredder supports limiting the maximum number of nodes that can be parked simultaneously using the `MaxParkedNodes` configuration option. This feature helps prevent overwhelming the cluster with too many parked nodes at once, which could impact application availability.
133133

134-
When `MaxParkedNodes` is set to a positive integer:
134+
`MaxParkedNodes` can be specified as either:
135+
- **Integer value** (e.g., `"5"`): Absolute maximum number of nodes that can be parked
136+
- **Percentage value** (e.g., `"20%"`): Maximum percentage of total cluster nodes that can be parked (calculated dynamically each cycle)
135137

136-
1. **Before parking nodes**: k8s-shredder counts the number of currently parked nodes
137-
2. **Calculate available slots**: `availableSlots = MaxParkedNodes - currentlyParked`
138-
3. **Limit parking**: If the number of eligible nodes exceeds available slots, only the first `availableSlots` nodes are parked
139-
4. **Skip if full**: If no slots are available (currentlyParked >= MaxParkedNodes), parking is skipped for that eviction interval
138+
When `MaxParkedNodes` is set to a non-zero value:
139+
140+
1. **Parse the limit**: The configuration is parsed to determine the actual limit
141+
- For percentages: `limit = (percentage / 100) * totalNodes` (rounded down)
142+
- For integers: `limit = configured value`
143+
2. **Count parked nodes**: k8s-shredder counts the number of currently parked nodes
144+
3. **Calculate available slots**: `availableSlots = limit - currentlyParked`
145+
4. **Limit parking**: If the number of eligible nodes exceeds available slots, only the first `availableSlots` nodes are parked
146+
5. **Skip if full**: If no slots are available (currentlyParked >= limit), parking is skipped for that eviction interval
140147

141148
**Examples:**
142-
- `MaxParkedNodes: 0` (default): No limit, all eligible nodes are parked
143-
- `MaxParkedNodes: 5`: Maximum 5 nodes can be parked at any time
144-
- `MaxParkedNodes: -1`: Invalid value, treated as 0 (no limit) with a warning logged
149+
- `MaxParkedNodes: "0"` (default): No limit, all eligible nodes are parked
150+
- `MaxParkedNodes: "5"`: Maximum 5 nodes can be parked at any time
151+
- `MaxParkedNodes: "20%"`: Maximum 20% of total cluster nodes can be parked (e.g., 2 nodes in a 10-node cluster)
152+
- Invalid values (e.g., `"-1"`, `"invalid"`): Treated as 0 (no limit) with a warning logged
153+
154+
**Percentage Benefits:**
155+
- **Dynamic scaling**: Limit automatically adjusts as cluster size changes
156+
- **Proportional safety**: Maintains a consistent percentage of available capacity regardless of cluster size
157+
- **Auto-scaling friendly**: Works well with cluster auto-scaling by recalculating limits each cycle
145158

146159
This limit applies to both Karpenter drift detection and node label detection features. When multiple nodes are eligible for parking but the limit would be exceeded, k8s-shredder will park the nodes in the order they are discovered and skip the remaining nodes until the next eviction interval.
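
The numbered steps in this README hunk can be sketched in Go as follows. This is a minimal illustration, not the commit's implementation: the real logic lives in `LimitNodesToPark`, whose body is not part of the diff shown here. The names `resolveLimit` and `capEligible` are hypothetical. The sketch also assumes that a percentage which rounds down to 0 means "park nothing this cycle" and that `"0%"` and negative values mean "no limit"; the commit does not state either behavior.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// resolveLimit turns a MaxParkedNodes string into an absolute node count.
// The boolean is false when no limit applies ("", "0", or an invalid value).
// Hypothetical helper, written only to illustrate the README steps.
func resolveLimit(raw string, totalNodes int) (int, bool) {
	raw = strings.TrimSpace(raw)
	if raw == "" || raw == "0" {
		return 0, false
	}
	if strings.HasSuffix(raw, "%") {
		pct, err := strconv.ParseFloat(strings.TrimSuffix(raw, "%"), 64)
		if err != nil || math.IsNaN(pct) || math.IsInf(pct, 0) || pct <= 0 {
			return 0, false // assumption: unusable percentages fall back to "no limit"
		}
		// Multiply before dividing so whole percentages stay exact, then round down.
		return int(math.Floor(pct * float64(totalNodes) / 100)), true
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, false // invalid or negative integers are treated as "no limit"
	}
	return n, true
}

// capEligible returns the leading eligible nodes that fit into the free slots,
// preserving discovery order.
func capEligible(eligible []string, raw string, totalNodes, currentlyParked int) []string {
	limit, limited := resolveLimit(raw, totalNodes)
	if !limited {
		return eligible
	}
	available := limit - currentlyParked
	if available <= 0 {
		return nil // limit reached: skip parking for this eviction interval
	}
	if len(eligible) > available {
		return eligible[:available]
	}
	return eligible
}

func main() {
	eligible := []string{"node-a", "node-b", "node-c"}
	// 20% of 10 nodes gives a limit of 2; one node is already parked, so one slot is free.
	fmt.Println(capEligible(eligible, "20%", 10, 1)) // prints [node-a]
}
```

Returning a separate boolean keeps "no limit" distinct from "limit of zero", which matters on small clusters where a percentage can floor to 0.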
147160

@@ -150,6 +163,7 @@ This limit applies to both Karpenter drift detection and node label detection fe
150163
- **Resource management**: Prevent excessive resource pressure from too many parked nodes
151164
- **Application stability**: Ensure applications have sufficient capacity during node transitions
152165
- **Cost optimization**: Balance between node replacement speed and cluster stability
166+
- **Auto-scaling clusters**: Use percentage-based limits to maintain consistent safety margins as cluster size changes
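
As a worked example of the percentage arithmetic described in the README, the sketch below prints the limit that a 20% setting would yield at several cluster sizes. `percentLimit` is a hypothetical helper that assumes whole-number, non-negative percentages; it is not taken from the commit.

```go
package main

import "fmt"

// percentLimit returns floor(pct/100 * totalNodes) for non-negative inputs.
// Integer division truncates, which equals rounding down here.
func percentLimit(pct, totalNodes int) int {
	return pct * totalNodes / 100
}

func main() {
	for _, total := range []int{7, 10, 50, 200} {
		fmt.Printf("%3d nodes at 20%% -> limit %d\n", total, percentLimit(20, total))
	}
}
```

With these inputs the limits come out as 1, 2, 10, and 40, so the cap grows with the cluster, while small clusters get very small caps because of the rounding down.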
153167

154168
#### ExtraParkingLabels
155169

charts/k8s-shredder/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -12,5 +12,5 @@ maintainers:
1212
- name: sfotony
1313
1414
url: https://adobe.com
15-
version: 0.2.6
16-
appVersion: v0.3.6
15+
version: 0.2.7
16+
appVersion: v0.3.7

charts/k8s-shredder/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
11
# k8s-shredder
22

3-
![Version: 0.2.6](https://img.shields.io/badge/Version-0.2.6-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.3.6](https://img.shields.io/badge/AppVersion-v0.3.6-informational?style=flat-square)
3+
![Version: 0.2.7](https://img.shields.io/badge/Version-0.2.7-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.3.7](https://img.shields.io/badge/AppVersion-v0.3.7-informational?style=flat-square)
44

55
a novel way of dealing with kubernetes nodes blocked from draining
66

@@ -64,7 +64,7 @@ a novel way of dealing with kubernetes nodes blocked from draining
6464
| serviceAccount.annotations | object | `{}` | Additional annotations for the service account (useful for IAM roles, etc.) |
6565
| serviceAccount.create | bool | `true` | Create a service account for k8s-shredder |
6666
| serviceAccount.name | string | `"k8s-shredder"` | Name of the service account |
67-
| shredder | object | `{"AllowEvictionLabel":"shredder.ethos.adobe.net/allow-eviction","ArgoRolloutsAPIVersion":"v1alpha1","EnableKarpenterDisruptionDetection":false,"EnableKarpenterDriftDetection":false,"EnableNodeLabelDetection":false,"EvictionLoopInterval":"1h","EvictionSafetyCheck":true,"ExpiresOnLabel":"shredder.ethos.adobe.net/parked-node-expires-on","ExtraParkingLabels":{},"MaxParkedNodes":0,"NamespacePrefixSkipInitialEviction":"ns-ethos-","NodeLabelsToDetect":[],"ParkedByLabel":"shredder.ethos.adobe.net/parked-by","ParkedByValue":"k8s-shredder","ParkedNodeTTL":"168h","ParkedNodeTaint":"shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule","ParkingReasonLabel":"shredder.ethos.adobe.net/parked-reason","RestartedAtAnnotation":"shredder.ethos.adobe.net/restartedAt","RollingRestartThreshold":0.1,"ToBeDeletedTaint":"ToBeDeletedByClusterAutoscaler","UpgradeStatusLabel":"shredder.ethos.adobe.net/upgrade-status"}` | Core k8s-shredder configuration |
67+
| shredder | object | `{"AllowEvictionLabel":"shredder.ethos.adobe.net/allow-eviction","ArgoRolloutsAPIVersion":"v1alpha1","EnableKarpenterDisruptionDetection":false,"EnableKarpenterDriftDetection":false,"EnableNodeLabelDetection":false,"EvictionLoopInterval":"1h","EvictionSafetyCheck":true,"ExpiresOnLabel":"shredder.ethos.adobe.net/parked-node-expires-on","ExtraParkingLabels":{},"MaxParkedNodes":"0","NamespacePrefixSkipInitialEviction":"ns-ethos-","NodeLabelsToDetect":[],"ParkedByLabel":"shredder.ethos.adobe.net/parked-by","ParkedByValue":"k8s-shredder","ParkedNodeTTL":"168h","ParkedNodeTaint":"shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule","ParkingReasonLabel":"shredder.ethos.adobe.net/parked-reason","RestartedAtAnnotation":"shredder.ethos.adobe.net/restartedAt","RollingRestartThreshold":0.1,"ToBeDeletedTaint":"ToBeDeletedByClusterAutoscaler","UpgradeStatusLabel":"shredder.ethos.adobe.net/upgrade-status"}` | Core k8s-shredder configuration |
6868
| shredder.AllowEvictionLabel | string | `"shredder.ethos.adobe.net/allow-eviction"` | Label to explicitly allow eviction on specific resources |
6969
| shredder.ArgoRolloutsAPIVersion | string | `"v1alpha1"` | API version for Argo Rollouts integration |
7070
| shredder.EnableKarpenterDisruptionDetection | bool | `false` | Enable Karpenter disruption detection for node lifecycle management |
@@ -74,7 +74,7 @@ a novel way of dealing with kubernetes nodes blocked from draining
7474
| shredder.EvictionSafetyCheck | bool | `true` | Controls whether to perform safety checks before force eviction |
7575
| shredder.ExpiresOnLabel | string | `"shredder.ethos.adobe.net/parked-node-expires-on"` | Label used to track when a parked node expires |
7676
| shredder.ExtraParkingLabels | object | `{}` | Additional labels to apply to nodes and pods during parking |
77-
| shredder.MaxParkedNodes | int | `0` | Maximum number of nodes that can be parked simultaneously (0 = no limit) |
77+
| shredder.MaxParkedNodes | string | `"0"` | Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" for no limit |
7878
| shredder.NamespacePrefixSkipInitialEviction | string | `"ns-ethos-"` | Namespace prefix to skip during initial eviction (useful for system namespaces) |
7979
| shredder.NodeLabelsToDetect | list | `[]` | List of node labels to monitor for triggering shredder actions |
8080
| shredder.ParkedByLabel | string | `"shredder.ethos.adobe.net/parked-by"` | Label to track which component parked a node |

charts/k8s-shredder/values.yaml

Lines changed: 2 additions & 2 deletions
@@ -62,8 +62,8 @@ shredder:
6262
EnableNodeLabelDetection: false
6363
# -- List of node labels to monitor for triggering shredder actions
6464
NodeLabelsToDetect: []
65-
# -- Maximum number of nodes that can be parked simultaneously (0 = no limit)
66-
MaxParkedNodes: 0
65+
# -- Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" for no limit
66+
MaxParkedNodes: "0"
6767
# -- Controls whether to perform safety checks before force eviction
6868
EvictionSafetyCheck: true
6969
# -- Label used to track why a node or pod was parked

cmd/root.go

Lines changed: 18 additions & 4 deletions
@@ -12,6 +12,7 @@ governing permissions and limitations under the License.
1212
package cmd
1313

1414
import (
15+
"strconv"
1516
"strings"
1617
"time"
1718

@@ -123,7 +124,7 @@ func discoverConfig() {
123124
viper.SetDefault("ParkedNodeTaint", "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule")
124125
viper.SetDefault("EnableNodeLabelDetection", false)
125126
viper.SetDefault("NodeLabelsToDetect", []string{})
126-
viper.SetDefault("MaxParkedNodes", 0)
127+
viper.SetDefault("MaxParkedNodes", "0")
127128
viper.SetDefault("ExtraParkingLabels", map[string]string{})
128129
viper.SetDefault("EvictionSafetyCheck", true)
129130
viper.SetDefault("ParkingReasonLabel", "shredder.ethos.adobe.net/parked-reason")
@@ -149,9 +150,22 @@ func parseConfig() {
149150
}
150151

151152
// Validate MaxParkedNodes configuration
152-
if cfg.MaxParkedNodes < 0 {
153-
log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("MaxParkedNodes is negative, treating as 0 (no limit)")
154-
cfg.MaxParkedNodes = 0
153+
// Basic validation - detailed parsing happens in LimitNodesToPark
154+
if cfg.MaxParkedNodes != "" && cfg.MaxParkedNodes != "0" {
155+
// Check if it's a percentage
156+
if strings.HasSuffix(cfg.MaxParkedNodes, "%") {
157+
percentageStr := strings.TrimSuffix(cfg.MaxParkedNodes, "%")
158+
if _, err := strconv.ParseFloat(percentageStr, 64); err != nil {
159+
log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("Invalid MaxParkedNodes percentage format, treating as 0 (no limit)")
160+
cfg.MaxParkedNodes = "0"
161+
}
162+
} else {
163+
// Check if it's a valid integer
164+
if _, err := strconv.Atoi(cfg.MaxParkedNodes); err != nil {
165+
log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("Invalid MaxParkedNodes format, treating as 0 (no limit)")
166+
cfg.MaxParkedNodes = "0"
167+
}
168+
}
155169
}
156170

157171
log.WithFields(log.Fields{
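
To see which inputs the basic validation in this hunk accepts, the check can be lifted into a standalone function. The sketch below mirrors the shape of the `parseConfig` logic with the logging removed. `basicValidate` is a hypothetical name and the function is an illustration, not code from the repository.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// basicValidate returns the value unchanged when it parses as an integer or as
// a float followed by "%", and returns "0" (no limit) otherwise.
func basicValidate(raw string) string {
	if raw == "" || raw == "0" {
		return raw
	}
	if strings.HasSuffix(raw, "%") {
		// Percentage form: only the numeric part before "%" is checked.
		if _, err := strconv.ParseFloat(strings.TrimSuffix(raw, "%"), 64); err != nil {
			return "0"
		}
		return raw
	}
	// Integer form: strconv.Atoi accepts a leading sign, so "-1" passes here.
	if _, err := strconv.Atoi(raw); err != nil {
		return "0"
	}
	return raw
}

func main() {
	for _, v := range []string{"5", "20%", "abc", "x%", "-1", "-5%"} {
		fmt.Printf("%q -> %q\n", v, basicValidate(v))
	}
}
```

Negative inputs such as `"-1"` and `"-5%"` pass this stage unchanged because `strconv.Atoi` and `strconv.ParseFloat` accept signed numbers. Any rejection of them would have to happen in the later, detailed parsing step that the code comment attributes to `LimitNodesToPark`.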

config.yaml

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ NodeLabelsToDetect: [] # List of node labels to detect. Supports both key-only
3232
# - "node.example.com/park" # Matches any node with the "node.example.com/park" label
3333

3434
# Parking limits
35-
MaxParkedNodes: 0 # Maximum number of nodes that can be parked simultaneously. Set to 0 (default) for no limit.
35+
MaxParkedNodes: "0" # Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" (default) for no limit.
3636

3737
# Extra labels to apply to parked nodes and pods
3838
# ExtraParkingLabels: # (optional) Additional labels to apply to nodes and pods during parking

internal/testing/k8s-shredder-karpenter.yaml

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ data:
8080
ParkedNodeTaint: "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"
8181
EnableNodeLabelDetection: false
8282
NodeLabelsToDetect: []
83-
MaxParkedNodes: 0
83+
MaxParkedNodes: "0"
8484
---
8585
apiVersion: v1
8686
kind: Service

internal/testing/k8s-shredder-node-labels.yaml

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ data:
8383
- "test-label"
8484
- "maintenance=scheduled"
8585
- "node.test.io/park"
86-
MaxParkedNodes: 0
86+
MaxParkedNodes: "0"
8787
---
8888
apiVersion: v1
8989
kind: Service

internal/testing/k8s-shredder.yaml

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ data:
8080
ParkedNodeTaint: "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"
8181
EnableNodeLabelDetection: false
8282
NodeLabelsToDetect: []
83-
MaxParkedNodes: 0
83+
MaxParkedNodes: "0"
8484
---
8585
apiVersion: v1
8686
kind: Service

pkg/config/config.go

Lines changed: 5 additions & 2 deletions
@@ -49,8 +49,11 @@ type Config struct {
4949
EnableNodeLabelDetection bool
5050
// NodeLabelsToDetect is a list of node labels to look for. Can be just keys or key=value pairs
5151
NodeLabelsToDetect []string
52-
// MaxParkedNodes is the maximum number of nodes that can be parked simultaneously. If set to 0 (default), no limit is applied.
53-
MaxParkedNodes int
52+
// MaxParkedNodes is the maximum number of nodes that can be parked simultaneously.
53+
// Can be either an integer (e.g. "5") or a percentage (e.g. "20%").
54+
// If set to "0" or empty (default), no limit is applied.
55+
// When a percentage is specified, the limit is calculated as (percentage/100) * (total nodes in cluster).
56+
MaxParkedNodes string
5457
// ExtraParkingLabels is a map of additional labels to apply to nodes and pods during the parking process. If not set, no extra labels are applied.
5558
ExtraParkingLabels map[string]string
5659
// EvictionSafetyCheck controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels.
