Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci-chart.yaml
Original file line number Diff line number Diff line change
@@ -18,7 +18,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.14'
python-version: '3.12'
check-latest: true
- name: Set up chart-testing
uses: helm/[email protected]
46 changes: 35 additions & 11 deletions README.md
@@ -58,7 +58,7 @@ The following options can be used to customize the k8s-shredder controller:
| ParkedNodeTaint | "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule" | Taint to apply to parked nodes in format key=value:effect |
| EnableNodeLabelDetection | false | Controls whether to scan for nodes with specific labels and automatically park them |
| NodeLabelsToDetect | [] | List of node labels to detect. Supports both key-only and key=value formats |
| MaxParkedNodes | 0 | Maximum number of nodes that can be parked simultaneously. Set to 0 (default) for no limit. |
| MaxParkedNodes | "0" | Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" (default) for no limit. |
| ExtraParkingLabels | {} | (Optional) Map of extra labels to apply to nodes and pods during parking. Example: `{ "example.com/owner": "infrastructure" }` |
| EvictionSafetyCheck | true | Controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels. |
| ParkingReasonLabel | "shredder.ethos.adobe.net/parked-reason" | Label used to track why a node or pod was parked (values: node-label, karpenter-drifted, karpenter-disrupted) |
@@ -131,25 +131,49 @@ This integration allows k8s-shredder to automatically handle node lifecycle mana

k8s-shredder supports limiting the maximum number of nodes that can be parked simultaneously using the `MaxParkedNodes` configuration option. This feature helps prevent overwhelming the cluster with too many parked nodes at once, which could impact application availability.

When `MaxParkedNodes` is set to a positive integer:
`MaxParkedNodes` can be specified as either:
- **Integer value** (e.g., `"5"`): Absolute maximum number of nodes that can be parked
- **Percentage value** (e.g., `"20%"`): Maximum percentage of total cluster nodes that can be parked (calculated dynamically each cycle)

1. **Before parking nodes**: k8s-shredder counts the number of currently parked nodes
2. **Calculate available slots**: `availableSlots = MaxParkedNodes - currentlyParked`
3. **Limit parking**: If the number of eligible nodes exceeds available slots, only the first `availableSlots` nodes are parked
4. **Skip if full**: If no slots are available (currentlyParked >= MaxParkedNodes), parking is skipped for that eviction interval
When `MaxParkedNodes` is set to a non-zero value:

**Examples:**
- `MaxParkedNodes: 0` (default): No limit, all eligible nodes are parked
- `MaxParkedNodes: 5`: Maximum 5 nodes can be parked at any time
- `MaxParkedNodes: -1`: Invalid value, treated as 0 (no limit) with a warning logged
1. **Parse the limit**: The configuration is parsed to determine the actual limit
- For percentages: `limit = (percentage / 100) * totalNodes` (rounded down)
- For integers: `limit = configured value`
2. **Count parked nodes**: k8s-shredder counts the number of currently parked nodes
3. **Calculate available slots**: `availableSlots = limit - currentlyParked`
4. **Sort by age**: Eligible nodes are sorted by creation timestamp (oldest first) to ensure predictable parking order
5. **Limit parking**: If the number of eligible nodes exceeds available slots, only the oldest `availableSlots` nodes are parked
6. **Skip if full**: If no slots are available (currentlyParked >= limit), parking is skipped for that eviction interval

This limit applies to both Karpenter drift detection and node label detection features. When multiple nodes are eligible for parking but the limit would be exceeded, k8s-shredder will park the nodes in the order they are discovered and skip the remaining nodes until the next eviction interval.
**Examples:**
- `MaxParkedNodes: "0"` (default): No limit, all eligible nodes are parked
- `MaxParkedNodes: "5"`: Maximum 5 nodes can be parked at any time
- `MaxParkedNodes: "20%"`: Maximum 20% of total cluster nodes can be parked (e.g., 2 nodes in a 10-node cluster)
- Invalid values (e.g., `"-1"`, `"invalid"`): Treated as 0 (no limit) with a warning logged
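
The limit calculation and the examples above can be sketched in Go as follows. This is a minimal illustration, not the project's `LimitNodesToPark`: the names `parseLimit` and `availableSlots` are hypothetical, and clamping percentages above 100 is an assumption made only to keep the arithmetic bounded.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseLimit is a hypothetical helper (not the project's LimitNodesToPark).
// It turns a MaxParkedNodes string into an absolute limit.
// limited == false means "no limit": the value was "0", empty, or invalid.
func parseLimit(value string, totalNodes int) (limit int, limited bool) {
	value = strings.TrimSpace(value)
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.ParseFloat(strings.TrimSuffix(value, "%"), 64)
		if err != nil || math.IsNaN(pct) || math.IsInf(pct, 0) || pct <= 0 {
			return 0, false
		}
		if pct > 100 {
			pct = 100 // assumption: clamp oversized percentages
		}
		// Multiply before dividing so whole-number inputs stay exact, then round down.
		return int(math.Floor(pct * float64(totalNodes) / 100)), true
	}
	n, err := strconv.Atoi(value)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// availableSlots returns how many more nodes may be parked in this cycle.
func availableSlots(limit, currentlyParked int) int {
	if currentlyParked >= limit {
		return 0
	}
	return limit - currentlyParked
}

func main() {
	limit, limited := parseLimit("20%", 10)
	fmt.Println(limit, limited)           // 2 true
	fmt.Println(availableSlots(limit, 1)) // 1
}
```

In this sketch a small percentage can round down to a limit of 0 while still counting as limited (for example `"5%"` of 10 nodes), which leaves no slots. The README does not say how the real implementation treats that case.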

**Percentage Benefits:**
- **Dynamic scaling**: Limit automatically adjusts as cluster size changes
- **Proportional safety**: Maintains a consistent percentage of available capacity regardless of cluster size
- **Auto-scaling friendly**: Works well with cluster auto-scaling by recalculating limits each cycle

**Predictable Parking Order:**
Eligible nodes are **always sorted by creation timestamp (oldest first)**, regardless of whether `MaxParkedNodes` is set. This ensures:
- **Consistent behavior**: The same nodes will be parked first across multiple eviction cycles
- **Fair rotation**: Oldest nodes are prioritized for replacement during rolling upgrades
- **Predictable capacity**: You can anticipate which nodes will be parked next when slots become available
- **Deterministic ordering**: Even when parking all eligible nodes (no limit), they are processed in a predictable order

This sorting behavior applies to both Karpenter drift detection and node label detection features. When multiple nodes are eligible for parking:
- **With no limit** (`MaxParkedNodes: "0"`): All nodes are parked in order from oldest to newest
- **With a limit**: Only the oldest nodes up to the limit are parked; newer nodes wait for the next eviction interval
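
The oldest-first selection described above could look roughly like this. The `eligibleNode` type, `oldestFirst`, and `sampleNodes` are hypothetical names for illustration; the real code presumably reads the creation timestamp from the Kubernetes node object rather than from a plain struct.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// eligibleNode is a hypothetical stand-in for a node that qualifies for parking.
type eligibleNode struct {
	Name      string
	CreatedAt time.Time
}

// oldestFirst returns a copy of nodes sorted by creation time (oldest first).
// When limited is true, only the first `slots` nodes are kept.
func oldestFirst(nodes []eligibleNode, slots int, limited bool) []eligibleNode {
	sorted := make([]eligibleNode, len(nodes))
	copy(sorted, nodes)
	// Stable sort keeps the input order for nodes with identical timestamps.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CreatedAt.Before(sorted[j].CreatedAt)
	})
	if limited {
		if slots < 0 {
			slots = 0
		}
		if slots < len(sorted) {
			sorted = sorted[:slots]
		}
	}
	return sorted
}

// sampleNodes builds three nodes in non-chronological order for the demo.
func sampleNodes() []eligibleNode {
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return []eligibleNode{
		{Name: "node-a", CreatedAt: base.Add(48 * time.Hour)},
		{Name: "node-b", CreatedAt: base},
		{Name: "node-c", CreatedAt: base.Add(24 * time.Hour)},
	}
}

func main() {
	for _, n := range oldestFirst(sampleNodes(), 2, true) {
		fmt.Println(n.Name) // node-b, then node-c
	}
}
```

With two available slots, `node-a` (the newest) is left for a later eviction interval, matching the "With a limit" case.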

**Use cases:**
- **Gradual node replacement**: Control the pace of node cycling during cluster upgrades
- **Resource management**: Prevent excessive resource pressure from too many parked nodes
- **Application stability**: Ensure applications have sufficient capacity during node transitions
- **Cost optimization**: Balance between node replacement speed and cluster stability
- **Auto-scaling clusters**: Use percentage-based limits to maintain consistent safety margins as cluster size changes

#### ExtraParkingLabels

4 changes: 2 additions & 2 deletions charts/k8s-shredder/Chart.yaml
@@ -12,5 +12,5 @@ maintainers:
- name: sfotony
email: [email protected]
url: https://adobe.com
version: 0.2.6
appVersion: v0.3.6
version: 0.2.7
appVersion: v0.3.7
6 changes: 3 additions & 3 deletions charts/k8s-shredder/README.md
@@ -1,6 +1,6 @@
# k8s-shredder

![Version: 0.2.6](https://img.shields.io/badge/Version-0.2.6-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.3.6](https://img.shields.io/badge/AppVersion-v0.3.6-informational?style=flat-square)
![Version: 0.2.7](https://img.shields.io/badge/Version-0.2.7-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.3.7](https://img.shields.io/badge/AppVersion-v0.3.7-informational?style=flat-square)

a novel way of dealing with kubernetes nodes blocked from draining

Expand Down Expand Up @@ -64,7 +64,7 @@ a novel way of dealing with kubernetes nodes blocked from draining
| serviceAccount.annotations | object | `{}` | Additional annotations for the service account (useful for IAM roles, etc.) |
| serviceAccount.create | bool | `true` | Create a service account for k8s-shredder |
| serviceAccount.name | string | `"k8s-shredder"` | Name of the service account |
| shredder | object | `{"AllowEvictionLabel":"shredder.ethos.adobe.net/allow-eviction","ArgoRolloutsAPIVersion":"v1alpha1","EnableKarpenterDisruptionDetection":false,"EnableKarpenterDriftDetection":false,"EnableNodeLabelDetection":false,"EvictionLoopInterval":"1h","EvictionSafetyCheck":true,"ExpiresOnLabel":"shredder.ethos.adobe.net/parked-node-expires-on","ExtraParkingLabels":{},"MaxParkedNodes":0,"NamespacePrefixSkipInitialEviction":"ns-ethos-","NodeLabelsToDetect":[],"ParkedByLabel":"shredder.ethos.adobe.net/parked-by","ParkedByValue":"k8s-shredder","ParkedNodeTTL":"168h","ParkedNodeTaint":"shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule","ParkingReasonLabel":"shredder.ethos.adobe.net/parked-reason","RestartedAtAnnotation":"shredder.ethos.adobe.net/restartedAt","RollingRestartThreshold":0.1,"ToBeDeletedTaint":"ToBeDeletedByClusterAutoscaler","UpgradeStatusLabel":"shredder.ethos.adobe.net/upgrade-status"}` | Core k8s-shredder configuration |
| shredder | object | `{"AllowEvictionLabel":"shredder.ethos.adobe.net/allow-eviction","ArgoRolloutsAPIVersion":"v1alpha1","EnableKarpenterDisruptionDetection":false,"EnableKarpenterDriftDetection":false,"EnableNodeLabelDetection":false,"EvictionLoopInterval":"1h","EvictionSafetyCheck":true,"ExpiresOnLabel":"shredder.ethos.adobe.net/parked-node-expires-on","ExtraParkingLabels":{},"MaxParkedNodes":"0","NamespacePrefixSkipInitialEviction":"ns-ethos-","NodeLabelsToDetect":[],"ParkedByLabel":"shredder.ethos.adobe.net/parked-by","ParkedByValue":"k8s-shredder","ParkedNodeTTL":"168h","ParkedNodeTaint":"shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule","ParkingReasonLabel":"shredder.ethos.adobe.net/parked-reason","RestartedAtAnnotation":"shredder.ethos.adobe.net/restartedAt","RollingRestartThreshold":0.1,"ToBeDeletedTaint":"ToBeDeletedByClusterAutoscaler","UpgradeStatusLabel":"shredder.ethos.adobe.net/upgrade-status"}` | Core k8s-shredder configuration |
| shredder.AllowEvictionLabel | string | `"shredder.ethos.adobe.net/allow-eviction"` | Label to explicitly allow eviction on specific resources |
| shredder.ArgoRolloutsAPIVersion | string | `"v1alpha1"` | API version for Argo Rollouts integration |
| shredder.EnableKarpenterDisruptionDetection | bool | `false` | Enable Karpenter disruption detection for node lifecycle management |
@@ -74,7 +74,7 @@ a novel way of dealing with kubernetes nodes blocked from draining
| shredder.EvictionSafetyCheck | bool | `true` | Controls whether to perform safety checks before force eviction |
| shredder.ExpiresOnLabel | string | `"shredder.ethos.adobe.net/parked-node-expires-on"` | Label used to track when a parked node expires |
| shredder.ExtraParkingLabels | object | `{}` | Additional labels to apply to nodes and pods during parking |
| shredder.MaxParkedNodes | int | `0` | Maximum number of nodes that can be parked simultaneously (0 = no limit) |
| shredder.MaxParkedNodes | string | `"0"` | Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" for no limit |
| shredder.NamespacePrefixSkipInitialEviction | string | `"ns-ethos-"` | Namespace prefix to skip during initial eviction (useful for system namespaces) |
| shredder.NodeLabelsToDetect | list | `[]` | List of node labels to monitor for triggering shredder actions |
| shredder.ParkedByLabel | string | `"shredder.ethos.adobe.net/parked-by"` | Label to track which component parked a node |
4 changes: 2 additions & 2 deletions charts/k8s-shredder/values.yaml
@@ -62,8 +62,8 @@ shredder:
EnableNodeLabelDetection: false
# -- List of node labels to monitor for triggering shredder actions
NodeLabelsToDetect: []
# -- Maximum number of nodes that can be parked simultaneously (0 = no limit)
MaxParkedNodes: 0
# -- Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" for no limit
MaxParkedNodes: '0'
# -- Controls whether to perform safety checks before force eviction
EvictionSafetyCheck: true
# -- Label used to track why a node or pod was parked
22 changes: 18 additions & 4 deletions cmd/root.go
@@ -12,6 +12,7 @@ governing permissions and limitations under the License.
package cmd

import (
"strconv"
"strings"
"time"

@@ -123,7 +124,7 @@ func discoverConfig() {
viper.SetDefault("ParkedNodeTaint", "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule")
viper.SetDefault("EnableNodeLabelDetection", false)
viper.SetDefault("NodeLabelsToDetect", []string{})
viper.SetDefault("MaxParkedNodes", 0)
viper.SetDefault("MaxParkedNodes", "0")
viper.SetDefault("ExtraParkingLabels", map[string]string{})
viper.SetDefault("EvictionSafetyCheck", true)
viper.SetDefault("ParkingReasonLabel", "shredder.ethos.adobe.net/parked-reason")
@@ -149,9 +150,22 @@ func parseConfig() {
}

// Validate MaxParkedNodes configuration
if cfg.MaxParkedNodes < 0 {
log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("MaxParkedNodes is negative, treating as 0 (no limit)")
cfg.MaxParkedNodes = 0
// Basic validation - detailed parsing happens in LimitNodesToPark
if cfg.MaxParkedNodes != "" && cfg.MaxParkedNodes != "0" {
// Check if it's a percentage
if strings.HasSuffix(cfg.MaxParkedNodes, "%") {
percentageStr := strings.TrimSuffix(cfg.MaxParkedNodes, "%")
if _, err := strconv.ParseFloat(percentageStr, 64); err != nil {
log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("Invalid MaxParkedNodes percentage format, treating as 0 (no limit)")
cfg.MaxParkedNodes = "0"
}
} else {
// Check if it's a valid integer
if _, err := strconv.Atoi(cfg.MaxParkedNodes); err != nil {
log.WithField("MaxParkedNodes", cfg.MaxParkedNodes).Warn("Invalid MaxParkedNodes format, treating as 0 (no limit)")
cfg.MaxParkedNodes = "0"
}
}
}

log.WithFields(log.Fields{
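
One edge case in the `parseConfig` hunk above: `strconv.Atoi` accepts `"-1"` and `strconv.ParseFloat` accepts `"-5"`, `"NaN"`, and `"Inf"`, so those values pass this basic check and rely on the later parsing in `LimitNodesToPark`, which is not part of this excerpt. A stricter check might look like the sketch below. `validMaxParkedNodes` is a hypothetical name, and whether percentages above 100 should be rejected is left open because the source does not say.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// validMaxParkedNodes is a hypothetical stricter validator for the
// MaxParkedNodes string. Empty and "0" both mean "no limit" and are valid.
func validMaxParkedNodes(value string) bool {
	if value == "" || value == "0" {
		return true
	}
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.ParseFloat(strings.TrimSuffix(value, "%"), 64)
		// Reject parse errors, NaN, infinities, and negative percentages.
		return err == nil && !math.IsNaN(pct) && !math.IsInf(pct, 0) && pct >= 0
	}
	n, err := strconv.Atoi(value)
	// Reject parse errors and negative integers.
	return err == nil && n >= 0
}

func main() {
	n, err := strconv.Atoi("-1")
	fmt.Println(n, err)                     // -1 <nil>
	fmt.Println(validMaxParkedNodes("-1"))  // false
	fmt.Println(validMaxParkedNodes("20%")) // true
}
```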
2 changes: 1 addition & 1 deletion config.yaml
@@ -32,7 +32,7 @@ NodeLabelsToDetect: [] # List of node labels to detect. Supports both key-only
# - "node.example.com/park" # Matches any node with the "node.example.com/park" label

# Parking limits
MaxParkedNodes: 0 # Maximum number of nodes that can be parked simultaneously. Set to 0 (default) for no limit.
MaxParkedNodes: '0' # Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" (default) for no limit.

# Extra labels to apply to parked nodes and pods
# ExtraParkingLabels: # (optional) Additional labels to apply to nodes and pods during parking
2 changes: 1 addition & 1 deletion internal/testing/k8s-shredder-karpenter.yaml
@@ -80,7 +80,7 @@ data:
ParkedNodeTaint: "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"
EnableNodeLabelDetection: false
NodeLabelsToDetect: []
MaxParkedNodes: 0
MaxParkedNodes: "0"
---
apiVersion: v1
kind: Service
2 changes: 1 addition & 1 deletion internal/testing/k8s-shredder-node-labels.yaml
@@ -83,7 +83,7 @@ data:
- "test-label"
- "maintenance=scheduled"
- "node.test.io/park"
MaxParkedNodes: 0
MaxParkedNodes: "0"
---
apiVersion: v1
kind: Service
2 changes: 1 addition & 1 deletion internal/testing/k8s-shredder.yaml
@@ -80,7 +80,7 @@ data:
ParkedNodeTaint: "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"
EnableNodeLabelDetection: false
NodeLabelsToDetect: []
MaxParkedNodes: 0
MaxParkedNodes: "0"
---
apiVersion: v1
kind: Service
7 changes: 5 additions & 2 deletions pkg/config/config.go
@@ -49,8 +49,11 @@ type Config struct {
EnableNodeLabelDetection bool
// NodeLabelsToDetect is a list of node labels to look for. Can be just keys or key=value pairs
NodeLabelsToDetect []string
// MaxParkedNodes is the maximum number of nodes that can be parked simultaneously. If set to 0 (default), no limit is applied.
MaxParkedNodes int
// MaxParkedNodes is the maximum number of nodes that can be parked simultaneously.
// Can be either an integer (e.g. "5") or a percentage (e.g. "20%").
// If set to "0" or empty (default), no limit is applied.
// When a percentage is specified, the limit is calculated as (percentage/100) * (total nodes in cluster).
MaxParkedNodes string
// ExtraParkingLabels is a map of additional labels to apply to nodes and pods during the parking process. If not set, no extra labels are applied.
ExtraParkingLabels map[string]string
// EvictionSafetyCheck controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels.
18 changes: 9 additions & 9 deletions pkg/utils/karpenter_detection_test.go
@@ -205,7 +205,7 @@ func TestLabelDriftedNodes(t *testing.T) {
name: "No drifted node claims",
driftedNodeClaims: []KarpenterNodeClaimInfo{},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -225,7 +225,7 @@ func TestLabelDriftedNodes(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -245,7 +245,7 @@ func TestLabelDriftedNodes(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -265,7 +265,7 @@ func TestLabelDriftedNodes(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -484,7 +484,7 @@ func TestLabelDisruptedNodes(t *testing.T) {
name: "No disrupted node claims",
disruptedNodeClaims: []KarpenterNodeClaimInfo{},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -505,7 +505,7 @@ func TestLabelDisruptedNodes(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -526,7 +526,7 @@ func TestLabelDisruptedNodes(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -547,7 +547,7 @@ func TestLabelDisruptedNodes(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
ParkingReasonLabel: "parked-reason",
},
@@ -614,7 +614,7 @@ func TestProcessDriftedKarpenterNodes(t *testing.T) {
appContext: &AppContext{
Config: config.Config{
UpgradeStatusLabel: "upgrade-status",
MaxParkedNodes: 5,
MaxParkedNodes: "5",
ParkingReasonLabel: "parked-reason",
},
K8sClient: fake.NewSimpleClientset(),
10 changes: 5 additions & 5 deletions pkg/utils/node_label_detection_test.go
@@ -430,7 +430,7 @@ func TestParkNodesWithLabels(t *testing.T) {
name: "No matching nodes",
matchingNodes: []NodeLabelInfo{},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
},
dryRun: false,
@@ -448,7 +448,7 @@ func TestParkNodesWithLabels(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
},
dryRun: false,
@@ -472,7 +472,7 @@ func TestParkNodesWithLabels(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
},
dryRun: false,
@@ -490,7 +490,7 @@ func TestParkNodesWithLabels(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 5,
MaxParkedNodes: "5",
UpgradeStatusLabel: "upgrade-status",
},
dryRun: true,
@@ -520,7 +520,7 @@ func TestParkNodesWithLabels(t *testing.T) {
},
},
cfg: config.Config{
MaxParkedNodes: 2,
MaxParkedNodes: "2",
UpgradeStatusLabel: "upgrade-status",
},
dryRun: false,