Feature commit #2697
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: gangavh1008. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing `/approve` in a comment.

Welcome @gangavh1008!

Hi @gangavh1008. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
```yaml
- WhenEmpty
- WhenEmptyOrUnderutilized
type: string
useOnConsolidationAfter:
```
This name is a bit confusing; would you consider changing it?
Sure, will change it. Thank you for the review.
Changed the name to consolidationGracePeriod
```yaml
When replicas is set, UseOnConsolidationAfter is simply ignored
pattern: ^(([0-9]+(s|m|h))+|Never)$
type: string
useOnConsolidationUtilizationThreshold:
```
In a prior discussion with the maintainers about my proposed fix, the suggestion was to focus on cost rather than utilization, since cost is what Karpenter optimizes for, and adding a utilization gate would be at odds with the core consolidation logic.
For example, in AWS an m5.xlarge instance can be cheaper than an m8.xlarge, and there are surely other cases where an instance type with more resources ends up cheaper than a smaller one. There is also the scenario of reserved instances, or otherwise special rates on specific instance types, where it could be preferable to run at lower utilization levels.
Thank you for the insightful feedback. You raise a valid point that deserves careful consideration.
The Core Concern
You're correct that Karpenter's consolidation logic optimizes for cost, not utilization. The utilization threshold in this PR could conflict with cost optimization in scenarios like:
- Instance type pricing inversions: m5.xlarge might be cheaper than m8.xlarge
- Reserved/Savings Plans: Pre-purchased capacity should be used even at low utilization
- Spot pricing variations: A larger spot instance might be cheaper than a smaller on-demand one
Proposed Solutions
Cost-Aware Protection
Instead of a utilization threshold, protect nodes that are already cost-optimal, meaning Karpenter's consolidation algorithm has determined that no cheaper alternative exists:
```yaml
spec:
  disruption:
    useOnConsolidationAfter: 1h
    # Remove utilization threshold entirely
    # Protection applies when node is stable AND cost-optimal
```
I will change the feature implementation to this approach and submit the commit.
I've removed the utilization threshold entirely. The feature is now simplified to:
```yaml
spec:
  disruption:
    consolidateAfter: 30s
    consolidationGracePeriod: 1h  # Simple grace period, no utilization check
```
The protection logic is now:
- Node becomes consolidatable (stable for consolidateAfter)
- Consolidation evaluates the node using its normal cost-based algorithm
- If consolidation finds a cheaper option → CONSOLIDATE ✅
- If no cheaper option → Grace period applied (prevents re-evaluation for consolidationGracePeriod)
The feature doesn't try to be smarter than Karpenter's consolidation algorithm. It simply provides a cooldown to prevent churn from repeated re-evaluation.
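Below is a minimal Go sketch of that cooldown, just to make the flow concrete; the `consolidator` wrapper, `candidate` type, `cheaperOptionExists` callback, and `lastOptimal` map are all hypothetical illustrations, not Karpenter's actual types or controller code.

```go
package disruption

import "time"

// candidate is a hypothetical stand-in for a node under evaluation.
type candidate struct {
	name string
}

// consolidator holds the grace-period state for this sketch.
type consolidator struct {
	gracePeriod time.Duration        // consolidationGracePeriod from the NodePool
	lastOptimal map[string]time.Time // when each node was last found cost-optimal
}

// shouldEvaluate reports whether a candidate is outside its grace period.
func (c *consolidator) shouldEvaluate(cand candidate, now time.Time) bool {
	last, ok := c.lastOptimal[cand.name]
	return !ok || now.Sub(last) >= c.gracePeriod
}

// evaluate runs the normal cost-based decision; when no cheaper option exists,
// the grace period starts and the node is skipped until it expires.
func (c *consolidator) evaluate(cand candidate, now time.Time, cheaperOptionExists func(candidate) bool) bool {
	if !c.shouldEvaluate(cand, now) {
		return false // still cooling down; do not re-evaluate
	}
	if cheaperOptionExists(cand) {
		return true // a cheaper option exists: consolidate as usual
	}
	c.lastOptimal[cand.name] = now // cost-optimal right now: start the cooldown
	return false
}
```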
@ellistarn, please review at your convenience, thank you.
Hey @gangavh1008. I'd suggest coming to alignment with the maintainers in an issue before moving on to implementation.
Hi @ellistarn, I agree. I went ahead with the implementation, prioritizing the internal requirements. Requesting the maintainers to share feedback; happy to incorporate changes.
For context, we're trying to think more broadly about this consolidation space, and a bunch of key stakeholders are about to head out for the holidays. We want to do better here and agree this is a problem -- I am not sure this is the right approach in the specifics.
@ellistarn, sure. Thank you for going through the approach. As you suggested, let me put together a Google Doc in the issue for consensus building on the design with the maintainers.
feat: implement consolidationGracePeriod to prevent consolidation churn - issue 7146
Fixes #N/A
Description
This PR introduces the consolidationGracePeriod feature to address excessive node churn caused by consolidation cycles. The feature makes nodes "invisible" to the consolidation process for a configurable duration after any pod event (add/remove), preventing both source and destination churn.
Problem
With the existing consolidateAfter mechanism, a problematic consolidation cycle emerges: receiving pods from consolidation makes a node unconsolidatable, creating a feedback loop in which the destination of one consolidation becomes protected while the source nodes become the next targets.
Solution
New field in the NodePool Disruption spec
How it works: when consolidationGracePeriod is configured, the node is kept out of consolidation for the configured duration after any pod event (add/remove), as in the example below.
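A minimal opt-in example, using values that appear elsewhere in this PR (30s from the earlier discussion, 90s from the EKS testing); exact defaults and validation are whatever the implementation defines:

```yaml
spec:
  disruption:
    consolidateAfter: 30s           # existing field: node must be stable this long
    consolidationGracePeriod: 90s   # new opt-in field introduced by this PR
```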
Changes
API:
Added ConsolidationGracePeriod field to Disruption struct in pkg/apis/v1/nodepool.go
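As a rough sketch of what that API addition might look like (simplified; `NillableDuration` here is only a placeholder for the upstream duration wrapper, and the surrounding fields are abbreviated):

```go
package v1

// NillableDuration is a placeholder for the upstream duration type used by the
// NodePool API; the real definition lives alongside the Disruption struct.
type NillableDuration struct {
	Raw []byte
}

// Disruption is abbreviated to the fields relevant to this PR.
type Disruption struct {
	// ConsolidateAfter: existing field; how long a node must be stable before
	// it is considered consolidatable.
	ConsolidateAfter NillableDuration `json:"consolidateAfter"`

	// ConsolidationGracePeriod: new optional field added by this PR; when set,
	// it provides a cooldown before a node is re-evaluated for consolidation.
	// +optional
	ConsolidationGracePeriod *NillableDuration `json:"consolidationGracePeriod,omitempty"`
}
```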
Disruption Logic:
Updated Controllers:
CRDs:
Documentation:
How was this change tested?
Unit Tests:
EKS Integration Testing:
Deployed custom Karpenter image to EKS cluster
Tested with consolidationGracePeriod: 90s
Verified 3 scenarios:
✅ New node protected during grace period, visible after expiration
✅ Timer resets on each pod event (add/remove)
✅ Multiple operations with grace period protecting nodes during activity
Migration Path
No migration required: Feature is opt-in via new optional field
Existing NodePools: Continue to work exactly as before
New NodePools: Can opt-in by setting consolidationGracePeriod
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.