fix: don't provision unnecessary capacity for pods which can't move to a new node #2033

Open · wants to merge 2 commits into base: main

Conversation

@saurav-agarwalla (Contributor) commented on Feb 26, 2025

Fixes #1842, #1928, aws/karpenter-provider-aws#7521

Description
This PR takes care of two cases where Karpenter provisions unnecessary capacity today:

  1. Pods with karpenter.sh/do-not-disrupt=true: they aren't evictable, so Karpenter shouldn't consider them reschedulable either; the only end states for these pods are reaching a terminal phase or being forcibly deleted (after a termination grace period). Treating them as non-reschedulable prevents Karpenter from spinning up and reserving unnecessary capacity for them on new nodes.
  2. Pods which can't be evicted due to a PDB violation: Karpenter doesn't know how long it will take for these pods to be successfully evicted, and in the absence of a TGP it can take a very long time (if it happens at all). When this happens, Karpenter shouldn't provision unnecessary capacity for these pods either (a rough sketch of this check follows the list).
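
A minimal sketch, not the actual diff in this PR, of what such a reschedulability check could look like; the `isReschedulable` helper is hypothetical, and `evictionBlocked` is assumed to be computed elsewhere from the PDBs covering the pod:

```go
package scheduling

import corev1 "k8s.io/api/core/v1"

// isReschedulable (hypothetical) captures the idea above: a pod that can
// never leave its current node should not drive provisioning of
// replacement capacity on a new node.
func isReschedulable(pod *corev1.Pod, evictionBlocked bool) bool {
	// Pods annotated karpenter.sh/do-not-disrupt=true can't be evicted, so
	// reserving capacity for them on new nodes is wasted.
	if pod.Annotations["karpenter.sh/do-not-disrupt"] == "true" {
		return false
	}
	// Pods whose eviction is currently blocked by a PDB may stay on the old
	// node indefinitely when no termination grace period is set.
	if evictionBlocked {
		return false
	}
	return true
}
```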

Without this change:

  • Karpenter will continue to bring up new nodeclaims when the original nodeclaim expires (but can't be terminated, because these pods block termination in the absence of a termination grace period)
  • Karpenter will not be able to consolidate nodeclaims nominated for these pods, because the pods are never going to move to those nodeclaims

How was this change tested?
Reproduced the scenario where a nodeclaim has pods with karpenter.sh/do-not-disrupt=true and saw that Karpenter continuously spun up new nodeclaims after the original nodeclaim expired (even though termination was stuck on these pods). It also wasn't able to consolidate the new nodeclaims that were nominated for these pods.

After the change, Karpenter doesn't spin up new nodeclaims for these pods.

Did the same thing for pods whose eviction was blocked by a PDB.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot added the cncf-cla: yes label on Feb 26, 2025
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: saurav-agarwalla
Once this PR has been reviewed and has the lgtm label, please assign maciekpytel for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/XS label on Feb 26, 2025
@k8s-ci-robot added the size/S label and removed the size/XS label on Feb 26, 2025
@otoupin-nsesi commented on Feb 27, 2025

Could we have the same “not reschedulable” behaviour for “problematic” PDBs? They cause the same issues and behave the same way as the do-not-disrupt pods, i.e. they are not truly reschedulable and will never move (without a TGP).

For example, the following PDB:

```
NAMESPACE     NAME            MIN-AVAILABLE   MAX-UNAVAILABLE   ALLOWED-DISRUPTIONS   CURRENT   DESIRED   EXPECTED
cnpg-system   my-db-primary   1               n/a               0                     1         1         1
```

is “problematic” in a Karpenter context, causing the same behaviour you are trying to patch, so it would make sense to treat them the same way. The downside is that this takes a lot more logic than just checking for the do-not-disrupt annotation, and maybe it doesn't belong here.

Similar problematic PDBs: a singleton pod (one replica, min-available 1), which is likely done on purpose, and misconfigured PDBs, which block in similar ways but are a mistake (these could arguably be ignored, since they should be fixed or detected by policies).
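
For illustration only, a rough sketch (hypothetical helper, not code from this PR) of how such a “problematic” PDB could be detected from its status, using the fields shown in the output above:

```go
package scheduling

import policyv1 "k8s.io/api/policy/v1"

// isBlockingPDB (hypothetical) flags a PDB that allows zero disruptions even
// though all expected pods are healthy, e.g. a singleton pod covered by a
// min-available=1 PDB. Pods behind such a PDB will never be evicted
// voluntarily, so they behave much like do-not-disrupt pods.
func isBlockingPDB(pdb *policyv1.PodDisruptionBudget) bool {
	return pdb.Status.DisruptionsAllowed == 0 &&
		pdb.Status.CurrentHealthy >= pdb.Status.DesiredHealthy &&
		pdb.Status.ExpectedPods > 0
}
```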

@saurav-agarwalla (Contributor, Author) commented:

That's precisely what we discussed today: #1928 (comment)

I am exploring that option, but I want to keep those changes separate to make them easier to review and justify.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 28, 2025
@saurav-agarwalla changed the title from "fix: don't mark pods with 'karpenter.sh/do-not-disrupt=true' as reschedulable" to "fix: don't provision unnecessary capacity for pods which can't move to a new node" on Feb 28, 2025
@saurav-agarwalla (Contributor, Author) commented:

Pushed changes to handle this for pods which can't be evicted due to PDB violations as well, since those changes weren't huge.

@coveralls commented:

Pull Request Test Coverage Report for Build 13596255744

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 39 of 46 (84.78%) changed or added relevant lines in 4 files are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.009%) to 81.652%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pkg/controllers/controllers.go | 0 | 1 | 0.0% |
| pkg/controllers/provisioning/provisioner.go | 3 | 5 | 60.0% |
| pkg/controllers/node/termination/terminator/eviction.go | 34 | 38 | 89.47% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/controllers/provisioning/scheduling/preferences.go | 7 | 86.52% |

Totals Coverage Status:
  • Change from base Build 13548001391: +0.009%
  • Covered Lines: 9497
  • Relevant Lines: 11631

💛 - Coveralls

@saurav-agarwalla (Contributor, Author) commented:

/ok-to-test

@k8s-ci-robot added the ok-to-test and needs-rebase labels on Feb 28, 2025
@k8s-ci-robot (Contributor) commented:

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels: cncf-cla: yes · needs-rebase · ok-to-test · size/L