
Karpenter fails to scale down to desired state, Indefinitely blocked by PDB #7586

Open
@omri-kaslasi

Description


Observed Behavior:
Karpenter is not removing expired nodes, even though we have expireAfter (5 days) configured.
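For context, the expiry is configured on the NodePool's disruption block. A minimal sketch of what that looks like on the v1beta1 API (as used by 0.36.x); only the 5-day expireAfter reflects the real configuration, everything else below is illustrative:

```yaml
# Sketch only: NodePool with a 5-day expiry (karpenter.sh/v1beta1, Karpenter 0.36.x).
# Only expireAfter reflects the real setup; names and other fields are placeholders.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: example-nodepool                     # placeholder name
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized   # assumed; not stated in the issue
    expireAfter: 120h                        # 5 days
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default                        # placeholder EC2NodeClass
```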

The issue appears to be related to PodDisruptionBudgets (PDBs) blocking node consolidation attempts.

I reviewed the blocking PDBs and they are configured correctly (the selector is OK and maxUnavailable is set to 25%).
Example of one of the blocking PDBs:
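(The sketch below is illustrative; the name and selector labels are placeholders, and only maxUnavailable: 25% reflects the real configuration.)

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app               # placeholder name
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: example-app            # placeholder; the real selector matches the intended pods
```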

If I'm not mistaken, 4 disruptions allowed means 4 pods covered by this PDB can be evicted.
To confirm we are not in the edge case where 4 pods affected by this PDB sit on the same node, I checked and saw only 1 such pod on the relevant node.

I fetched the Karpenter events and they show what look like multi-node consolidation attempts (nodes color-matched with their nodeclaims):
(screenshot of Karpenter events)

I don't see any standalone single-node consolidation attempts in the events, which should be able to remove the expired nodes (I reviewed multiple nodes; none of them would be blocked by a PDB if they were disrupted on their own).

I tried to find out whether single-node consolidation is disabled but haven't found anything in the logs or documentation. Enabling debug logging on Karpenter didn't provide any relevant information either.

Edit 1:
Tried on a cluster with Karpenter 1.0.8; same issue. Adding graphs from Datadog:
Right graph: the memory-request spikes were caused manually by me.
Left graph: sum of memory across all nodes in the nodepool (node-mixed is the relevant nodepool). node-mixed spiked as expected, but more than 2 hours later we still have more than double the capacity we had before the spike (and describing the nodeclaims shows the PDB-related blocking described above).
(Datadog graphs)

Expected Behavior:
Nodes that have exceeded their expireAfter duration should be removed, even when PDBs are in place, as long as the disruptions stay within the allowed limits.
According to the documentation, single-node consolidation is supposed to run across all nodes, but it doesn't seem to run here.
Observed on Karpenter 0.36.2; upgraded to 0.36.8 but didn't see a change in behavior.

Reproduction Steps (Please include YAML):
Have multiple deployments with a large replica count and a PDB. Increase the replica count (to trigger Karpenter creating additional nodes), wait a few minutes, then reduce the replica count (Karpenter will begin scaling down and remove some of the nodes, but not all).
I've generated generic YAMLs for this and attached them:
KarpenterStress.txt
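For reference, a sketch of the kind of Deployment + PDB pair used (the names, image, replica count, and resource requests below are illustrative, not the exact contents of the attachment):

```yaml
# Illustrative stress Deployment + PDB; not the exact attached manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-stress
spec:
  replicas: 50                      # scale up, wait, then scale down to reproduce
  selector:
    matchLabels:
      app: karpenter-stress
  template:
    metadata:
      labels:
        app: karpenter-stress
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: karpenter-stress
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: karpenter-stress
```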

Versions:

  • Chart Version: 0.36.2 and 0.36.8 (also reproduced on 1.0.8; see Edit 1)
  • Kubernetes Version (kubectl version): 1.29
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment


Metadata

Labels

bug (Something isn't working), needs-triage (Issues that need to be triaged)
