Description
Observed Behavior:
Karpenter is not removing expired nodes, even though we have expireAfter set to 5 days.
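For context, this is the shape of our NodePool configuration — a minimal sketch using the v1beta1 API as shipped in 0.36.x; values other than expireAfter are illustrative:

```yaml
# Illustrative NodePool fragment (karpenter.sh/v1beta1, as in 0.36.x).
# Only expireAfter reflects our actual setting (5 days = 120h).
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: node-mixed
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 120h   # nodes older than 5 days should be replaced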
The issue appears to be related to PodDisruptionBudgets (PDBs) blocking node consolidation attempts.
I reviewed the blocking PDBs and they are configured correctly (the selector is correct, and maxUnavailable is set to 25%).
Example of one of the PDBs blocking:
If I'm not mistaken, 4 disruptions allowed means 4 pods can be stopped.
To confirm we are not in an edge case where 4 pods covered by this PDB land on the same node, I checked and saw only 1 such pod on the relevant node.
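For reference, the shape of these PDBs — names and labels here are placeholders, not the actual manifest; with 16 ready pods, maxUnavailable: 25% yields 4 allowed disruptions:

```yaml
# Illustrative PDB (placeholder names): 25% of the replica count
# may be voluntarily disrupted at any one time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: example-app
```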
I fetched the Karpenter events and see what look like multi-node consolidation attempts (colors match each node with its nodeclaim):
I don't see any standalone single-node consolidation attempts in the events, which should be able to remove the expired nodes (I reviewed multiple nodes; none of them would be blocked by a PDB if they were disrupted on their own).
I tried to find out whether single-node consolidation is disabled, but found nothing in the logs or documentation. I enabled debug logging on Karpenter, but that didn't provide any relevant information either.
Edit 1:
Tried on a cluster with Karpenter 1.0.8; same issue. Adding graphs from Datadog:
Right graph: the memory-request spikes were caused manually by me.
Left graph: the sum of memory across all nodes in the nodepool (node-mixed is the relevant nodepool). node-mixed spiked as expected, but more than 2 hours after the spike we still have more than double the capacity we had before it (and describing the nodeclaims shows the PDB-related reason stated above).
Expected Behavior:
Nodes that have passed their expireAfter (5 days) should be removed, even when PDBs are in place, as long as the disruptions stay within the allowed limits.
According to the documentation, single-node consolidation is supposed to run on all nodes, but it doesn't appear to run here.
Observed on Karpenter 0.36.2; upgraded to 0.36.8 but saw no change in behavior.
Reproduction Steps (Please include YAML):
Create multiple deployments, each with a large replica count and a PDB. Increase the replica count (to trigger Karpenter into creating additional nodes), wait a few minutes, then reduce the replica count. Karpenter will begin scaling down and remove some of the nodes, but not all of them.
I've generated generic YAMLs for this and am attaching them:
KarpenterStress.txt
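For a quick picture of that kind of manifest — a minimal sketch with illustrative names and sizes, not the actual contents of the attachment:

```yaml
# Sketch of a stress deployment plus its PDB (placeholder values).
# Scale replicas up to force node creation, then back down.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-app
spec:
  replicas: 40          # raise, wait a few minutes, then lower
  selector:
    matchLabels:
      app: stress-app
  template:
    metadata:
      labels:
        app: stress-app
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              memory: 512Mi   # sized so extra nodes are required
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: stress-app-pdb
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: stress-app
```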
Versions:
- Chart Version: 0.36.2 and 0.36.8
- Kubernetes Version (`kubectl version`): 1.29
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment