Description
Observed Behavior:
Karpenter is not removing expired nodes, even though we have expireAfter set to 5 days.
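For context, this is the shape of our NodePool configuration — a minimal sketch using the v1beta1 API as shipped in 0.36.x; values other than expireAfter are illustrative:

```yaml
# Illustrative NodePool fragment (karpenter.sh/v1beta1, as in 0.36.x).
# Only expireAfter reflects our actual setting (5 days = 120h).
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: node-mixed
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 120h   # nodes older than 5 days should be replaced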
The issue appears to be related to PodDisruptionBudgets (PDBs) blocking node consolidation attempts.
I reviewed the blocking PDBs and they are configured correctly (the selector is correct, and maxUnavailable is set to 25%).
Example of one of the PDBs blocking:
If I'm not mistaken, 4 disruptions allowed means 4 pods can be stopped.
To confirm we are not in an edge case where 4 pods covered by this PDB land on the same node, I checked and saw only 1 such pod on the relevant node.
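For reference, the shape of these PDBs — names and labels here are placeholders, not the actual manifest; with 16 ready pods, maxUnavailable: 25% yields 4 allowed disruptions:

```yaml
# Illustrative PDB (placeholder names): 25% of the replica count
# may be voluntarily disrupted at any one time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: example-app
```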
I fetched the Karpenter events and see what look like multi-node consolidation attempts (colors match each node with its nodeclaim):
I don't see any standalone single-node consolidation attempts in the events, which should be able to remove the expired nodes (I reviewed multiple nodes; none of them would be blocked by a PDB if they were disrupted on their own).
I tried to find out whether single-node consolidation is disabled, but found nothing in the logs or documentation. I enabled debug logging on Karpenter, but that didn't provide any relevant information either.
Edit 1:
Tried on a cluster with Karpenter 1.0.8; same issue. Adding graphs from Datadog:
Right graph: the memory-request spikes were caused manually by me.
Left graph: the sum of memory across all nodes in the nodepool (node-mixed is the relevant nodepool). node-mixed spiked as expected, but more than 2 hours after the spike we still have more than double the capacity we had before it (and describing the nodeclaims shows the PDB-related reason stated above).
Expected Behavior:
Nodes that have passed their expireAfter (5 days) should be removed, even when PDBs are in place, as long as the disruptions stay within the allowed limits.
According to the documentation, single-node consolidation is supposed to run on all nodes, but it doesn't appear to run here.
Observed on Karpenter 0.36.2; upgraded to 0.36.8 but saw no change in behavior.
Reproduction Steps (Please include YAML):
Create multiple deployments, each with a large replica count and a PDB. Increase the replica count (to trigger Karpenter into creating additional nodes), wait a few minutes, then reduce the replica count. Karpenter will begin scaling down and remove some of the nodes, but not all of them.
I've generated generic YAMLs for this and am attaching them:
KarpenterStress.txt
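For a quick picture of that kind of manifest — a minimal sketch with illustrative names and sizes, not the actual contents of the attachment:

```yaml
# Sketch of a stress deployment plus its PDB (placeholder values).
# Scale replicas up to force node creation, then back down.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-app
spec:
  replicas: 40          # raise, wait a few minutes, then lower
  selector:
    matchLabels:
      app: stress-app
  template:
    metadata:
      labels:
        app: stress-app
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              memory: 512Mi   # sized so extra nodes are required
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: stress-app-pdb
spec:
  maxUnavailable: 25%
  selector:
    matchLabels:
      app: stress-app
```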
Versions:
- Chart Version: 0.36.2 and 0.36.8
- Kubernetes Version (`kubectl version`): 1.29
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment