
Conversation

@moko-poi
Contributor

Fixes #2009 (DaemonSet pods circle in FAILED state due to rescheduling to node which is terminated)

Description

This PR fixes an issue where DaemonSet pods (such as aws-node and kube-proxy) enter a Failed/Pending cycle during node disruption (consolidation/termination).

Root Cause:
When Karpenter marks a node for disruption, it applies the karpenter.sh/disrupted:NoSchedule taint to prevent new pods from scheduling. However, DaemonSet pods were being evicted during this process. When the DaemonSet controller attempts to recreate these pods, the NoSchedule taint prevents them from being scheduled back to the node, causing them to remain in Pending state. This creates a continuous cycle of pod failures and rescheduling attempts until the node is fully terminated.
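To make the mechanics concrete, here is a minimal sketch: the taint matches the one named above, while the toleration list is a hypothetical DaemonSet pod spec, not code from this PR. A pod whose tolerations do not cover the disruption taint cannot pass the scheduler's taint check once it is recreated:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// The taint Karpenter applies to a node it has marked for disruption.
	disrupted := corev1.Taint{
		Key:    "karpenter.sh/disrupted",
		Effect: corev1.TaintEffectNoSchedule,
	}

	// Hypothetical tolerations on a DaemonSet pod; none of them cover the
	// disruption taint.
	tolerations := []corev1.Toleration{
		{Key: "node.kubernetes.io/not-ready", Operator: corev1.TolerationOpExists},
	}

	tolerated := false
	for i := range tolerations {
		if tolerations[i].ToleratesTaint(&disrupted) {
			tolerated = true
		}
	}
	// Prints false: after eviction, the recreated pod cannot schedule back
	// onto the tainted node and stays Pending until the node is gone.
	fmt.Println(tolerated)
}
```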

Solution:
Modified the IsEvictable() and IsDrainable() functions in pkg/utils/pod/scheduling.go to explicitly exclude DaemonSet pods from eviction during node disruption. This approach aligns with kubectl drain behavior, where DaemonSet pods remain running until the node is actually terminated, allowing the DaemonSet controller to naturally manage pod recreation on other available nodes.
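A rough sketch of the shape of this change (signatures simplified; existingEvictionChecks and existingDrainChecks are placeholders for the pre-existing conditions in pkg/utils/pod/scheduling.go, which the PR leaves intact; only the DaemonSet exclusion is new):

```go
package pod

import (
	corev1 "k8s.io/api/core/v1"
)

// IsOwnedByDaemonSet reports whether the pod has an apps/v1 DaemonSet owner
// reference. Karpenter already ships a helper with this name; this body is
// illustrative.
func IsOwnedByDaemonSet(pod *corev1.Pod) bool {
	for _, ref := range pod.OwnerReferences {
		if ref.APIVersion == "apps/v1" && ref.Kind == "DaemonSet" {
			return true
		}
	}
	return false
}

// IsEvictable now excludes DaemonSet pods up front.
func IsEvictable(pod *corev1.Pod) bool {
	return !IsOwnedByDaemonSet(pod) && existingEvictionChecks(pod)
}

// IsDrainable gets the same exclusion.
func IsDrainable(pod *corev1.Pod) bool {
	return !IsOwnedByDaemonSet(pod) && existingDrainChecks(pod)
}

// Placeholders for the pre-existing logic, not part of the change.
func existingEvictionChecks(pod *corev1.Pod) bool { return true }
func existingDrainChecks(pod *corev1.Pod) bool    { return true }
```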

Changes:

  • Added !IsOwnedByDaemonSet(pod) check to IsEvictable() function
  • Added !IsOwnedByDaemonSet(pod) check to IsDrainable() function
  • Added test case to verify DaemonSet pods are not evicted during disruption
  • Updated existing test case for DaemonSet pods with PDBs to reflect the new behavior

How was this change tested?

  1. Unit Tests: Added test cases in both the disruption and termination controllers (a minimal sketch of the core assertion follows this list):

    • should not evict daemonset pods during node disruption - Verifies DaemonSet pods remain running when disruption taint is applied
    • should consider candidates with only daemonset pods - Verifies nodes with only DaemonSet pods can be disrupted
    • Updated should consider candidates that have fully blocking PDBs on daemonset pods - Verifies PDBs don't block disruption when only DaemonSet pods are present
  2. Test Execution: All existing tests pass with these changes:

    make test FOCUS="daemonset"
    
    • 3 DaemonSet-related tests: PASSED
    • All pkg tests: PASSED with no failures
  3. Race Detection: Tests were executed with the -race flag to ensure no data races were introduced
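As a minimal illustration of the core assertion, written against the sketch above rather than the PR's actual Ginkgo suite (the pod name is hypothetical):

```go
package pod

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func TestDaemonSetPodIsNotEvictable(t *testing.T) {
	// A pod owned by an apps/v1 DaemonSet, like aws-node or kube-proxy.
	dsPod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "aws-node-x7k2p",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "apps/v1",
				Kind:       "DaemonSet",
				Name:       "aws-node",
			}},
		},
	}
	if IsEvictable(dsPod) {
		t.Fatalf("expected DaemonSet pod %q to be excluded from eviction", dsPod.Name)
	}
	if IsDrainable(dsPod) {
		t.Fatalf("expected DaemonSet pod %q to be excluded from drain", dsPod.Name)
	}
}
```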

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot added the do-not-merge/invalid-commit-message (PR should not merge because it has an invalid commit message), cncf-cla: yes (the PR's author has signed the CNCF CLA), and needs-ok-to-test (requires an org member to verify it is safe to test) labels on Dec 20, 2025
@k8s-ci-robot
Contributor

Hi @moko-poi. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: moko-poi
Once this PR has been reviewed and has the lgtm label, please assign maciekpytel for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Dec 20, 2025
@moko-poi force-pushed the fix/issue-2009-daemonset-disruption branch from 61a2634 to 9038648 on December 20, 2025 at 12:50
@k8s-ci-robot removed the do-not-merge/invalid-commit-message label on Dec 20, 2025
@moko-poi force-pushed the fix/issue-2009-daemonset-disruption branch from 9038648 to 39d56e1 on December 21, 2025 at 03:24
@coveralls

Pull Request Test Coverage Report for Build 20403988644


  • 5 of 5 (100.0%) changed or added relevant lines in 1 file are covered.
  • 9 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.06%) to 80.275%

Files with coverage reduction (new missed lines, resulting %):

  • pkg/controllers/node/termination/terminator/terminator.go: 2 new missed lines, 90.2%
  • pkg/controllers/provisioning/scheduling/preferences.go: 7 new missed lines, 88.76%

Totals:

  • Change from base Build 20384548641: -0.06%
  • Covered Lines: 11961
  • Relevant Lines: 14900

💛 - Coveralls

@jmdeal
Member

jmdeal commented Jan 5, 2026

When the DaemonSet controller attempts to recreate these pods, the NoSchedule taint prevents them from being scheduled back to the node, causing them to remain in Pending state.

If I understand correctly, this is the core issue, right? The issue isn't that daemonsets are being evicted, it's that some daemonset pods are being recreated and entering a pending / failed state. What's not clear to me is why they're being recreated - the daemonset controller should only create a pod for a node if it tolerates the taints. If it does tolerate the taint, Karpenter shouldn't have disrupted the pod in the first place due to the existing IsEvictable check.

I think there are probably use-cases where we want to drain daemonsets. Some daemonsets perform resource cleanup which may not be possible once the node is terminating. An example that comes to mind is the EBS CSI driver which cleans up VolumeAttachment objects during termination. For this reason I don't think we'd want to exclude daemonsets from the drain process altogether, but we should identify the root cause for these pods being recreated.
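To make the toleration point concrete (a hypothetical DaemonSet toleration, not something in this PR): a pod carrying a toleration for the disruption taint would still schedule onto the tainted node, and by the same token the existing IsEvictable logic referenced above would have skipped it during drain:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	disrupted := corev1.Taint{
		Key:    "karpenter.sh/disrupted",
		Effect: corev1.TaintEffectNoSchedule,
	}
	// Hypothetical toleration a DaemonSet could carry to keep its pods
	// schedulable on nodes Karpenter has marked for disruption.
	tol := corev1.Toleration{
		Key:      "karpenter.sh/disrupted",
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	}
	fmt.Println(tol.ToleratesTaint(&disrupted)) // true
}
```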
