daemon: skip device plugin wait when no policies are configured #1038

Merged
zeeke merged 1 commit into k8snetworkplumbingwg:master from SchSeba:fix_waiting_for_dp_when_no_policy
Feb 19, 2026
@SchSeba (Collaborator) commented Feb 18, 2026

When the blockDevicePluginUntilConfigured feature gate is enabled and no SriovNetworkNodePolicy resources target a node, the config-daemon's apply() function calls waitForDevicePluginPodAndTryUnblock, which polls for up to 2 minutes for a device plugin pod that will never arrive. The device plugin daemonset is only scheduled on nodes with policies (SriovDevicePluginLabel=Enabled), so this wait always times out when Spec.Interfaces is empty.

Skip the device plugin wait and the periodic unblock API call when the desired node state has no interfaces configured. This matches the existing guard in tryUnblockDevicePlugin(), which already checks for empty interfaces before removing the wait-for-config annotation.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@github-actions

Thanks for your PR.
To run the vendor CIs, maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendor CIs, maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

github-actions bot added the tests label Feb 18, 2026
@gemini-code-assist

Summary of Changes

Hello @SchSeba, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an inefficiency in the daemon's reconciliation loop where it would unnecessarily wait for a device plugin pod to become ready, even when no SR-IOV network policies were configured for the node. By introducing a check for configured interfaces, the daemon now avoids this prolonged and futile waiting period, improving overall performance and responsiveness in scenarios without SR-IOV policies.

Highlights

  • Optimized Device Plugin Wait: The daemon now intelligently skips waiting for the device plugin pod when no SR-IOV network policies (interfaces) are configured for a node, preventing unnecessary timeouts.
  • Conditional Unblock Logic: The periodic unblock mechanism for the device plugin is now also conditional, only attempting to unblock if the feature gate is enabled and interfaces are actually configured.
  • New Test Case: A new test has been added to validate that the device plugin wait is correctly skipped when no interfaces are present in the desired node state.

Changelog

  • pkg/daemon/daemon.go
    • Modified the Reconcile function to add a condition (len(desiredNodeState.Spec.Interfaces) > 0) to the device plugin unblock logic, ensuring it only runs when interfaces are configured.
    • Updated the apply function to explicitly check if desiredNodeState.Spec.Interfaces is empty before calling waitForDevicePluginPodAndTryUnblock, skipping the wait if no interfaces are present.
  • pkg/daemon/daemon_test.go
    • Added a new test case named "Should not wait for device plugin pod when there are no interfaces and blockDevicePluginUntilConfigured is enabled" to verify the new skipping behavior.
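
The condition described for Reconcile in the changelog can be summarised as a single predicate. The sketch below is illustrative only: `shouldTryUnblock` is a hypothetical helper, not a function from pkg/daemon/daemon.go, and the feature gate is modelled as a plain boolean rather than the operator's feature-gate API.

```go
package main

import "fmt"

// shouldTryUnblock reports whether the periodic device plugin unblock
// should run: the gate must be enabled and at least one interface must
// be configured. Hypothetical helper for illustration only.
func shouldTryUnblock(gateEnabled bool, interfaceCount int) bool {
	return gateEnabled && interfaceCount > 0
}

func main() {
	// Enumerate all four combinations of gate state and interface presence.
	cases := []struct {
		gate   bool
		ifaces int
	}{
		{false, 0}, {false, 2}, {true, 0}, {true, 2},
	}
	for _, c := range cases {
		fmt.Printf("gate=%v interfaces=%d -> unblock=%v\n",
			c.gate, c.ifaces, shouldTryUnblock(c.gate, c.ifaces))
	}
}
```

Only the combination with the gate enabled and a non-empty interface list triggers the unblock call in this model.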

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses an issue where the config-daemon would wait unnecessarily for a device plugin pod even when no SR-IOV policies were configured, leading to a timeout. The changes in pkg/daemon/daemon.go to skip the wait based on the number of interfaces in the desired state are logical and well-implemented. The accompanying test case in pkg/daemon/daemon_test.go is thorough and correctly validates the fix. I have one minor suggestion to improve code readability.

Comment on lines +414 to 421
if len(desiredNodeState.Spec.Interfaces) == 0 {
    reqLogger.Info("no interfaces in desired state, skipping device plugin wait as device plugin won't be deployed")
} else {
    if err := dn.waitForDevicePluginPodAndTryUnblock(ctx, desiredNodeState); err != nil {
        reqLogger.Error(err, "failed to wait for device plugin pod to start and try to unblock it")
        return ctrl.Result{}, err
    }
}

medium

For better readability, you could invert the condition to check for the positive case first. This avoids having an else block that only contains an if statement, which can make the code slightly easier to follow.

Suggested change
- if len(desiredNodeState.Spec.Interfaces) == 0 {
-     reqLogger.Info("no interfaces in desired state, skipping device plugin wait as device plugin won't be deployed")
- } else {
-     if err := dn.waitForDevicePluginPodAndTryUnblock(ctx, desiredNodeState); err != nil {
-         reqLogger.Error(err, "failed to wait for device plugin pod to start and try to unblock it")
-         return ctrl.Result{}, err
-     }
- }
+ if len(desiredNodeState.Spec.Interfaces) > 0 {
+     if err := dn.waitForDevicePluginPodAndTryUnblock(ctx, desiredNodeState); err != nil {
+         reqLogger.Error(err, "failed to wait for device plugin pod to start and try to unblock it")
+         return ctrl.Result{}, err
+     }
+ } else {
+     reqLogger.Info("no interfaces in desired state, skipping device plugin wait as device plugin won't be deployed")
+ }

@coveralls

Pull Request Test Coverage Report for Build 22144144095

Details

  • 6 of 9 (66.67%) changed or added relevant lines in 1 file are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.1%) to 62.981%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/daemon/daemon.go | 6 | 9 | 66.67% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/daemon/daemon.go | 3 | 54.03% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 22102567607: | 0.1% |
| Covered Lines: | 9320 |
| Relevant Lines: | 14798 |

💛 - Coveralls

@ykulazhenkov (Collaborator) left a comment

Right, I handled this case in tryUnblockDevicePlugin but forgot to handle it in the main loop.
LGTM

@zeeke (Member) left a comment

LGTM

@zeeke zeeke merged commit 78611fd into k8snetworkplumbingwg:master Feb 19, 2026
16 checks passed
4 participants