daemon: skip device plugin wait when no policies are configured#1038
Conversation
When the blockDevicePluginUntilConfigured feature gate is enabled and there are no SriovNetworkNodePolicy resources targeting a node, the config-daemon's apply() function calls waitForDevicePluginPodAndTryUnblock which polls for up to 2 minutes waiting for a device plugin pod that will never arrive. The device plugin daemonset is only scheduled on nodes with policies (SriovDevicePluginLabel=Enabled), so this wait always times out when Spec.Interfaces is empty. Skip the device plugin wait and the periodic unblock API call when the desired node state has no interfaces configured. This matches the existing guard in tryUnblockDevicePlugin() which already checks for empty interfaces before removing the wait-for-config annotation. Signed-off-by: Sebastian Sch <sebassch@gmail.com>
|
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
Summary of ChangesHello @SchSeba, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an inefficiency in the daemon's reconciliation loop where it would unnecessarily wait for a device plugin pod to become ready, even when no SR-IOV network policies were configured for the node. By introducing a check for configured interfaces, the daemon now avoids this prolonged and futile waiting period, improving overall performance and responsiveness in scenarios without SR-IOV policies. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly addresses an issue where the config-daemon would wait unnecessarily for a device plugin pod even when no SR-IOV policies were configured, leading to a timeout. The changes in pkg/daemon/daemon.go to skip the wait based on the number of interfaces in the desired state are logical and well-implemented. The accompanying test case in pkg/daemon/daemon_test.go is thorough and correctly validates the fix. I have one minor suggestion to improve code readability.
| if len(desiredNodeState.Spec.Interfaces) == 0 { | ||
| reqLogger.Info("no interfaces in desired state, skipping device plugin wait as device plugin won't be deployed") | ||
| } else { | ||
| if err := dn.waitForDevicePluginPodAndTryUnblock(ctx, desiredNodeState); err != nil { | ||
| reqLogger.Error(err, "failed to wait for device plugin pod to start and try to unblock it") | ||
| return ctrl.Result{}, err | ||
| } | ||
| } |
There was a problem hiding this comment.
For better readability, you could invert the condition to check for the positive case first. This avoids having an else block that only contains an if statement, which can make the code slightly easier to follow.
| if len(desiredNodeState.Spec.Interfaces) == 0 { | |
| reqLogger.Info("no interfaces in desired state, skipping device plugin wait as device plugin won't be deployed") | |
| } else { | |
| if err := dn.waitForDevicePluginPodAndTryUnblock(ctx, desiredNodeState); err != nil { | |
| reqLogger.Error(err, "failed to wait for device plugin pod to start and try to unblock it") | |
| return ctrl.Result{}, err | |
| } | |
| } | |
| if len(desiredNodeState.Spec.Interfaces) > 0 { | |
| if err := dn.waitForDevicePluginPodAndTryUnblock(ctx, desiredNodeState); err != nil { | |
| reqLogger.Error(err, "failed to wait for device plugin pod to start and try to unblock it") | |
| return ctrl.Result{}, err | |
| } | |
| } else { | |
| reqLogger.Info("no interfaces in desired state, skipping device plugin wait as device plugin won't be deployed") | |
| } |
Pull Request Test Coverage Report for Build 22144144095Details
💛 - Coveralls |
When the blockDevicePluginUntilConfigured feature gate is enabled and there are no SriovNetworkNodePolicy resources targeting a node, the config-daemon's apply() function calls waitForDevicePluginPodAndTryUnblock which polls for up to 2 minutes waiting for a device plugin pod that will never arrive. The device plugin daemonset is only scheduled on nodes with policies (SriovDevicePluginLabel=Enabled), so this wait always times out when Spec.Interfaces is empty.
Skip the device plugin wait and the periodic unblock API call when the desired node state has no interfaces configured. This matches the existing guard in tryUnblockDevicePlugin() which already checks for empty interfaces before removing the wait-for-config annotation.