Clean stale VF representor ports from OVS bridges during switchdev reconciliation#1048

Open
almaslennikov wants to merge 1 commit into k8snetworkplumbingwg:master from almaslennikov:clean-ovs-bridges-upstream

@almaslennikov
Contributor

After a host reboot, VF representor ports remain stuck in OVS bridges if pods were not deleted before the reboot. When the daemon reconciles and re-creates the VFs, the new representors get the same names as the stale ports, which causes errors when new pods try to use the same rail resource.

Extend RemoveInterfaceFromOVSBridge to accept pfName and numVfs so that it removes the representor interfaces ({pfName}_{vfIndex}) along with the PF uplink. Propagate pfName through the setEswitchModeAndNumVFs call chain so that detachPFFromBridge always has the information needed for representor cleanup.

Add a new detachPFFromBridge call in createVFs() after the skip check, so that cleanup only runs when VFs actually need to be re-created. Fix the multi-uplink bridge lookup to iterate over all uplinks instead of checking only Uplinks[0].

@github-actions

github-actions bot commented Mar 5, 2026

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

@github-actions github-actions bot added the tests label Mar 5, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where stale VF representor ports remain in OVS bridges after a host reboot, leading to potential errors during new pod creation. The changes enhance the OVS bridge cleanup mechanism to proactively remove these stale representors and the PF uplink during switchdev reconciliation. This ensures a clean state for VF re-creation and improves the robustness of network interface management by correctly identifying and detaching all relevant interfaces.

Highlights

  • Enhanced OVS Bridge Cleanup: The RemoveInterfaceFromOVSBridge function now explicitly removes stale VF representor ports (constructed as {pfName}_{vfIndex}) in addition to the PF uplink from managed OVS bridges.
  • Improved Reconciliation Logic: A new conditional call to detachPFFromBridge was added within createVFs for ESwithModeSwitchDev to ensure proper cleanup of representors and PF uplinks before VF re-creation, particularly after host reboots.
  • Parameter Propagation: The pfName and numVfs parameters are now consistently passed through the setEswitchModeAndNumVFs call chain, including detachPFFromBridge, to facilitate comprehensive interface detachment.
  • Multi-Uplink Bridge Lookup Fix: The logic for finding related managed OVS bridges was updated to iterate through all uplinks of a known configuration to find a match, instead of only checking the first uplink.

Changelog
  • pkg/helper/mock/mock_helper.go
    • Updated the DetachInterfaceFromManagedBridge mock signature to include pfName and numVfs parameters.
  • pkg/host/internal/bridge/bridge.go
    • Modified the DetachInterfaceFromManagedBridge function signature to accept pfName and numVfs.
    • Updated the internal call to b.ovs.RemoveInterfaceFromOVSBridge to pass the new parameters.
    • Updated the function's documentation to reflect the new behavior of detaching VF representors.
  • pkg/host/internal/bridge/bridge_test.go
    • Updated test cases for DetachInterfaceFromManagedBridge to include the new pfName and numVfs arguments in mock expectations and function calls.
  • pkg/host/internal/bridge/ovs/mock/mock_ovs.go
    • Updated the RemoveInterfaceFromOVSBridge mock signature to accept pciAddress, pfName, and numVfs parameters.
  • pkg/host/internal/bridge/ovs/ovs.go
    • Modified the RemoveInterfaceFromOVSBridge function signature to include pfName and numVfs.
    • Updated the function's documentation to specify the removal of PF uplink and stale VF representor interfaces.
    • Adjusted the bridge lookup logic to iterate through all uplinks of a known configuration to find a match, instead of just the first uplink.
    • Added a loop to remove stale VF representor interfaces (named {pfName}_{vfIndex}) before removing the PF uplink.
    • Updated logging to include pfName and numVfs.
  • pkg/host/internal/bridge/ovs/ovs_test.go
    • Updated test cases for RemoveInterfaceFromOVSBridge to include the new pfName and numVfs arguments in function calls.
  • pkg/host/internal/sriov/sriov.go
    • Modified setEswitchModeAndNumVFs function signature and its setEswitchModeAndNumVFsFn type to include pfName.
    • Propagated pfName through calls to setEswitchModeAndNumVFs in ResetSriovDevice and configureHWOptionsForSwitchdev.
    • Added a conditional call to s.detachPFFromBridge within createVFs for ESwithModeSwitchDev to clean up stale representors and PF uplinks.
    • Modified setEswitchModeAndNumVFsMlx and setEswitchModeAndNumVFsIce function signatures to include pfName.
    • Updated the call to s.detachPFFromBridge within setEswitchModeAndNumVFsMlx to pass pfName and numVfs.
    • Modified detachPFFromBridge function signature to include pfName and numVfs, and updated its internal call to s.bridgeHelper.DetachInterfaceFromManagedBridge.
    • Updated logging in detachPFFromBridge to include pfName and numVfs.
  • pkg/host/internal/sriov/sriov_test.go
    • Removed an empty Context block.
  • pkg/host/mock/mock_host.go
    • Updated the DetachInterfaceFromManagedBridge mock signature to include pfName and numVfs parameters.
  • pkg/host/types/interfaces.go
    • Modified the DetachInterfaceFromManagedBridge interface definition to include pfName and numVfs parameters.
    • Updated the function's documentation to reflect the new behavior of detaching VF representors.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses the issue of stale VF representor ports in OVS bridges after a host reboot. However, while introducing this cleanup mechanism, the implementation introduces a security regression by trusting interface names provided in the external configuration spec for deletion operations, which could allow an attacker with control over SRIOV policies to delete arbitrary OVS ports. The suggested fix uses trusted information from the local store to perform the cleanup safely. Additionally, there is one critical comment regarding a potential syntax error in a test file.

@almaslennikov almaslennikov force-pushed the clean-ovs-bridges-upstream branch from 10b44cb to 52cf61e Compare March 5, 2026 16:16
@coveralls

Pull Request Test Coverage Report for Build 22727085437

Details

  • 34 of 48 (70.83%) changed or added relevant lines in 6 files are covered.
  • 18 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.1%) to 63.231%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
pkg/host/internal/bridge/ovs/ovs.go | 15 | 18 | 83.33%
pkg/host/internal/sriov/sriov.go | 12 | 15 | 80.0%
pkg/helper/mock/mock_helper.go | 0 | 4 | 0.0%
pkg/host/mock/mock_host.go | 0 | 4 | 0.0%

Files with Coverage Reduction | New Missed Lines | %
pkg/utils/cluster.go | 2 | 85.47%
controllers/sriovnetworknodepolicy_controller.go | 16 | 68.31%

Totals | Coverage Status
Change from base Build 22632427177 | -0.1%
Covered Lines | 9367
Relevant Lines | 14814

💛 - Coveralls

funcLog.Error(err, "RemoveInterfaceFromOVSBridge(): failed to read data from store")
return fmt.Errorf("failed to read data from store: %v", err)
}
var relatedBridges []*sriovnetworkv1.OVSConfigExt

Collaborator

Why is this change needed? I don't think it's related to what you are trying to fix, is it?

Contributor Author

The old code only checked kc.Uplinks[0], so for multi-uplink bridges (e.g. multiplane setups with more than one PF per bridge), if the PF being detached was at index 1 or higher, the lookup missed it entirely and skipped the cleanup. The refactor iterates over all uplinks to find the matching PCI address. I also simplified the relatedBridges slice to a single brConf pointer with an early break: a PCI address can only belong to one bridge, so collecting multiple matches and warning about them was unnecessary.
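
The lookup change described in this reply can be sketched as a nested scan that returns on the first PCI address match. The `bridgeConf` and `uplink` types below are hypothetical, simplified stand-ins rather than the operator's `OVSConfigExt`, and the sketch uses an early `return` where the PR uses an early break.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the operator's OVS bridge config
// types; only the fields needed for the lookup are modeled.
type uplink struct {
	PciAddress string
	Name       string
}

type bridgeConf struct {
	Bridge  string
	Uplinks []uplink
}

// findBridgeForPF scans every uplink of every known bridge (not just
// Uplinks[0]) and returns the first bridge and uplink whose PCI address
// matches. Stopping at the first match assumes a PCI address belongs to at
// most one bridge.
func findBridgeForPF(known []bridgeConf, pciAddr string) (*bridgeConf, *uplink) {
	for i := range known {
		for j := range known[i].Uplinks {
			if known[i].Uplinks[j].PciAddress == pciAddr {
				return &known[i], &known[i].Uplinks[j]
			}
		}
	}
	return nil, nil
}

func main() {
	known := []bridgeConf{{
		Bridge: "br-plane0",
		Uplinks: []uplink{
			{PciAddress: "0000:08:00.0", Name: "ens1f0"},
			{PciAddress: "0000:08:00.1", Name: "ens1f1"},
		},
	}}
	// An Uplinks[0]-only check would miss this second PF.
	br, up := findBridgeForPF(known, "0000:08:00.1")
	fmt.Println(br.Bridge, up.Name) // br-plane0 ens1f1
}
```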

if err := o.deleteInterfaceByName(ctx, dbClient, brConf.Uplinks[0].Name); err != nil {
// Remove stale VF representor interfaces
for i := 0; i < numVfs; i++ {
repName := fmt.Sprintf("%s_%d", pfName, i)

Collaborator

Note: that's how we name representors via udev in the operator.

Collaborator

I wonder if it's always safe to use the passed-in numVFs, e.g. if the node state spec changed during the reboot.
We could list the ports and just delete the ones whose names match this format. Not sure if it's a good idea, just a thought.

Contributor Author

Added a code comment referencing the udev naming convention.

Regarding iterating over numVFs: it is safe because deleteInterfaceByName is a no-op when the interface doesn't exist in OVSDB, so if numVfs is larger than the number of representors actually in the bridge, the extra iterations are harmless. Conversely, if there were previously more VFs than the currently configured numVfs, the leftover representors won't conflict with the current configuration because they have different names. If the user later increases numVfs again, they will get cleaned up. I added a comment in the code explaining this.

Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
@almaslennikov almaslennikov force-pushed the clean-ovs-bridges-upstream branch from 52cf61e to 720d25a Compare March 11, 2026 10:26
@almaslennikov
Contributor Author

CI failures don't seem related


4 participants