feat: add Smart VXLAN Controller for automated bridge management #426

Open
Sinodaiiii wants to merge 8 commits into k8snetworkplumbingwg:main from Sinodaiiii:bridge-management

@Sinodaiiii Sinodaiiii commented Jan 31, 2026

What this PR does / why we need it:

This PR introduces a "Smart VXLAN Controller" to automate the lifecycle management of OVS bridges and node-to-node connectivity.

The core logic enhancement follows a "Create-on-Demand" and "Connect-Automatically" flow:

  1. First, it enables the CNI plugin to automatically provision OVS bridges when they are requested but do not exist.
  2. Then, a background controller detects these bridges across the cluster and establishes VXLAN tunnels to mesh same-named bridges between nodes.

Key changes include:

1. On-Demand Bridge Creation (Primary Trigger):
The CNI plugin (CmdAdd) now checks whether the target OVS bridge exists. If it does not, the plugin creates it before attaching the container interface. This removes the prerequisite for manual bridge provisioning on nodes.
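
A minimal sketch of the create-on-demand check, for illustration only. The `BridgeStore` interface, its method names, and the in-memory `memStore` are hypothetical stand-ins; they are not the actual pkg/ovsdb API from this PR.

```go
package main

import "fmt"

// BridgeStore is a hypothetical stand-in for the OVSDB driver.
// The method names are assumptions made for this sketch.
type BridgeStore interface {
	BridgeExists(name string) (bool, error)
	CreateBridge(name string) error
}

// EnsureBridge creates the bridge only when it is missing and reports
// whether this call created it.
func EnsureBridge(store BridgeStore, name string) (bool, error) {
	if name == "" {
		return false, fmt.Errorf("bridge name must not be empty")
	}
	exists, err := store.BridgeExists(name)
	if err != nil {
		return false, fmt.Errorf("checking bridge %q: %w", name, err)
	}
	if exists {
		// Nothing to do: the caller can attach the container port directly.
		return false, nil
	}
	if err := store.CreateBridge(name); err != nil {
		return false, fmt.Errorf("creating bridge %q: %w", name, err)
	}
	return true, nil
}

// memStore is an in-memory BridgeStore used only to exercise EnsureBridge.
type memStore map[string]bool

func (m memStore) BridgeExists(name string) (bool, error) { return m[name], nil }
func (m memStore) CreateBridge(name string) error         { m[name] = true; return nil }
```

Calling `EnsureBridge(memStore{}, "br1")` twice creates the bridge on the first call and is a no-op on the second, which is the idempotent behavior CmdAdd would need when several pods request the same bridge.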

2. Automated Connectivity (Controller Logic):
A new controller (integrated into the ovs-cni-marker daemon) watches Kubernetes Node objects. Once a bridge is successfully created and reported in the node status:

  • The controller detects the presence of the same OVS bridge on peer nodes.
  • It automatically establishes VXLAN tunnels between the nodes, creating a full mesh for that specific bridge.
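
The peer-detection step above can be sketched as a pure function. `NodeInfo`, `Tunnel`, and `DesiredTunnels` are hypothetical names; how the real controller reads bridge names and tunnel IPs from Node objects is not shown in this description and is assumed here.

```go
package main

import "sort"

// NodeInfo is a hypothetical summary of what the controller would learn
// from a Node object: its tunnel endpoint IP and the bridges it reports.
type NodeInfo struct {
	IP      string
	Bridges []string
}

// Tunnel identifies one VXLAN port to create on the local node.
type Tunnel struct {
	Bridge   string
	RemoteIP string
}

// DesiredTunnels returns, for the node named local, one tunnel per
// (bridge, peer) pair where the peer reports a bridge of the same name.
func DesiredTunnels(local string, nodes map[string]NodeInfo) []Tunnel {
	self, ok := nodes[local]
	if !ok {
		return nil
	}
	mine := make(map[string]bool, len(self.Bridges))
	for _, b := range self.Bridges {
		mine[b] = true
	}
	var out []Tunnel
	for name, peer := range nodes {
		if name == local {
			continue
		}
		for _, b := range peer.Bridges {
			if mine[b] {
				out = append(out, Tunnel{Bridge: b, RemoteIP: peer.IP})
			}
		}
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Bridge != out[j].Bridge {
			return out[i].Bridge < out[j].Bridge
		}
		return out[i].RemoteIP < out[j].RemoteIP
	})
	return out
}
```

Running this on every node yields a full mesh per bridge name: each node ends up with one tunnel to every peer that reports the same bridge.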

3. Intelligent Lifecycle Management (Auto-Delete):
To keep the node clean, the CNI plugin (CmdDel) now automatically deletes the bridge if it becomes empty (i.e., when the last container port is removed).

  • Note: br-int is explicitly exempted from auto-deletion to preserve default cluster connectivity.
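
A rough sketch of the deletion guard, assuming the caller already knows how many ports remain. `ShouldDeleteBridge` and `protectedBridges` are hypothetical names, and whether tunnel ports or the bridge's own internal port count toward "empty" is not stated in this description.

```go
package main

// protectedBridges lists bridges that must never be auto-deleted.
// Only br-int is named in the PR description; the map form is an assumption.
var protectedBridges = map[string]bool{"br-int": true}

// ShouldDeleteBridge reports whether a bridge may be removed after a
// container port was detached: it must be unprotected and have no ports left.
func ShouldDeleteBridge(name string, remainingPorts int) bool {
	if protectedBridges[name] {
		return false
	}
	return remainingPorts == 0
}
```

With this guard, `ShouldDeleteBridge("br-int", 0)` is false even when the bridge is empty, while any other bridge is removed only once its last port is gone.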

4. Underlying Driver Enhancements:
Enhanced pkg/ovsdb with necessary primitives for these operations, including CreateVxlanPort, NewBridgeDriverFromExisting, and IsBridgeEmpty.

Why we need it:

Currently, administrators often need to manually provision OVS bridges on every node or use external configuration tools to set up tunnels. This PR makes ovs-cni self-sufficient in managing the entire datapath lifecycle, significantly reducing operational complexity for dynamic network setups where bridges need to be created and connected on the fly.

Special notes for your reviewer:

I have split the changes into 3 atomic commits to facilitate the review process:

  1. feat(ovsdb): Adds underlying driver support for VXLAN operations and bridge management (CreateVxlanPort, NewBridgeDriverFromExisting, IsBridgeEmpty).
  2. feat(controller): Implements the main controller logic, updates the CNI plugin to trigger bridge auto-creation/deletion, and updates vendor/ dependencies.
  3. feat(marker): Integrates the controller into the main entry point (cmd/marker) to run in the background.

Please verify the vendor/ updates in the second commit, as new Kubernetes client dependencies were required for the controller.

Release note:

New Feature: Introduced a Smart VXLAN Controller that automatically manages OVS bridge lifecycles (auto-create/auto-delete) and establishes VXLAN tunnels between nodes for seamless connectivity.

Signed-off-by: Sinodaiiii <ds_daisheng@163.com>
@kubevirt-bot
Collaborator

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 31, 2026
@kubevirt-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Sinodaiiii
Once this PR has been reviewed and has the lgtm label, please assign schseba for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 31, 2026
@kubevirt-bot
Collaborator

Hi @Sinodaiiii. Thanks for your PR.

I'm waiting for a k8snetworkplumbingwg member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


@gemini-code-assist

Summary of Changes

Hello @Sinodaiiii, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ovs-cni functionality by introducing a 'Smart VXLAN Controller'. This controller automates the entire lifecycle of OVS bridges and their inter-node connectivity using VXLAN tunnels, thereby reducing manual operational overhead for dynamic network configurations within a Kubernetes cluster.

Highlights

  • Automated OVS Bridge Management: The CNI plugin now automatically provisions OVS bridges when requested and deletes them when they become empty (with the exception of 'br-int').
  • Smart VXLAN Controller: A new background controller detects OVS bridges across Kubernetes nodes and automatically establishes VXLAN tunnels to mesh same-named bridges, simplifying node-to-node connectivity.
  • Enhanced OVSDB Driver: The underlying OVSDB driver (pkg/ovsdb) has been extended with new primitives for creating VXLAN ports, managing bridge lifecycles, and checking bridge emptiness.
  • Kubernetes Client Dependency Updates: New Kubernetes client dependencies were added and updated to support the controller's interaction with Kubernetes Node objects.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a Smart VXLAN Controller to automate OVS bridge and VXLAN tunnel management. The core logic is sound, but there are a few critical areas for improvement. The controller needs to handle node additions and deletions to maintain correct tunnel state across the cluster. Additionally, there are opportunities to improve performance by using the informer cache instead of making direct API calls, and to make error handling more robust. Overall, this is a great feature that will significantly simplify network management.
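
As an illustration of the node add/delete handling requested above, tunnel state can be reconciled by diffing the desired peers against the tunnels that already exist. This is a sketch with a hypothetical `Reconcile` helper, not code from the PR; it works on remote IPs for a single bridge.

```go
package main

import "sort"

// Reconcile compares the peers that should have a tunnel with the tunnels
// that currently exist and returns the remote IPs to add and to remove.
func Reconcile(desired, existing []string) (toAdd, toRemove []string) {
	want := make(map[string]bool, len(desired))
	for _, ip := range desired {
		want[ip] = true
	}
	have := make(map[string]bool, len(existing))
	for _, ip := range existing {
		have[ip] = true
	}
	// Peers that joined (or gained the bridge) need a new tunnel.
	for ip := range want {
		if !have[ip] {
			toAdd = append(toAdd, ip)
		}
	}
	// Peers that left (or lost the bridge) leave a stale tunnel behind.
	for ip := range have {
		if !want[ip] {
			toRemove = append(toRemove, ip)
		}
	}
	sort.Strings(toAdd)
	sort.Strings(toRemove)
	return toAdd, toRemove
}
```

Feeding the desired list from the informer's cached Node objects, rather than from a fresh API list on every event, would also address the performance point in the review.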

Signed-off-by: Sinodaiiii <ds_daisheng@163.com>
…etion

Signed-off-by: Sinodaiiii <ds_daisheng@163.com>
This commit transitions the network topology to a Full-Mesh architecture within each bridge:

- ovsdb: Disables RSTP/STP on bridges to allow direct connectivity between all nodes.

- controller: Implements Split Horizon by setting 'no-flood' on VXLAN ports to prevent broadcast loops.

- controller: Adds critical rollback mechanism to delete unsafe ports if flood protection fails.
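
A hedged sketch of the protect-or-rollback step described in this commit message. It assumes the `ovs-ofctl mod-port <bridge> <port> no-flood` and `ovs-vsctl --if-exists del-port <bridge> <port>` CLI commands as the mechanism; the PR itself may do this through OVSDB or another path. `Runner` and `ProtectTunnelPort` are hypothetical names.

```go
package main

import "fmt"

// Runner executes an external command. In real use it could wrap os/exec;
// it is injected here so the logic can be exercised without OVS installed.
type Runner func(name string, args ...string) error

// ProtectTunnelPort marks a VXLAN port as no-flood and, if that fails,
// deletes the port so an unprotected tunnel is never left on the bridge.
func ProtectTunnelPort(run Runner, bridge, port string) error {
	err := run("ovs-ofctl", "mod-port", bridge, port, "no-flood")
	if err == nil {
		return nil
	}
	// Rollback: a tunnel port that can still flood is considered unsafe.
	if delErr := run("ovs-vsctl", "--if-exists", "del-port", bridge, port); delErr != nil {
		return fmt.Errorf("no-flood failed (%v) and rollback failed: %w", err, delErr)
	}
	return fmt.Errorf("no-flood failed, port %q removed: %w", port, err)
}
```

The function returns an error in both failure paths, so the caller can retry the whole tunnel setup instead of assuming the port exists.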

Signed-off-by: Sinodaiiii <ds_daisheng@163.com>
@phoracek
Member

phoracek commented Feb 2, 2026

Hello @Sinodaiiii, thanks for the very interesting PR. However, I would politely reject it and ask you to implement the controller as a standalone project. I'd be happy to then mention it in the README.

We would like to keep this project focused on the core of CNI: connecting workloads to the OVS bridge.

If I understand your code correctly, it should be possible to implement everything there with separate controllers and CNIs which you would chain before or after the main OVS CNI. Is there anything you think would require tight integration with ovs-cni?

Petr
