enable check vlan conflict #5110

Closed
wants to merge 43 commits into from

Conversation

zbb88888
Collaborator

Pull Request

What type of this PR

Examples of user facing changes:

  • Features
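  1. kube-ovn-controller ensures that a newly created VLAN CRD does not conflict with the tunnel NIC VLAN in the cluster or with any existing VLAN. The check runs before the VLAN CRD takes effect; on conflict, the VLAN status conflict field is set to true and the subnet is not ready.
  2. kube-ovn-cni ensures that a newly created VLAN CRD does not conflict with the tunnel VLAN in the cluster or with VLAN sub-interfaces that already exist on the node. On conflict, it records a ProviderNetwork condition error and the network is not ready.
  3. kube-ovn-cni periodic check: once a VLAN has been created, if a sub-interface on the same NIC that belongs to another provider network is found to use a conflicting VLAN, it records a ProviderNetwork condition error and the network is not ready.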
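
The periodic check in item 3 could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `subInterface` type and `conflictingSubInterfaces` function are hypothetical names, and the host sub-interface list is passed in directly instead of being read from the kernel.

```go
package main

import (
	"fmt"
	"sort"
)

// subInterface is a hypothetical view of a VLAN sub-interface found on the host,
// e.g. bond0.100 has Parent "bond0" and VlanID 100.
type subInterface struct {
	Name   string
	Parent string
	VlanID int
}

// conflictingSubInterfaces returns the names of host sub-interfaces that sit on
// the given NIC and reuse a VLAN ID that is already configured in OVS.
func conflictingSubInterfaces(nic string, ovsVlans []int, host []subInterface) []string {
	inUse := make(map[int]struct{}, len(ovsVlans))
	for _, id := range ovsVlans {
		inUse[id] = struct{}{}
	}

	var conflicts []string
	for _, s := range host {
		if s.Parent != nic {
			continue // a different NIC cannot collide with this provider network
		}
		if _, ok := inUse[s.VlanID]; ok {
			conflicts = append(conflicts, s.Name)
		}
	}
	sort.Strings(conflicts) // stable order for logging
	return conflicts
}

func main() {
	host := []subInterface{
		{Name: "bond0.100", Parent: "bond0", VlanID: 100},
		{Name: "bond0.200", Parent: "bond0", VlanID: 200},
		{Name: "eth1.100", Parent: "eth1", VlanID: 100},
	}
	// Only bond0.100 shares both the NIC and a VLAN ID with the OVS configuration.
	fmt.Println(conflictingSubInterfaces("bond0", []int{100, 300}, host))
}
```

In a daemon, a function like this would presumably be called from a ticker loop, with any non-empty result recorded as a ProviderNetwork condition error.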

Which issue(s) this PR fixes

Fixes #(issue-number)

@gemini-code-assist gemini-code-assist bot left a comment

Hello @zbb88888, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a mechanism to check for VLAN conflicts within the Kube-OVN environment. It ensures that newly created VLANs do not conflict with existing tunnel VLANs on nodes or other VLANs already present in the cluster. This is achieved by adding a conflict status to the VLAN CRD and implementing checks in both the kube-ovn-controller and kube-ovn-cni components. The changes include adding a new configuration option to enable/disable the VLAN conflict check, updating the VLAN status to reflect conflicts, and implementing periodic checks to identify and report conflicts.

Highlights

  • VLAN Conflict Detection: Implements checks to prevent VLAN ID conflicts between newly created VLAN CRDs, existing tunnel VLANs on nodes, and other VLANs in the cluster.
  • Kube-OVN Controller Changes: Modifies the kube-ovn-controller to ensure new VLAN CRDs do not conflict with tunnel NIC VLANs or existing VLANs before becoming effective. Sets the VLAN status to 'conflict' and marks the subnet as 'not ready' if a conflict is detected.
  • Kube-OVN CNI Changes: Updates kube-ovn-cni to check for conflicts between new VLAN CRDs and tunnel VLANs, as well as existing VLAN sub-interfaces on nodes. Records a ProviderNetwork condition error and marks the network as 'not ready' if conflicts are found. Also, implements periodic checks to detect conflicts with other Provider Networks.
  • Configuration Option: Adds a new configuration flag --enable-check-vlan-conflict to enable or disable the VLAN conflict check feature.
  • Node Labeling: The daemon now patches nodes with labels indicating the tunnel VLAN ID, which the controller uses to check for conflicts.
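
As a rough sketch of the controller-side check described in the highlights above: the new VLAN is compared against the tunnel VLAN IDs published as node labels and against the VLANs that already exist. Everything named here is an assumption for illustration: the label key is a placeholder (the PR defines the real one as `TunnelVlanIDLabel`, whose value is not shown), the `vlan` struct is simplified, and treating two VLANs as conflicting only when they share a provider is a guess at the intended rule.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// tunnelVlanLabel is a placeholder key; the real key is defined in the PR
// as util.TunnelVlanIDLabel and its value is not shown in the conversation.
const tunnelVlanLabel = "example.com/tunnel-vlan-id"

// vlan is a simplified, hypothetical stand-in for the Vlan CR.
type vlan struct {
	Name     string
	ID       int
	Provider string
}

// checkVlanConflict returns an error if newVlan reuses a tunnel VLAN ID found
// in the node labels, or the ID of another VLAN on the same provider.
func checkVlanConflict(newVlan vlan, existing []vlan, nodeLabels map[string]map[string]string) error {
	nodes := make([]string, 0, len(nodeLabels))
	for name := range nodeLabels {
		nodes = append(nodes, name)
	}
	sort.Strings(nodes) // deterministic order so the reported node is stable

	for _, node := range nodes {
		raw, ok := nodeLabels[node][tunnelVlanLabel]
		if !ok {
			continue // tunnel NIC on this node is not a VLAN sub-interface
		}
		id, err := strconv.Atoi(raw)
		if err != nil {
			return fmt.Errorf("node %s has invalid tunnel vlan label %q: %w", node, raw, err)
		}
		if id == newVlan.ID {
			return fmt.Errorf("vlan %s (id %d) conflicts with the tunnel vlan on node %s", newVlan.Name, newVlan.ID, node)
		}
	}

	for _, v := range existing {
		if v.Name != newVlan.Name && v.Provider == newVlan.Provider && v.ID == newVlan.ID {
			return fmt.Errorf("vlan %s (id %d) conflicts with existing vlan %s", newVlan.Name, newVlan.ID, v.Name)
		}
	}
	return nil
}

func main() {
	existing := []vlan{{Name: "vlan-a", ID: 100, Provider: "net1"}}
	nodes := map[string]map[string]string{"node1": {tunnelVlanLabel: "123"}}

	fmt.Println(checkVlanConflict(vlan{Name: "vlan-b", ID: 123, Provider: "net1"}, existing, nodes))
	fmt.Println(checkVlanConflict(vlan{Name: "vlan-c", ID: 100, Provider: "net1"}, existing, nodes))
	fmt.Println(checkVlanConflict(vlan{Name: "vlan-d", ID: 200, Provider: "net1"}, existing, nodes))
}
```

A controller would then set the VLAN status to conflict when this returns a non-nil error, which in turn keeps the dependent subnet not ready.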

Changelog

  • dist/images/install.sh
    • Adds --enable-check-vlan-conflict=true to kube-ovn-controller's arguments.
    • Sets a default value for IFACE to bond0.123.
  • pkg/apis/kubeovn/v1/vlan.go
    • Adds a Conflict field to the VlanStatus struct to indicate if the VLAN has conflicts.
  • pkg/controller/config.go
    • Adds a EnableCheckVlanConflict field to the Configuration struct.
    • Adds a command-line flag --enable-check-vlan-conflict to control whether to enable VLAN conflict checking.
  • pkg/controller/subnet.go
    • Checks for VLAN conflicts before formatting a subnet if EnableCheckVlanConflict is true. Returns an error if the VLAN is in conflict.
  • pkg/controller/vlan.go
    • Imports fmt, strconv, and github.com/scylladb/go-set/strset.
    • In handleAddVlan, checks for VLAN conflicts with node tunnel VLANs and other VLANs. Updates the VLAN status to Conflict if a conflict is found.
  • pkg/daemon/config.go
    • Adds netlink import.
    • Adds IfaceVlanID and EnableCheckVlanConflict fields to the Configuration struct.
    • Adds a command-line flag --enable-check-vlan-conflict to control whether to enable VLAN conflict checking.
    • Caches the tunnel NIC VLAN ID during initialization if EnableCheckVlanConflict is enabled.
    • Adds a getVLAN function to retrieve the VLAN ID of a given interface.
  • pkg/daemon/controller.go
    • In initProviderNetwork, checks if the VLAN ID is already used by the tunnel interface on the node. Records a ProviderNetwork error if a conflict is found.
    • In Run, patches the node with tunnel VLAN ID labels and starts a periodic check for VLAN conflicts.
    • Adds loopCheckVlanConflict to periodically check for VLAN conflicts between tunnel VLANs and OVS VLANs, and between OVS VLANs and OS VLANs. Records ProviderNetwork errors if conflicts are found.
  • pkg/daemon/ovs_linux.go
    • Imports slices and github.com/scylladb/go-set/strset.
    • Adds loopCheckVlanConflict to check for vlan conflicts.
    • Adds patchNodeTunnelVlanLabel to patch the node with tunnel vlan labels.
  • pkg/util/const.go
    • Adds TunnelVlanIDLabel and TunnelUseVlanLabel constants for node labels.
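
The daemon-side configuration entries above (the flag plus caching the tunnel NIC VLAN ID at startup) might look something like the sketch below. It uses only the standard library, so it is an approximation: the PR itself imports netlink, whereas this version derives the VLAN ID from an interface name such as `bond0.123`, which is the name-based approach discussed later in this conversation. The `--iface` flag, the lowercase `configuration` type, and `vlanIDFromName` are illustrative assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
	"strings"
)

// configuration is a trimmed-down, hypothetical version of the daemon config.
type configuration struct {
	Iface                   string
	IfaceVlanID             int
	EnableCheckVlanConflict bool
}

// vlanIDFromName extracts the VLAN ID from a name of the form "<parent>.<id>".
// It returns 0 when the name has no VLAN suffix.
func vlanIDFromName(iface string) (int, error) {
	idx := strings.LastIndex(iface, ".")
	if idx < 0 {
		return 0, nil
	}
	id, err := strconv.Atoi(iface[idx+1:])
	if err != nil {
		return 0, fmt.Errorf("parse vlan id from %q: %w", iface, err)
	}
	if id < 1 || id > 4094 {
		return 0, fmt.Errorf("vlan id %d of %q is outside 1-4094", id, iface)
	}
	return id, nil
}

// parseConfig parses the flags and caches the tunnel NIC VLAN ID when the
// conflict check is enabled.
func parseConfig(args []string) (*configuration, error) {
	cfg := &configuration{}
	fs := flag.NewFlagSet("daemon", flag.ContinueOnError)
	fs.StringVar(&cfg.Iface, "iface", "", "tunnel interface name (assumed flag)")
	fs.BoolVar(&cfg.EnableCheckVlanConflict, "enable-check-vlan-conflict", false, "enable VLAN conflict checking")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if cfg.EnableCheckVlanConflict {
		id, err := vlanIDFromName(cfg.Iface)
		if err != nil {
			return nil, err
		}
		cfg.IfaceVlanID = id
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]string{"--iface=bond0.123", "--enable-check-vlan-conflict=true"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("iface=%s vlan=%d enabled=%t\n", cfg.Iface, cfg.IfaceVlanID, cfg.EnableCheckVlanConflict)
}
```

Parsing the name is fragile because an interface can be renamed or carry a VLAN without the `.id` suffix; asking the kernel for the link's VLAN attributes is likely more reliable where that is available.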

@zbb88888 zbb88888 changed the title from "Loop check bond conflicts" to "enable check vlan conflict" on Mar 28, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a VLAN conflict check feature to prevent the creation of conflicting VLANs in the cluster. The feature is implemented in both the kube-ovn-controller and kube-ovn-cni components. Overall, the code is well-structured and addresses the intended functionality. However, there are some areas where improvements can be made to enhance maintainability, efficiency, and clarity.

Summary of Findings

  • VLAN Conflict Check Logic: The VLAN conflict check logic is implemented in both the controller and daemon, leading to potential redundancy and inconsistency. Consolidating this logic into a shared library or webhook would improve maintainability and ensure consistent behavior across components.
  • Error Handling in Conflict Detection: The error handling in the VLAN conflict detection loops could be improved. Currently, errors are logged but the loop continues, potentially masking other conflicts. Consider returning immediately upon encountering an error to prevent further processing and ensure that all errors are surfaced.
  • Logging Consistency: The logging messages use inconsistent formatting and verbosity levels. Standardizing the logging format and using consistent verbosity levels would improve readability and debuggability.
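
One way to address the error-handling finding above, without stopping at the first failure, is to collect every error and return them together. The sketch below uses `errors.Join` from the standard library (Go 1.20 or later). This is a different technique from the early return the review suggests, and `checkAll` and `errConflict` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a sample error used only for this illustration.
var errConflict = errors.New("vlan conflict")

// checkAll runs check for every provider network and aggregates the failures
// instead of logging them and moving on.
func checkAll(names []string, check func(string) error) error {
	var errs []error
	for _, name := range names {
		if err := check(name); err != nil {
			errs = append(errs, fmt.Errorf("provider network %s: %w", name, err))
		}
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	check := func(name string) error {
		if name == "net2" {
			return errConflict
		}
		return nil
	}
	if err := checkAll([]string{"net1", "net2", "net3"}, check); err != nil {
		fmt.Println(err)
	}
}
```

The caller still sees every conflict in one pass and can use `errors.Is` on the joined error to test for a specific cause.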

Merge Readiness

The pull request introduces a valuable feature for preventing VLAN conflicts. However, the identified issues regarding redundant logic, error handling, and logging consistency should be addressed before merging. Addressing these concerns will improve the overall quality and maintainability of the code. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging.

Signed-off-by: zbb88888 <[email protected]>
@coveralls
coveralls commented Apr 9, 2025

Pull Request Test Coverage Report for Build 14967332477

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 257 (0.0%) changed or added relevant lines in 6 files are covered.
  • 218 unchanged lines in 9 files lost coverage.
  • Overall coverage decreased (-0.1%) to 21.616%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/controller/config.go | 0 | 2 | 0.0% |
| pkg/daemon/controller.go | 0 | 14 | 0.0% |
| pkg/daemon/config.go | 0 | 27 | 0.0% |
| pkg/controller/subnet.go | 0 | 32 | 0.0% |
| pkg/controller/vlan.go | 0 | 65 | 0.0% |
| pkg/daemon/ovs_linux.go | 0 | 117 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/daemon/ovs_linux.go | 1 | 0.0% |
| pkg/ovs/ovn-nb-bfd.go | 2 | 61.61% |
| pkg/util/pod_routes.go | 2 | 95.0% |
| pkg/daemon/config.go | 3 | 0.0% |
| pkg/ovs/ovn-nb-logical_router_route.go | 5 | 75.13% |
| pkg/apis/kubeovn/v1/vip.go | 7 | 0.0% |
| pkg/daemon/controller.go | 8 | 0.0% |
| pkg/controller/vip.go | 57 | 0.0% |
| pkg/controller/vpc.go | 133 | 0.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 14744616393: | -0.1% |
| Covered Lines: | 10254 |
| Relevant Lines: | 47436 |

💛 - Coveralls

zbb88888 and others added 10 commits April 9, 2025 17:25
@zbb88888 zbb88888 force-pushed the loop-check-bond-conflicts branch from 35d5639 to c05f0de Compare April 30, 2025 08:00
@zbb88888 zbb88888 force-pushed the loop-check-bond-conflicts branch from fb0dc2c to eb1ae92 Compare April 30, 2025 09:23
@zbb88888 zbb88888 marked this pull request as ready for review April 30, 2025 09:24
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and feature (New network feature) labels on Apr 30, 2025
@zbb88888
Collaborator Author

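Test cases:

  1. A newly created VLAN conflicts with the tunnel NIC

Expected: the tunnel NIC takes priority. Creating the VLAN fails with status conflict set to true, and the VLAN's subnet is not ready.

[screenshots]

Self-test passed.

  2. A newly created VLAN conflicts with an existing VLAN

When several VLANs share the same VLAN ID, the VLAN and subnet created later fail.

Create a normal VLAN and subnet.

[screenshots]

  3. A newly created VLAN conflicts with a local OS bond sub-interface
    Expected: the VLAN takes priority, and the local OS bond sub-interface is deleted automatically.

First create a VLAN, then create a bond sub-interface on the host that conflicts with the OVS VLAN configured by kube-ovn-cni. This triggers automatic deletion of the conflicting bond sub-interface.

[screenshot]

The kube-ovn-cni log shows that there is currently no conflict between the OVS and OS VLAN interfaces.

[screenshots]

Create a conflicting VLAN sub-interface to provoke a conflict, then watch the log: the conflicting sub-interface on the OS is deleted automatically.

[screenshots]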

@zbb88888 zbb88888 requested a review from oilbeater April 30, 2025 09:32
@zbb88888
Collaborator Author

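@oilbeater The test cases have been added above, with explanations in the screenshots. If you are concerned that automatically deleting sub-interfaces is risky, I can change the switch in the code to be disabled by default.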

zbb88888 added 5 commits May 7, 2025 15:26
@@ -56,6 +60,14 @@ func (c *Controller) handleAddVlan(key string) error {
		}
	}

	if c.config.EnableCheckVlanConflict {
		err := c.checkVlanConflict(vlan, vlan.Spec.ProviderInterfaceName)
Collaborator

ProviderInterfaceName is not recommended; use provider wherever possible.

Collaborator Author

Changed to use pn.Spec.DefaultInterface.

@oilbeater
Collaborator

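I suggest that this PR only detect VLAN ID conflicts among CRs, since that check is highly deterministic. The host-side check currently infers the VLAN ID from the NIC name, which is very unreliable. Detecting conflicts with other sub-interfaces on the host is also quite uncertain, because it is hard to tell which side is misconfigured.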

@zbb88888
Collaborator Author

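> I suggest that this PR only detect VLAN ID conflicts among CRs, since that check is highly deterministic. The host-side check currently infers the VLAN ID from the NIC name, which is very unreliable. Detecting conflicts with other sub-interfaces on the host is also quite uncertain, because it is hard to tell which side is misconfigured.

Yes, that can happen. But in production we often hit one particular mistake: before deployment, someone tests the connectivity of bond.xxx on the switch and then forgets to delete it.

Also, this feature is disabled by default. Even users who plan their networks in advance may, if they bring up many clusters, keep running into the same kind of tedious troubleshooting that we do.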

@zbb88888
Collaborator Author

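After discussion, this PR will only submit:

  1. Conflict detection among the VLAN CRs themselves

Everything else should move to an external debug module that runs checks before a cluster is deployed.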

@zbb88888 zbb88888 closed this May 12, 2025
3 participants