gc interface if it's not exist in podLister#5789
Conversation
Summary of ChangesHello @oilbeater, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a bug related to orphaned OVS interfaces by overhauling the garbage collection mechanism. Instead of relying solely on OVS internal state, the system now actively cross-references OVS interfaces with the Kubernetes Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a garbage collection mechanism for OVS interfaces that belong to non-existent pods. The implementation is sound, but I've found a couple of critical issues that could lead to panics or compilation errors, along with a medium-severity issue for improving code robustness. My review includes suggestions to fix these issues.
pkg/daemon/controller.go
Outdated
| return | ||
| } | ||
| for iface, pod := range interfacePodMap { | ||
| podNamespace, podName := strings.Split(pod, "/")[0], strings.Split(pod, "/")[1] |
There was a problem hiding this comment.
This line can cause a panic if the pod string from interfacePodMap does not contain a /. This would crash the daemon. It's safer to check the result of strings.Split before accessing elements by index to prevent this.
parts := strings.Split(pod, "/")
if len(parts) != 2 {
klog.Warningf("malformed pod name '%s' for interface %s, skipping", pod, iface)
continue
}
podNamespace, podName := parts[0], parts[1]| } | ||
| ifaceName := strings.TrimSpace(parts[0]) | ||
| var podNamespace, podName string | ||
| for externalID := range strings.FieldsSeq(parts[1]) { |
There was a problem hiding this comment.
This loop is incorrect and will cause a compilation error. strings.FieldsSeq is not a standard Go function and appears to be a typo for strings.Fields. Additionally, using range with a single variable on a slice iterates over indices, so externalID would be an int. The loop should be corrected to iterate over the values of the split string.
| for externalID := range strings.FieldsSeq(parts[1]) { | |
| for _, externalID := range strings.Fields(parts[1]) { |
| if strings.Contains(externalID, "pod_name=") { | ||
| podName = strings.TrimPrefix(strings.TrimSpace(externalID), "pod_name=") | ||
| } | ||
|
|
||
| for _, intf := range interfaceList { | ||
| name, errText := strings.Trim(strings.Split(intf, "\n")[0], "\""), strings.Split(intf, "\n")[1] | ||
| if strings.Contains(errText, "No such device") { | ||
| qosList, err := ovsFind("port", "qos", "name="+name) | ||
| if err != nil { | ||
| klog.Errorf("failed to find related port %v", err) | ||
| return | ||
| if strings.Contains(externalID, "pod_namespace=") { | ||
| podNamespace = strings.TrimPrefix(strings.TrimSpace(externalID), "pod_namespace=") | ||
| } |
There was a problem hiding this comment.
Using strings.Contains to check for keys in external_ids is not robust. It can lead to false positives if a key name is a substring of another. It's safer to use strings.HasPrefix to ensure you're matching the exact key. Also, strings.TrimSpace is redundant here since strings.Fields (the corrected function) splits by whitespace and the resulting tokens won't have surrounding spaces.
| if strings.Contains(externalID, "pod_name=") { | |
| podName = strings.TrimPrefix(strings.TrimSpace(externalID), "pod_name=") | |
| } | |
| for _, intf := range interfaceList { | |
| name, errText := strings.Trim(strings.Split(intf, "\n")[0], "\""), strings.Split(intf, "\n")[1] | |
| if strings.Contains(errText, "No such device") { | |
| qosList, err := ovsFind("port", "qos", "name="+name) | |
| if err != nil { | |
| klog.Errorf("failed to find related port %v", err) | |
| return | |
| if strings.Contains(externalID, "pod_namespace=") { | |
| podNamespace = strings.TrimPrefix(strings.TrimSpace(externalID), "pod_namespace=") | |
| } | |
| if strings.HasPrefix(externalID, "pod_name=") { | |
| podName = strings.TrimPrefix(externalID, "pod_name=") | |
| } | |
| if strings.HasPrefix(externalID, "pod_namespace=") { | |
| podNamespace = strings.TrimPrefix(externalID, "pod_namespace=") | |
| } |
Pull Request Test Coverage Report for Build 18459036696Details
💛 - Coveralls |
2fc0224 to
e019052
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the interface cleanup mechanism in the OVS daemon to use a Kubernetes pod lister for garbage collection instead of cleaning all lost interfaces. It introduces a more targeted approach to clean up OVS interfaces only when the corresponding pods don't exist in the cluster.
- Replaced the generic
CleanLostInterfacefunction with a newgcInterfacesmethod that checks pod existence before cleanup - Added
ListInterfacePodMapto map OVS interfaces to their corresponding pods - Refactored
CleanLostInterfaceinto a reusableCleanInterfacefunction with proper error handling
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/ovs/ovs-vsctl.go | Added ListInterfacePodMap and refactored cleanup logic into CleanInterface |
| pkg/daemon/controller.go | Implemented gcInterfaces method using pod lister for targeted cleanup |
| pkg/ovs/ovs-vsctl_test.go | Removed obsolete test for the old cleanup function |
| pkg/ovs/ovn-nb-suite_test.go | Removed obsolete test case for the old cleanup function |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
0b923d9 to
8237ec9
Compare
There was a problem hiding this comment.
This is the final PR Bugbot will review for you during this billing cycle
Your free Bugbot reviews will reset on November 30
Details
You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
| continue | ||
| } | ||
|
|
||
| podNamespace, podName, errText := parts[0], parts[1], parts[2] |
There was a problem hiding this comment.
Bug: Pod String Parsing Error Affects OVS Cleanup
The gcInterfaces function incorrectly parses the pod string when the error text contains forward slashes. strings.Split truncates the error message, which can cause the "No such device" check to fail and prevent proper cleanup of orphaned OVS interfaces.
| klog.Infof("delete lost port %s", name) | ||
| output, err := Exec("--if-exists", "--with-iface", "del-port", name) | ||
| } | ||
| result[ifaceName] = fmt.Sprintf("%s/%s/%s", podNamespace, podName, errText) |
There was a problem hiding this comment.
Bug: Pod Parsing Error Due to Substring Matching
The ListInterfacePodMap function's parsing of external_ids for pod_name and pod_namespace is too broad. Using strings.Contains can match substrings, leading to incorrect TrimPrefix results and potentially empty pod names or namespaces. This creates malformed entries in the returned map, which can cause issues for downstream functions relying on these values.
The garbage collector introduced in PR kubeovn#5789 incorrectly identifies KubeVirt VM interfaces as 'lost' and deletes them, breaking network connectivity for VMs within ~60 seconds of creation. Root cause: For KubeVirt VMs, the pod_name in OVS external_ids is set to the VM name (e.g., 'ubuntu-vm-br'), not the launcher pod name (e.g., 'virt-launcher-ubuntu-vm-br-xyz'). When gcInterfaces() tries to look up the pod by the VM name, it fails and incorrectly deletes the interface. Solution: When a pod is not found by direct lookup, check if it might be a KubeVirt VM by searching for launcher pods with the label 'vm.kubevirt.io/name' matching the pod_name from OVS. If a matching launcher pod exists, keep the interface instead of deleting it. This fix maintains backward compatibility with non-KubeVirt pods while preventing false-positive deletions for KubeVirt VMs. Signed-off-by: Damir Nugmanov <damir_nug@mail.ru>
The garbage collector introduced in PR kubeovn#5789 incorrectly identifies KubeVirt VM interfaces as 'lost' and deletes them, breaking network connectivity for VMs within ~60 seconds of creation. Root cause: For KubeVirt VMs, the pod_name in OVS external_ids is set to the VM name (e.g., 'ubuntu-vm-br'), not the launcher pod name (e.g., 'virt-launcher-ubuntu-vm-br-xyz'). When gcInterfaces() tries to look up the pod by the VM name, it fails and incorrectly deletes the interface. Solution: When a pod is not found by direct lookup, check if it might be a KubeVirt VM by searching for launcher pods with the label 'vm.kubevirt.io/name' matching the pod_name from OVS. If a matching launcher pod exists, keep the interface instead of deleting it. This fix maintains backward compatibility with non-KubeVirt pods while preventing false-positive deletions for KubeVirt VMs. Signed-off-by: Damir Nugmanov <damir_nug@mail.ru> Signed-off-by: dnugmanov <damir_nug@mail.ru>
The garbage collector introduced in PR kubeovn#5789 incorrectly identifies KubeVirt VM interfaces as 'lost' and deletes them, breaking network connectivity for VMs within ~60 seconds of creation. Root cause: For KubeVirt VMs, the pod_name in OVS external_ids is set to the VM name (e.g., 'ubuntu-vm-br'), not the launcher pod name (e.g., 'virt-launcher-ubuntu-vm-br-xyz'). When gcInterfaces() tries to look up the pod by the VM name, it fails and incorrectly deletes the interface. Solution: When a pod is not found by direct lookup, check if it might be a KubeVirt VM by searching for launcher pods with the label 'vm.kubevirt.io/name' matching the pod_name from OVS. If a matching launcher pod exists, keep the interface instead of deleting it. This fix maintains backward compatibility with non-KubeVirt pods while preventing false-positive deletions for KubeVirt VMs. Signed-off-by: dnugmanov <damir_nug@mail.ru>
The garbage collector introduced in PR #5789 incorrectly identifies KubeVirt VM interfaces as 'lost' and deletes them, breaking network connectivity for VMs within ~60 seconds of creation. Root cause: For KubeVirt VMs, the pod_name in OVS external_ids is set to the VM name (e.g., 'ubuntu-vm-br'), not the launcher pod name (e.g., 'virt-launcher-ubuntu-vm-br-xyz'). When gcInterfaces() tries to look up the pod by the VM name, it fails and incorrectly deletes the interface. Solution: When a pod is not found by direct lookup, check if it might be a KubeVirt VM by searching for launcher pods with the label 'vm.kubevirt.io/name' matching the pod_name from OVS. If a matching launcher pod exists, keep the interface instead of deleting it. This fix maintains backward compatibility with non-KubeVirt pods while preventing false-positive deletions for KubeVirt VMs. Signed-off-by: dnugmanov <damir_nug@mail.ru>
Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
|
@mengyu0829 感谢发现的问题,前两个问题确实存在。第三个问题由于 daemon 在 list 的时候只 watch 和自己节点相关的 pod,如果 pod 已经到了另一个节点理论上本地的 cache 中应该已经不存在这个 pod 了。针对前两个问题不知道能否提一个 pr 来修复一下 |
好的,前两个问题我们修改后验证下,第三个问题之前环境出现过,我再看看能不能复现。 |
|
第三个问题我能想到的一个兜底方式就是 Get 到这个 Pod 后再检查一下 nodeName 是不是和自己一致,如果再出现问题应该就是 client-go 本身机制上的问题了 |
This reverts commit 8886b91. Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
|
另外补充下, 直接kubectl delete sts的pod复现不了, 可以eidt sts的副本数 (比如从3改成0,再从0改成3,)或者镜像tag就比较容易出现。 |
|
我去复现一下,感觉比较奇怪,sts 缩容如果触发 cni del 事件也应该删除端口才对,不应该等到 gc 才去删除,能不能给个完整的 statefulset yaml 会怀疑和升级策略或者其他参数也有关系 |
|
|
我这里还是无法复现,能否看下在 Statefulset 的 pod 删除和创建过程中对应节点的 kube-ovn-cni 日志,看看是什么原因导致端口没有随着 pod 删除而删除 |





Pull Request
What type of this PR
Examples of user facing changes:
Which issue(s) this PR fixes
Fixes #5724