fix(e2e): wait for flow cleanup before checking service after deletion in metallb test by oilbeater · Pull Request #6359 · kubeovn/kube-ovn

oilbeater · 2026-02-27T04:14:26Z

Summary

- The metallb e2e test "should support metallb and underlay combine" fails intermittently with curl exit code 56 (connection reset) when checking the second service's reachability immediately after deleting the first service.
- Root cause: the OVN LoadBalancer data plane update and the daemon's syncFlows (ovs-ofctl replace-flows) have not yet completed when the test checks the second service, which briefly disrupts the remaining service's data path.
- Added a waitUnderlayServiceFlowCleaned helper that polls all nodes until the deleted service's OpenFlow rules are fully removed from the external bridge, so the data plane has stabilized before the connectivity checks run.

Test plan

Run the metallb e2e test suite multiple times to verify the flaky test no longer fails
Verify the new wait logic correctly detects flow rule cleanup within the 15s timeout

🤖 Generated with Claude Code

…n in metallb test

The metallb e2e test "should support metallb and underlay combine" fails intermittently with curl exit code 56 (connection reset) when checking the second service's reachability immediately after deleting the first service. This happens because the OVN LoadBalancer data plane update and the daemon's syncFlows (ovs-ofctl replace-flows) have not yet completed, causing a brief disruption to the remaining service's data path.

Add a waitUnderlayServiceFlowCleaned helper that polls all nodes until the deleted service's OpenFlow rules are fully removed from the external bridge, ensuring the data plane has stabilized before checking the second service.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-02-27T04:14:37Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a persistent flakiness in a metallb end-to-end test by introducing a crucial synchronization mechanism. It ensures that the network data plane has fully stabilized after a service deletion, specifically by waiting for associated OpenFlow rules to be cleared, before validating the reachability of other services. This prevents race conditions and improves the reliability of the test suite.

Highlights

- Flaky Test Fix: Resolved an intermittent curl exit code 56 failure in the metallb e2e test "should support metallb and underlay combine".
- Root Cause Addressed: The issue stemmed from OVN LoadBalancer data plane updates and syncFlows not completing before subsequent service reachability checks, causing temporary data path disruptions.
- New Helper Function: Introduced waitUnderlayServiceFlowCleaned to poll all nodes, ensuring that the deleted service's OpenFlow rules are fully removed from the external bridge before proceeding with connectivity checks.

Changelog

test/e2e/metallb/e2e_test.go
- Integrated a call to waitUnderlayServiceFlowCleaned after deleting the first service to ensure OpenFlow rules are cleaned up.
- Implemented the new helper function waitUnderlayServiceFlowCleaned which polls nodes to verify the removal of specific OpenFlow rules.

Activity

The pull request description indicates it was generated with Claude Code.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request aims to fix a flaky e2e test by waiting for OpenFlow rules to be cleaned up after a service deletion. The approach is sound. However, the implementation of the check in the new waitUnderlayServiceFlowCleaned function could be more robust. The current use of grep -w with an IP address is not reliable. I've provided a suggestion to improve the check by using grep -F with more specific patterns, which will make the test more stable and accurate.

gemini-code-assist · 2026-02-27T04:16:29Z

test/e2e/metallb/e2e_test.go

+func waitUnderlayServiceFlowCleaned(nodeNames []string, providerNetworkName, serviceIP string, servicePort int32, timeout time.Duration) {
+	ginkgo.GinkgoHelper()
+
+	bridgeName := util.ExternalBridgeName(providerNetworkName)
+	matchPort := fmt.Sprintf("tp_dst=%d", servicePort)
+
+	framework.WaitUntil(1*time.Second, timeout, func(_ context.Context) (bool, error) {
+		for _, nodeName := range nodeNames {
+			cmd := fmt.Sprintf("kubectl ko ofctl %s dump-flows %s | grep -w %s | grep -w %s",
+				nodeName, bridgeName, serviceIP, matchPort)
+			if _, err := exec.Command("bash", "-c", cmd).CombinedOutput(); err == nil {
+				return false, nil // flow still exists on this node
+			}
+		}
+		return true, nil // flow cleaned from all nodes
+	}, fmt.Sprintf("underlay service flow for %s should be cleaned up", serviceIP))
+}


The use of grep -w to match an IP address is fragile. The grep utility considers words to be sequences of letters, digits, and underscores, so -w only checks that the match is bounded by non-word characters; it does not treat the dotted address as a single token, and the dots inside the pattern are still interpreted as regular expression syntax.

Additionally, without the -F flag, grep treats the pattern as a regular expression, where . is a wildcard for any character. This could lead to incorrect matches (e.g., 1.2.3.4 could match 1a2b3c4).
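The wildcard behaviour described above can be reproduced outside of grep. The following is a small sketch using Go's `regexp` and `strings` packages as stand-ins for regex and fixed-string matching; the helper names and sample lines are made up for illustration and are not real dump-flows output.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesAsRegex mimics grep's default mode for this pattern: the pattern is
// a regular expression, so each "." matches any single character.
func matchesAsRegex(line, pattern string) bool {
	return regexp.MustCompile(pattern).MatchString(line)
}

// matchesAsFixed mimics grep -F: the pattern is a literal substring.
func matchesAsFixed(line, pattern string) bool {
	return strings.Contains(line, pattern)
}

func main() {
	pattern := "1.2.3.4"
	// Illustrative lines only.
	for _, line := range []string{"nw_dst=1.2.3.4", "cookie=1a2b3c4"} {
		fmt.Printf("%-16s regex=%v fixed=%v\n",
			line, matchesAsRegex(line, pattern), matchesAsFixed(line, pattern))
	}
}
```

The second sample line matches only in regex mode, which is the false positive the comment warns about. Note that a fixed-string match is still a substring match, so it does not by itself distinguish 1.2.3.4 from 1.2.3.40.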

To make the check more robust and correct, I suggest using grep -F for fixed-string matching and constructing a more specific pattern that includes the OpenFlow field name, such as nw_dst=<IP> or ipv6_dst=<IP>. This avoids both the word-boundary issue and the regex wildcard issue.

func waitUnderlayServiceFlowCleaned(nodeNames []string, providerNetworkName, serviceIP string, servicePort int32, timeout time.Duration) {
	ginkgo.GinkgoHelper()

	bridgeName := util.ExternalBridgeName(providerNetworkName)
	matchPort := fmt.Sprintf("tp_dst=%d", servicePort)
	var ipMatch string
	if strings.Contains(serviceIP, ":") {
		ipMatch = fmt.Sprintf("ipv6_dst=%s", serviceIP)
	} else {
		ipMatch = fmt.Sprintf("nw_dst=%s", serviceIP)
	}

	framework.WaitUntil(1*time.Second, timeout, func(_ context.Context) (bool, error) {
		for _, nodeName := range nodeNames {
			cmd := fmt.Sprintf("kubectl ko ofctl %s dump-flows %s | grep -F -- %q | grep -F -- %q",
				nodeName, bridgeName, ipMatch, matchPort)
			if _, err := exec.Command("bash", "-c", cmd).CombinedOutput(); err == nil {
				return false, nil // flow still exists on this node
			}
		}
		return true, nil // flow cleaned from all nodes
	}, fmt.Sprintf("underlay service flow for %s should be cleaned up", serviceIP))
}
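One alternative to chaining grep calls, offered here only as a sketch and not as what the PR merged, is to capture the dump-flows output once and match exact `key=value` tokens in Go. The helpers `hasField` and `serviceFlowPresent` are hypothetical, the sample flow lines are simplified illustrations rather than captured output, and the approach assumes the match fields appear as unmasked, comma- or space-separated tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// hasField reports whether a flow line contains the exact key=value token,
// so "tp_dst=80" does not match "tp_dst=8080".
func hasField(line, field string) bool {
	tokens := strings.FieldsFunc(line, func(r rune) bool {
		return r == ',' || r == ' ' || r == '\t'
	})
	for _, tok := range tokens {
		if tok == field {
			return true
		}
	}
	return false
}

// serviceFlowPresent scans flow dump text for a line that carries both the
// destination IP and the destination port of the service.
func serviceFlowPresent(dump, serviceIP string, servicePort int32) bool {
	ipField := "nw_dst=" + serviceIP
	if strings.Contains(serviceIP, ":") {
		ipField = "ipv6_dst=" + serviceIP // IPv6 addresses use a different field
	}
	portField := fmt.Sprintf("tp_dst=%d", servicePort)
	for _, line := range strings.Split(dump, "\n") {
		if hasField(line, ipField) && hasField(line, portField) {
			return true
		}
	}
	return false
}

func main() {
	// Simplified, made-up flow lines for illustration.
	dump := " priority=910,tcp,nw_dst=172.18.0.10,tp_dst=8080 actions=NORMAL\n" +
		" priority=910,tcp,nw_dst=172.18.0.100,tp_dst=80 actions=NORMAL"
	fmt.Println(serviceFlowPresent(dump, "172.18.0.10", 80))
	fmt.Println(serviceFlowPresent(dump, "172.18.0.100", 80))
}
```

Exact token comparison avoids both the regex wildcard problem and the substring problem: in the sample above, a plain substring search for `nw_dst=172.18.0.10` and `tp_dst=80` would match the first line even though it belongs to a different address and port pair.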

dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Feb 27, 2026

dosubot bot added the test automation tests label Feb 27, 2026

gemini-code-assist bot reviewed Feb 27, 2026

oilbeater merged commit f9865af into master Feb 27, 2026
74 of 76 checks passed

oilbeater deleted the fix/e2e-metallb-wait-flow-cleanup branch February 27, 2026 06:46

oilbeater mentioned this pull request Feb 27, 2026

fix(e2e): replace hard sleeps and unretried checks with WaitUntil polling #6362

Merged

5 tasks
