test(e2e): optimize test execution time by reducing unnecessary waits #6369

Merged
oilbeater merged 7 commits into master from optimize-e2e-timing
Mar 1, 2026

@oilbeater
Collaborator

Summary

  • Reduce global poll interval from 2s to 1s for faster resource readiness detection
  • Fix WaitUntil ignoring its interval parameter (was hardcoded to 2s)
  • Reduce disaster test sleep from 60s to 5s (process STOP/CONT simulation only needs brief pause)
  • Reduce nc timeout from 5s to 2s in NAT policy external access checks
  • Add PeriodSeconds:1 and FailureThreshold:1 to VPC pod probe tests for faster failure detection
  • Skip redundant StatefulSet IPPool iterations (replicas 1,2) and test only replicas=3
  • Parallelize NAT policy ipset/iptables checks across nodes using goroutines

Estimated time savings: ~270-380 seconds total across the E2E suite.
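
The WaitUntil fix listed above can be illustrated with a minimal sketch. The names `waitUntil`, `pollsUntilReady`, and `errTimeout` are hypothetical and the signature is an assumption, not the framework's actual API; the point is only that the sleep uses the caller's interval rather than a fixed value.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the condition never becomes true in time.
var errTimeout = errors.New("timed out waiting for condition")

// waitUntil is a hypothetical stand-in for the framework helper. It polls
// cond every interval until cond returns true, cond fails, or timeout elapses.
func waitUntil(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		ok, err := cond()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errTimeout
		}
		// A hardcoded 2s sleep here, as the PR describes for the old
		// helper, would ignore the caller's interval.
		time.Sleep(interval)
	}
}

// pollsUntilReady counts how many polls waitUntil makes when the condition
// turns true on the readyAt-th call.
func pollsUntilReady(intervalMs, timeoutMs, readyAt int) (int, error) {
	calls := 0
	err := waitUntil(
		time.Duration(intervalMs)*time.Millisecond,
		time.Duration(timeoutMs)*time.Millisecond,
		func() (bool, error) {
			calls++
			return calls >= readyAt, nil
		},
	)
	return calls, err
}

func main() {
	calls, err := pollsUntilReady(10, 1000, 3)
	fmt.Println(calls, err)
}
```

With a shorter interval the same condition is noticed sooner, at the cost of more calls to the condition function.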

Test plan

  • Run full Conformance E2E (dual, overlay) and verify all tests pass
  • Compare total execution time against baseline (~1687s)
  • Verify disaster tests still correctly detect process outage and recovery
  • Verify NAT policy parallel checks produce same results as sequential

🤖 Generated with Claude Code

- Reduce global poll interval from 2s to 1s for faster resource readiness detection
- Fix WaitUntil ignoring its interval parameter (was hardcoded to 2s)
- Reduce disaster test sleep from 60s to 5s (process STOP/CONT simulation)
- Reduce nc timeout from 5s to 2s in NAT policy external access checks
- Add PeriodSeconds:1 and FailureThreshold:1 to VPC pod probe tests
- Skip redundant StatefulSet IPPool iterations (replicas 1,2) and test only replicas=3
- Parallelize NAT policy ipset/iptables checks across nodes

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dosubot dosubot bot added the size:L (this PR changes 100-499 lines, ignoring generated files) and test (automation tests) labels on Mar 1, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request reduces the execution time of the end-to-end test suite by shortening wait times and polling intervals and by running NAT policy checks in parallel across nodes. The author estimates savings of roughly 270-380 seconds against a baseline of about 1687 seconds. One change narrows scope rather than timing: the StatefulSet IPPool test now runs only with 3 replicas.

Highlights

  • Test Execution Optimization: Reduced global poll interval from 2 seconds to 1 second, and fixed the WaitUntil function to correctly use its interval parameter, preventing hardcoded 2-second waits.
  • Disaster Test Speedup: Decreased the sleep duration in disaster recovery tests from 60 seconds to 5 seconds, accelerating process outage simulations.
  • NAT Policy Check Parallelization: Implemented parallel execution for NAT policy IPset and iptables rule checks across multiple nodes using goroutines, significantly speeding up these verifications.
  • VPC Pod Probe Configuration: Added PeriodSeconds: 1 and FailureThreshold: 1 to VPC pod probes, enabling faster detection of probe failures.
  • Reduced Network Command Timeout: Lowered the nc timeout from 5 seconds to 2 seconds in NAT policy external access checks.
  • StatefulSet IPPool Test Scope: Optimized StatefulSet IPPool tests by skipping redundant iterations and only testing with 3 replicas.
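
As a rough illustration of why the probe settings matter, the upper bound on failure detection time is about PeriodSeconds × FailureThreshold. The sketch below compares Kubernetes' documented defaults (periodSeconds 10, failureThreshold 3) with the values this PR sets. Whether the tests previously relied on the defaults is an assumption, and the `probe` type is a hypothetical stand-in, not the Kubernetes API type.

```go
package main

import "fmt"

// probe is a hypothetical stand-in holding the two probe fields discussed
// above; it is not the Kubernetes corev1.Probe type.
type probe struct {
	PeriodSeconds    int
	FailureThreshold int
}

// failureDetectionSeconds approximates the upper bound on how long it takes
// to declare a probe failed: FailureThreshold consecutive failed checks
// spaced PeriodSeconds apart. Initial delay and per-check timeout are ignored.
func failureDetectionSeconds(p probe) int {
	return p.PeriodSeconds * p.FailureThreshold
}

func main() {
	defaults := probe{PeriodSeconds: 10, FailureThreshold: 3}
	tuned := probe{PeriodSeconds: 1, FailureThreshold: 1}
	fmt.Println(failureDetectionSeconds(defaults), failureDetectionSeconds(tuned))
}
```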
Changelog
  • test/e2e/connectivity/e2e_test.go
    • Reduced sleep duration in disaster recovery tests for ovn sb, ovn-controller, and ovs-vswitchd processes from 60s to 5s.
  • test/e2e/framework/framework.go
    • Decreased the global poll constant from 2 seconds to 1 second.
  • test/e2e/framework/wait.go
    • Modified WaitUntil function signature to accept and utilize the interval parameter, fixing a hardcoded 2-second interval.
  • test/e2e/kube-ovn/ipam/ipam.go
    • Adjusted the loop for StatefulSet IPPool tests to only run for 3 replicas, skipping 1 and 2.
  • test/e2e/kube-ovn/pod/vpc_pod_probe.go
    • Added PeriodSeconds: 1 and FailureThreshold: 1 to all liveness/readiness probes in VPC pod tests.
  • test/e2e/kube-ovn/subnet/subnet.go
    • Imported the sync package to enable concurrency.
    • Parallelized the checkNatPolicyIPsets function to verify IPsets on multiple nodes concurrently.
    • Parallelized the checkNatPolicyRules function to verify iptables rules on multiple nodes concurrently.
    • Reduced the nc timeout from 5 seconds to 2 seconds in checkAccessExternal for both IPv4 and IPv6.
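
The parallel per-node checks described in the changelog could look roughly like the following. This is a sketch under assumptions: `checkNodesInParallel` and `demoCheck` are hypothetical names, and the real checkNatPolicyIPsets and checkNatPolicyRules functions may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// checkNodesInParallel runs check once per node concurrently and returns
// the failures keyed by node name. check is a hypothetical per-node
// verification, such as inspecting ipset or iptables state on that node.
func checkNodesInParallel(nodes []string, check func(node string) error) map[string]error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		failures = make(map[string]error)
	)
	for _, node := range nodes {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			if err := check(node); err != nil {
				// The map is shared across goroutines, so guard writes.
				mu.Lock()
				failures[node] = err
				mu.Unlock()
			}
		}(node)
	}
	wg.Wait()
	return failures
}

// demoCheck is a made-up check that fails only for "node-2".
func demoCheck(node string) error {
	if node == "node-2" {
		return fmt.Errorf("rule missing on %s", node)
	}
	return nil
}

func main() {
	failures := checkNodesInParallel([]string{"node-1", "node-2", "node-3"}, demoCheck)
	fmt.Println(len(failures), failures["node-2"])
}
```

Collecting errors and asserting after `wg.Wait()` keeps assertions on the main test goroutine. In a Ginkgo suite, assertions that fail inside a goroutine need `defer GinkgoRecover()` in that goroutine to be reported properly.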

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several optimizations to reduce E2E test execution time. The changes, such as reducing poll intervals, parallelizing checks, and skipping redundant test iterations, are aligned with the goal of improving test suite performance. The implementation of these optimizations appears correct, and I have no suggestions for improvement.

@coveralls

coveralls commented Mar 1, 2026

Pull Request Test Coverage Report for Build 22548664986

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.003%) to 23.086%

Files with Coverage Reduction | New Missed Lines | %
pkg/ovs/ovn-nb-logical_router_route.go | 2 | 74.6%

Totals
Change from base Build 22538259073: -0.003%
Covered Lines: 12559
Relevant Lines: 54402

💛 - Coveralls

oilbeater and others added 6 commits March 1, 2026 13:59
… period

Replace all 150 hardcoded 2*time.Second poll intervals with the framework
poll constant (1s) or time.Second for files outside the framework package.
Also reduce TerminationGracePeriodSeconds from 3 to 1 for pods and add it
to StatefulSets to speed up teardown.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
… intervals

- Add TerminationGracePeriodSeconds=1 to MakeDeployment, consistent with
  MakePod and MakeStatefulSet. Without this, deployment pods use the
  default 30s grace period, slowing down all deployment-based tests.
- Reduce checkIPSetOnNode poll interval from 3s to 1s to match the
  unified poll standard.
- Reduce waitForInterfaceState poll interval from 5s to 1s to match the
  unified poll standard.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Restructure AfterEach blocks in 8 test files to delete resources in
parallel within the same dependency level, then wait for all to
disappear before proceeding to the next level. This reduces cleanup
time from N × single-resource-time to max(single-resource-time) per
level.

Add PodClient.DeleteGracefully() method that initiates pod deletion
with a 1-second grace period without blocking, enabling parallel
pod cleanup alongside other resource types.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
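
The level-by-level cleanup described in the commit above can be sketched as follows. `cleanupByLevel` and `recordCleanup` are hypothetical names, and the resource names are made up for illustration. The actual AfterEach blocks use the framework's clients rather than a generic callback.

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupByLevel deletes every resource within a level concurrently and
// waits for the whole level to finish before starting the next one, so
// dependents are gone before the resources they depend on are removed.
func cleanupByLevel(levels [][]string, del func(name string)) {
	for _, level := range levels {
		var wg sync.WaitGroup
		for _, name := range level {
			wg.Add(1)
			go func(name string) {
				defer wg.Done()
				del(name)
			}(name)
		}
		wg.Wait()
	}
}

// recordCleanup runs cleanupByLevel with a delete function that only
// records the order in which deletions complete.
func recordCleanup(levels [][]string) []string {
	var (
		mu    sync.Mutex
		order []string
	)
	cleanupByLevel(levels, func(name string) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, name)
	})
	return order
}

func main() {
	// Pods go first, then the subnet they use, then the VPC.
	order := recordCleanup([][]string{{"pod-a", "pod-b"}, {"subnet"}, {"vpc"}})
	fmt.Println(order)
}
```

Within a level the completion order is not deterministic, but every level finishes before the next begins, which is what bounds each level's time by its slowest resource.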
Use retry.RetryOnConflict to wrap node label update operations to handle
409 Conflict errors caused by stale resourceVersion when concurrent
controllers modify the node between Get and Update.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
… pods

GetPodsForDeployment() does not filter out pods with DeletionTimestamp,
so when pods are deleted and recreated within the same ReplicaSet, both
old (terminating) and new (running) pods are returned, causing
"expect 3 but got 6" failures.

Fix by using DeleteGracefully (1s grace period) and explicitly waiting
for old pods to be fully removed before checking new pod count.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
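
The "expect 3 but got 6" symptom can be reproduced with a small model. Note that this sketch shows the alternative of filtering on DeletionTimestamp, not the fix the commit chose (which waits for the old pods to disappear). The `pod` type and both helpers are hypothetical stand-ins, not the Kubernetes types.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical stand-in carrying only the fields needed here.
type pod struct {
	Name              string
	DeletionTimestamp *time.Time // non-nil once deletion has been requested
}

// activePods returns the pods that are not terminating.
func activePods(pods []pod) []pod {
	active := make([]pod, 0, len(pods))
	for _, p := range pods {
		if p.DeletionTimestamp == nil {
			active = append(active, p)
		}
	}
	return active
}

// demoPods builds a list with the given number of running and terminating
// pods, mimicking what a selector-based listing returns mid-rollover.
func demoPods(running, terminating int) []pod {
	now := time.Now()
	pods := make([]pod, 0, running+terminating)
	for i := 0; i < terminating; i++ {
		pods = append(pods, pod{Name: fmt.Sprintf("old-%d", i), DeletionTimestamp: &now})
	}
	for i := 0; i < running; i++ {
		pods = append(pods, pod{Name: fmt.Sprintf("new-%d", i)})
	}
	return pods
}

func main() {
	pods := demoPods(3, 3)
	fmt.Println(len(pods), len(activePods(pods)))
}
```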
…Version

Wrap Deployment and DaemonSet Restart() with retry.RetryOnConflict to
handle 409 Conflict errors caused by concurrent controller status updates
between Get and Update. Each retry fetches the latest object to ensure
a current resourceVersion.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
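
The get-modify-update retry pattern from the two conflict-handling commits can be modeled without a cluster. The sketch below is a standard-library stand-in for client-go's `retry.RetryOnConflict`, not a use of it: `store`, `object`, `retryOnConflict`, and `setLabel` are all hypothetical, and a simulated concurrent writer takes the place of a real controller.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: stale resourceVersion")

// object is a toy stand-in for a Kubernetes object.
type object struct {
	resourceVersion int
	labels          map[string]string
}

// store is a toy API server that rejects updates carrying a stale
// resourceVersion. interference is how many times a simulated concurrent
// writer bumps the version just before an update is applied.
type store struct {
	current      object
	interference int
}

// get returns a copy of the current object, including its resourceVersion.
func (s *store) get() object {
	copied := object{resourceVersion: s.current.resourceVersion, labels: map[string]string{}}
	for k, v := range s.current.labels {
		copied.labels[k] = v
	}
	return copied
}

// update applies o only if its resourceVersion matches the stored one.
func (s *store) update(o object) error {
	if s.interference > 0 {
		s.interference--
		s.current.resourceVersion++ // simulated concurrent writer
	}
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// retryOnConflict re-runs fn while it reports a conflict, up to attempts times.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// setLabel fetches the object inside the retried function, so every attempt
// starts from the latest resourceVersion.
func setLabel(s *store, key, value string, attempts int) error {
	return retryOnConflict(attempts, func() error {
		o := s.get()
		o.labels[key] = value
		return s.update(o)
	})
}

func main() {
	s := &store{current: object{labels: map[string]string{}}, interference: 2}
	fmt.Println(setLabel(s, "zone", "a", 5), s.current.labels["zone"])
}
```

The key design point is that the Get lives inside the retried closure. Retrying only the Update with the originally fetched object would hit the same conflict on every attempt.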
@oilbeater oilbeater merged commit 3168803 into master Mar 1, 2026
77 checks passed
@oilbeater oilbeater deleted the optimize-e2e-timing branch March 1, 2026 23:10