test(e2e): optimize test execution time by reducing unnecessary waits#6369
- Reduce global poll interval from 2s to 1s for faster resource readiness detection
- Fix `WaitUntil` ignoring its `interval` parameter (it was hardcoded to 2s)
- Reduce disaster test sleep from 60s to 5s (process STOP/CONT simulation)
- Reduce `nc` timeout from 5s to 2s in NAT policy external access checks
- Add `PeriodSeconds: 1` and `FailureThreshold: 1` to VPC pod probe tests
- Skip redundant StatefulSet IPPool iterations (replicas 1 and 2) and test only replicas=3
- Parallelize NAT policy ipset/iptables checks across nodes

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request focuses on significantly improving the execution time of the end-to-end test suite. By fine-tuning various wait times, polling intervals, and introducing parallel processing for network policy checks, the changes aim to make the test suite run much faster without compromising test coverage or reliability. The overall impact is a more efficient and quicker feedback loop for developers.
Code Review
This pull request introduces several optimizations to reduce E2E test execution time. The changes, such as reducing poll intervals, parallelizing checks, and skipping redundant test iterations, are aligned with the goal of improving test suite performance. The implementation of these optimizations appears correct, and I have no suggestions for improvement.
Pull Request Test Coverage Report for Build 22548664986

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
… period

Replace all 150 hardcoded 2*time.Second poll intervals with the framework poll constant (1s), or time.Second for files outside the framework package. Also reduce TerminationGracePeriodSeconds from 3 to 1 for pods, and add it to StatefulSets, to speed up teardown.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
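Centralizing the interval in one constant is what makes a change like "2s to 1s across 150 call sites" a one-line edit next time. A stdlib-only sketch of the pattern (the constant name `poll` and the helper are assumptions; the real suite uses the k8s.io/apimachinery wait helpers):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// poll is the suite-wide interval this PR standardizes on.
// Previously each call site hardcoded its own 2*time.Second.
const poll = 1 * time.Second

// pollImmediate checks cond right away, then every poll interval until
// timeout. A stand-in for wait.PollUntilContextTimeout-style helpers.
func pollImmediate(timeout time.Duration, cond func() bool) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for condition")
		}
		time.Sleep(poll)
	}
}

func main() {
	// Immediate success: the condition is checked before any sleep,
	// so a ready resource costs zero wait time.
	fmt.Println(pollImmediate(0, func() bool { return true }) == nil)
}
```

Because the first check happens before any sleep, resources that are already ready cost nothing; the interval only matters for the not-yet-ready case.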
… intervals

- Add TerminationGracePeriodSeconds=1 to MakeDeployment, consistent with MakePod and MakeStatefulSet. Without this, deployment pods use the default 30s grace period, slowing down all deployment-based tests.
- Reduce checkIPSetOnNode poll interval from 3s to 1s to match the unified poll standard.
- Reduce waitForInterfaceState poll interval from 5s to 1s to match the unified poll standard.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
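The grace-period change works because `TerminationGracePeriodSeconds` is a pointer field: leaving it nil means the Kubernetes default of 30s applies. A simplified sketch using a stand-in struct (the real helpers build `k8s.io/api/core/v1` objects; `makeDeploymentPodSpec` is a hypothetical name for illustration):

```go
package main

import "fmt"

// podSpec is a minimal stand-in for corev1.PodSpec showing only the field
// this commit touches.
type podSpec struct {
	// Pointer semantics matter: nil means "use the 30s cluster default".
	TerminationGracePeriodSeconds *int64
}

// makeDeploymentPodSpec mirrors the fix: set the grace period explicitly
// so deployment pods tear down in ~1s instead of up to 30s.
func makeDeploymentPodSpec() podSpec {
	grace := int64(1)
	return podSpec{TerminationGracePeriodSeconds: &grace}
}

func main() {
	fmt.Println(*makeDeploymentPodSpec().TerminationGracePeriodSeconds)
}
```

For E2E tests that create and destroy many short-lived pods, this one field often dominates teardown time, since the kubelet waits out the full grace period before sending SIGKILL.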
Restructure AfterEach blocks in 8 test files to delete resources in parallel within the same dependency level, then wait for all to disappear before proceeding to the next level. This reduces cleanup time from N × single-resource-time to max(single-resource-time) per level.

Add PodClient.DeleteGracefully() method that initiates pod deletion with a 1-second grace period without blocking, enabling parallel pod cleanup alongside other resource types.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use retry.RetryOnConflict to wrap node label update operations to handle 409 Conflict errors caused by a stale resourceVersion when concurrent controllers modify the node between Get and Update.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
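The pattern is Get-mutate-Update in a loop that retries only on conflict, so each attempt picks up a fresh resourceVersion. A stdlib-only sketch of the shape of `retry.RetryOnConflict` (the real helper lives in k8s.io/client-go/util/retry and uses a backoff policy rather than a bare attempt count):

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for an HTTP 409 from the API server, returned when
// the Update carries a resourceVersion that is no longer current.
var errConflict = errors.New("409 Conflict")

// retryOnConflict re-runs fn while it returns a conflict error. fn must
// re-Get the object on every attempt so the Update uses the latest
// resourceVersion; any other error aborts immediately.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err // success, or a non-retryable error
		}
	}
	return err // conflicts exhausted all attempts
}

func main() {
	tries := 0
	err := retryOnConflict(5, func() error {
		tries++ // real code: Get node, mutate labels, Update
		if tries < 3 {
			return errConflict // another controller raced us
		}
		return nil
	})
	fmt.Println(err == nil, tries)
}
```

Re-fetching inside the closure is the essential part; retrying the same stale object would just hit the same 409 again.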
… pods

GetPodsForDeployment() does not filter out pods with a DeletionTimestamp, so when pods are deleted and recreated within the same ReplicaSet, both old (terminating) and new (running) pods are returned, causing "expect 3 but got 6" failures. Fix by using DeleteGracefully (1s grace period) and explicitly waiting for old pods to be fully removed before checking the new pod count.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
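The underlying gotcha is that a terminating pod still matches the ReplicaSet's label selector until deletion completes; the only marker is its non-nil DeletionTimestamp. A minimal sketch of the filtering side (stand-in struct; `runningPods` is a hypothetical helper, and the PR's actual fix waits for old pods to vanish rather than filtering):

```go
package main

import "fmt"

// pod is a minimal stand-in for corev1.Pod with only the field that matters
// here: terminating pods carry a non-nil DeletionTimestamp but still match
// the deployment's label selector.
type pod struct {
	Name              string
	DeletionTimestamp *string
}

// runningPods drops pods that are already terminating. Without this (or an
// explicit wait for old pods to disappear), a delete-and-recreate cycle can
// double-count: "expect 3 but got 6".
func runningPods(pods []pod) []pod {
	var out []pod
	for _, p := range pods {
		if p.DeletionTimestamp == nil {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	ts := "2024-01-01T00:00:00Z"
	pods := []pod{{"a", nil}, {"a-old", &ts}, {"b", nil}}
	fmt.Println(len(runningPods(pods)))
}
```

The 1s `DeleteGracefully` grace period shrinks the window in which old and new pods coexist, and the explicit wait for removal closes it entirely before the count is asserted.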
…Version

Wrap Deployment and DaemonSet Restart() with retry.RetryOnConflict to handle 409 Conflict errors caused by concurrent controller status updates between Get and Update. Each retry fetches the latest object to ensure a current resourceVersion.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Fix `WaitUntil` ignoring its `interval` parameter (was hardcoded to 2s)
- Reduce `nc` timeout from 5s to 2s in NAT policy external access checks
- Add `PeriodSeconds: 1` and `FailureThreshold: 1` to VPC pod probe tests for faster failure detection

Estimated time savings: ~270-380 seconds total across the E2E suite.
Test plan
🤖 Generated with Claude Code