fix(e2e): poll for ovs-ovn pod readiness instead of single snapshot check #6364
Conversation
RolloutStatus may return immediately with stale DaemonSet status when the controller hasn't yet updated NumberReady to reflect the deleted pod, causing the subsequent GetPods to find the replacement pod still in Pending state. Replace the single-snapshot GetPods + ExpectEqual assertion with WaitUntil polling that waits for the pod on the target node to be Running and Ready.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a persistent flakiness issue in the ovs-ovn pod recreation e2e test. The previous implementation was prone to failure due to a race condition: the DaemonSet status was not immediately updated after a pod deletion, causing the test to check for pod readiness too early. The core change replaces a single, immediate check of pod status with a robust polling mechanism, ensuring the test accurately reflects the actual state of the ovs-ovn pod.
Code Review
This pull request addresses a flaky e2e test by replacing a single snapshot check for pod readiness with a more robust polling mechanism using framework.WaitUntil. This change correctly handles the eventual consistency of pod statuses in Kubernetes and should resolve the flakiness. The implementation looks good. I have one suggestion to improve code readability within the polling logic.
for i := range pods.Items {
	if pods.Items[i].Spec.NodeName == suiteCtx.Node &&
		pods.Items[i].Status.Phase == corev1.PodRunning &&
		len(pods.Items[i].Status.ContainerStatuses) != 0 &&
		pods.Items[i].Status.ContainerStatuses[0].Ready {
		pod = &pods.Items[i]
		return true, nil
	}
}
The if condition with multiple && operators is a bit long and can be hard to read. To improve readability and maintainability, you could refactor this to first check for the pod on the correct node, and then, in a nested block, check for its readiness. This separation of concerns makes the logic clearer.
for i := range pods.Items {
p := &pods.Items[i]
if p.Spec.NodeName != suiteCtx.Node {
continue
}
if p.Status.Phase == corev1.PodRunning &&
len(p.Status.ContainerStatuses) != 0 &&
p.Status.ContainerStatuses[0].Ready {
pod = p
return true, nil
}
}
Summary
- Fix flaky e2e test caused by RolloutStatus returning with stale DaemonSet status
- Replace the single-snapshot GetPods + ExpectEqual assertion with WaitUntil polling that waits for the replacement pod to be Running and Ready

Root Cause
After deleting an ovs-ovn pod, RolloutStatus checks DaemonSet.Status.NumberReady via DaemonSetStatusViewer. When the DaemonSet controller hasn't yet updated the status counters to reflect the deleted pod, RolloutStatus reads stale "all ready" status and returns immediately. The subsequent GetPods then finds the replacement pod still in Pending state, causing the assertion to fail.

Test plan
🤖 Generated with Claude Code