Fix wait_for_running_pods flakyness by hmeir · Pull Request #834 · RedHatQE/openshift-virtualization-tests

hmeir · 2025-04-27T13:12:16Z

Short description:

When a pod is respun its name changes, so the pod list can contain the name of an already-deleted pod, which may lead to a failure.
The cause is that pod.instance is called for the deleted pod, and the sampler gets stuck.

The solution: wrap the call in a try-except block, and fetch the pod list on each function call.
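A minimal sketch of the idea described above, not the code from this PR. It assumes a hypothetical `list_pods` callable and pod objects exposing `name` and `phase()`; the `NotFoundError` class is a local stand-in for the client library's "not found" exception. The point it illustrates is that the list is re-fetched on every poll, and a pod that vanishes between listing and inspection is skipped instead of breaking the wait.

```python
import time


class NotFoundError(Exception):
    """Local stand-in for the API client's 'resource not found' error (assumption)."""


def get_not_running_pods(list_pods):
    """Return names of pods that are not Running, skipping pods that vanished.

    `list_pods` is a hypothetical callable returning the current pods. Calling
    it on every invocation avoids holding names of pods that were replaced.
    """
    not_running = []
    for pod in list_pods():
        try:
            phase = pod.phase()  # hypothetical accessor; may raise for a deleted pod
        except NotFoundError:
            continue  # the pod was deleted between listing and inspection
        if phase != "Running":
            not_running.append(pod.name)
    return not_running


def wait_for_running_pods(list_pods, timeout=5.0, sleep=0.1):
    """Poll until no pod is reported as not running, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not get_not_running_pods(list_pods):
            return
        time.sleep(sleep)
    raise TimeoutError("some pods never reached Running")
```

Passing a callable rather than a pre-built list is the design choice that matters here: a replacement pod with a new name is picked up on the next poll.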

Which issue(s) this PR fixes:

Gating tests flaky across multiple versions.

Special notes for reviewer:

jira-ticket:

Summary by CodeRabbit

Refactor
- Improved error handling and logic flow in pod status checks for more robust monitoring.
- Enhanced the process for detecting non-running pods by dynamically retrieving the pod list during status checks.
- Updated logging to provide clearer information about pod statuses and warnings when issues are detected.

hmeir · 2025-04-27T13:12:26Z

/wip

ghost · 2025-04-27T13:12:33Z

Report bugs in Issues

The following are automatically added:

Add reviewers from OWNER file (in the root of the repository) under reviewers section.
Set PR size label.
New issue is created for the PR. (Closed when PR is merged/closed)
Run pre-commit if .pre-commit-config.yaml exists in the repo.

Available user actions:

To mark PR as WIP comment /wip to the PR, To remove it from the PR comment /wip cancel to the PR.
To block merging of PR comment /hold, To un-block merging of PR comment /hold cancel.
To mark PR as verified comment /verified to the PR, to un-verify comment /verified cancel to the PR.
verified label removed on each new commit push.
To cherry pick a merged PR comment /cherry-pick <target branch to cherry-pick to> in the PR.
- Multiple target branches can be cherry-picked, separated by spaces. (/cherry-pick branch1 branch2)
- Cherry-pick will be started when PR is merged
To build and push container image command /build-and-push-container in the PR (tag will be the PR number).
- You can add extra args to the Podman build command
  - Example: /build-and-push-container --build-arg OPENSHIFT_PYTHON_WRAPPER_COMMIT=<commit_hash>
To add a label by comment use /<label name>, to remove, use /<label name> cancel
To assign reviewers based on OWNERS file use /assign-reviewers
To check if PR can be merged use /check-can-merge
to assign reviewer to PR use /assign-reviewer @<reviewer>

PR will be approved when the following conditions are met:

/approve from one of the approvers.
Minimum number of required /lgtm (2) is met.

Approvers and Reviewers

Approvers:
- dbasunag
- dshchedr
- myakove
- vsibirsk
Reviewers:
- RoniKishner
- dbasunag
- dshchedr
- geetikakay
- vsibirsk

Supported /retest check runs

/retest tox: Retest tox
/retest build-container: Retest build-container
/retest all: Retest all

Supported labels

hold
verified
wip
lgtm
approve

hmeir · 2025-04-27T13:15:37Z

/build-and-push-container

dbasunag1 · 2025-04-27T13:16:13Z

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-834 published

hmeir · 2025-04-27T18:54:19Z

/wip cancel

hmeir · 2025-04-27T18:54:27Z

/verified

hmeir · 2025-05-08T11:20:44Z

/build-and-push-container

ghost · 2025-05-08T11:21:17Z

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-834 published

hmeir · 2025-05-09T04:37:04Z

/verified

rnetser · 2025-05-10T07:14:31Z

utilities/infra.py

@@ -275,30 +275,30 @@ def wait_for_pods_deletion(pods):

 def get_pod_container_error_status(pod):
    pod_instance_status = pod.instance.status


This should be wrapped in try-except. If the pod does not exist, this will fail.

If the pod doesn't exist, we will hit NotFoundError in get_not_running_pods.

get_pod_container_error_status should handle this regardless of get_not_running_pods, so that if someone calls this function directly it fails gracefully.
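A hedged sketch of what "fails gracefully" could look like, not the implementation merged in this PR. `NotFoundError` is a local stand-in for the client library's exception, and the pod status is modeled as a plain dict for illustration; the return shape (the first waiting-state dict, or `None`) is an assumption.

```python
class NotFoundError(Exception):
    """Local stand-in for the API client's 'resource not found' error (assumption)."""


def get_pod_container_error_status(pod):
    """Return the first container's waiting-state details, or None.

    None is returned both when no container is waiting and when the pod
    no longer exists, so callers do not need their own try-except.
    """
    try:
        status = pod.instance.status  # may raise if the pod was deleted
    except NotFoundError:
        return None
    # Status is modeled as a dict here purely for the sketch.
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting")
        if waiting:
            return waiting
    return None
```

Handling the exception inside the helper keeps it safe to call on its own, independent of any guard in get_not_running_pods.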

rnetser · 2025-05-10T07:18:21Z

utilities/infra.py

            # pod that was spinned up in place of pod marked for deletion, reaches healthy state before end
            # of this check
-            if pod_instance.metadata.get("deletionTimestamp") or pod_instance.status.phase not in (
+            elif pod_instance.metadata.get("deletionTimestamp") or pod_instance.status.phase not in (


I think it would make sense to reverse the conditions, i.e. first check the pod's metadata and phase, and only if that condition is not met iterate over the containers.
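The suggested ordering could be sketched roughly as follows. This is an illustration only: the function name is hypothetical, metadata and status are modeled as plain dicts, and because the tuple of accepted phases is cut off in the diff above, `healthy_phases` is a placeholder parameter rather than the values used in the repository.

```python
def pod_needs_attention(metadata, status, healthy_phases=("Running",)):
    """Return True if the pod should be reported as not running.

    The cheap pod-level checks (deletion timestamp, phase) run first; the
    per-container scan happens only when those checks pass.
    `healthy_phases` is an illustrative placeholder, not the repo's values.
    """
    if metadata.get("deletionTimestamp") or status.get("phase") not in healthy_phases:
        return True
    # Pod-level state looks fine, so inspect individual containers.
    return any(
        "waiting" in container.get("state", {})
        for container in status.get("containerStatuses", [])
    )
```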

rnetser · 2025-05-10T07:20:12Z

utilities/infra.py

-            else:
-                current_check = 0
+            if sample:
+                LOGGER.info(f"All pods: {[pod.name for pod in sample]}")


Does this log add extra value, i.e. pod names only?

It's good for debugging.
But yeah, I guess we don't need that for 99% of the cases.

Deleting.

hmeir · 2025-05-12T12:28:30Z

/verified

rnetser · 2025-05-13T12:32:50Z

/approve

Signed-off-by: Harel Meir <hmeir@redhat.com>

hmeir · 2025-05-14T06:49:50Z

/build-and-push-container

ghost · 2025-05-14T06:50:24Z

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-834 published

hmeir · 2025-05-14T08:22:16Z

/verified

rnetser

/approve

rnetser · 2025-05-14T14:36:43Z

/lgtm

vsibirsk · 2025-05-18T08:45:43Z

/approve

vsibirsk · 2025-05-18T08:46:13Z

/retest build-container

ghost · 2025-05-18T09:05:59Z

Successfully removed PR tag: quay.io/openshift-cnv/openshift-virtualization-tests:pr-834.

ghost added the size/XS label Apr 27, 2025

ghost requested a review from dshchedr April 27, 2025 13:12

ghost mentioned this pull request Apr 27, 2025

Fix wait_for_running_pods flakyness - 834 #835

Closed

ghost added the branch-main label Apr 27, 2025

ghost requested review from RoniKishner, dbasunag, geetikakay and vsibirsk April 27, 2025 13:12

dbasunag1 added the wip label Apr 27, 2025

dbasunag1 changed the title ~~Fix wait_for_running_pods flakyness~~ WIP: Fix wait_for_running_pods flakyness Apr 27, 2025

polarion-jenkins previously approved these changes Apr 27, 2025

ghost added the lgtm-polarion-jenkins label Apr 27, 2025

ghost removed the wip label Apr 27, 2025

dbasunag1 added the verified label Apr 27, 2025

ghost changed the title ~~WIP: Fix wait_for_running_pods flakyness~~ Fix wait_for_running_pods flakyness Apr 27, 2025

Ahmad-Hafe reviewed Apr 28, 2025

utilities/infra.py (resolved)

ghost added the commented-Ahmad-Hafe label Apr 28, 2025

vsibirsk requested changes Apr 29, 2025

utilities/hco.py (outdated, resolved)

utilities/infra.py (resolved)

polarion-jenkins previously approved these changes May 8, 2025

dshchedr previously approved these changes May 9, 2025

rnetser requested changes May 10, 2025

polarion-jenkins approved these changes May 11, 2025

polarion-jenkins previously approved these changes May 12, 2025

polarion-jenkins previously approved these changes May 13, 2025

rnetser reviewed May 13, 2025

utilities/infra.py (outdated, resolved)

hmeir added 4 commits May 14, 2025 03:09

Fix wait_for_running_pods flakyness

ea967dd

Signed-off-by: Harel Meir <hmeir@redhat.com>

Replace the lambda with a pod.get() call inside the func

637b235

Signed-off-by: Harel Meir <hmeir@redhat.com>

Get non-running pods with each sampler call

18da242

Signed-off-by: Harel Meir <hmeir@redhat.com>

Wrap get_pod_container_error_status with try-except

e839b5c

Signed-off-by: Harel Meir <hmeir@redhat.com>

polarion-jenkins approved these changes May 14, 2025

rnetser reviewed May 14, 2025

vsibirsk approved these changes May 18, 2025

jpeimer approved these changes May 18, 2025

hmeir mentioned this pull request May 18, 2025

[4.18] Fix wait_for_running_pods flakyness #996

Merged

coderabbitai bot mentioned this pull request Aug 28, 2025

Move nmstate namespace check to network sanity checks and nodes_active_nics to return node_physical_nics for cloud clusters #1904

Merged

coderabbitai bot mentioned this pull request Jan 23, 2026

Refactor vgpu tests #3577

Merged

11 participants

hmeir commented Apr 27, 2025 •

edited by coderabbitai bot

Loading

hmeir May 11, 2025 •

edited

Loading