Skip to content

Wait for MC pod before api call#580

Merged
dbasunag merged 2 commits intoopendatahub-io:mainfrom
dbasunag:wait_for_mc_pod
Sep 4, 2025
Merged

Wait for MC pod before api call#580
dbasunag merged 2 commits intoopendatahub-io:mainfrom
dbasunag:wait_for_mc_pod

Conversation

@dbasunag
Copy link
Copy Markdown
Collaborator

@dbasunag dbasunag commented Sep 3, 2025

Between tests we have to kill pods. And we should not try the apis when the pods are not in running state.

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Tests
    • Improved test stability by adding an extra readiness wait for the model registry namespace after setup.
    • Increased consecutive readiness checks to reduce flakiness during environment synchronization.
    • Replaced iterative polling with a more reliable two-step readiness verification (confirm pod creation, then wait for pods to reach running state) and added retries for missing pods.

@dbasunag dbasunag requested review from fege and lugi0 as code owners September 3, 2025 15:12
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 3, 2025

📝 Walkthrough

Walkthrough

Adds a wait_for_pods_running synchronization for the model registry namespace in test setup. Updates model catalog utilities: imports wait_for_pods_running, increases default consecutive checks to 6, replaces a polling loop with a pod-creation check plus RUNNING-state wait, renames wait_for_model_catalog_update to wait_for_model_catalog_pod_created.

Changes

Cohort / File(s) Summary of changes
Test setup synchronization
tests/model_registry/conftest.py
After applying the patch and creating the namespace, adds a wait_for_pods_running call for the model_registry_namespace with 6 consecutive checks, in addition to the existing wait for the applications namespace.
Model catalog readiness utilities
tests/model_registry/model_catalog/utils.py
Adds import of wait_for_pods_running; increases is_model_catalog_ready default consecutive_try from 3 → 6; replaces per-iteration polling loop with wait_for_model_catalog_pod_created(...) then wait_for_pods_running(...); renames wait_for_model_catalog_updatewait_for_model_catalog_pod_created (signature now returns bool); introduces PodNotFound(Exception) and decorates the new function with a retry policy.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 3, 2025

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it from the PR comment /wip cancel to the PR.
  • To block merging of a PR, add /hold in a comment. To un-block merging of PR comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    lgtm label removed on each new commit push.
  • To mark PR as verified comment /verified to the PR, to un-verify comment /verified cancel to the PR.
    verified label removed on each new commit push.
  • To Cherry-pick a merged PR /cherry-pick <target_branch_name> to the PR. If <target_branch_name> is valid,
    and the current PR is merged, a cherry-picked PR would be created and linked to the current PR.
  • To build and push image to quay, add /build-push-pr-image in a comment. This would create an image with tag
    pr-<pr_number> to quay repository. This image tag, however would be deleted on PR merge or close action.
Supported labels

{'/hold', '/lgtm', '/cherry-pick', '/build-push-pr-image', '/wip', '/verified'}

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/model_registry/model_catalog/conftest.py (2)

12-12: Prefer targeted readiness check over namespace-wide pod wait

wait_for_pods_running waits for all pods in the namespace, which can add flakiness/latency unrelated to Model Catalog. Since is_model_catalog_ready is already imported and scoped to MC, use it instead and drop this import.

Apply:

-from tests.model_registry.utils import wait_for_pods_running

32-35: Use MC-scoped readiness check; class-scoped wait may miss inter-test pod churn

The current wait targets all namespace pods; switch to MC-specific readiness for precision. Also, because this fixture is class-scoped, if pods are torn down between test methods, the once-per-class wait may be insufficient—consider a function-scoped readiness helper for tests that issue API calls.

Apply:

-    # ensure that model catalog pod is up and running:
-    wait_for_pods_running(
-        admin_client=admin_client, namespace_name=model_registry_namespace, number_of_consecutive_checks=3
-    )
+    # Ensure Model Catalog pods are ready before constructing URLs
+    is_model_catalog_ready(client=admin_client, model_registry_namespace=model_registry_namespace)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a49ee47 and f976fa3.

📒 Files selected for processing (1)
  • tests/model_registry/model_catalog/conftest.py (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#487
File: tests/model_registry/async_job/utils.py:14-32
Timestamp: 2025-08-08T16:00:51.421Z
Learning: In opendatahub-io/opendatahub-tests, for tests under tests/model_registry/async_job, keep get_latest_job_pod (tests/model_registry/async_job/utils.py) simple: sorting by creationTimestamp string and selecting pods via label_selector="job-name=<job.name>" is acceptable; do not add datetime parsing or controller-uid filtering unless a concrete failure arises (per guidance from user lugi0).
📚 Learning: 2025-07-17T15:42:26.275Z
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:0-0
Timestamp: 2025-07-17T15:42:26.275Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 fixture (and similar duplicated Model Registry instance fixtures) do not require admin_client, db_deployment_1, or db_secret_1 parameters as explicit dependencies, even though these dependencies exist implicitly through the fixture dependency chain.

Applied to files:

  • tests/model_registry/model_catalog/conftest.py
📚 Learning: 2025-07-17T15:42:04.167Z
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:595-612
Timestamp: 2025-07-17T15:42:04.167Z
Learning: In tests/model_registry/conftest.py, the db_deployment_1 fixture (and similar duplicated resource fixtures) do not require the admin_client parameter or explicit dependencies on related fixtures like db_secret_1, db_pvc_1, and db_service_1, even though the original model_registry_db_deployment fixture includes these parameters.

Applied to files:

  • tests/model_registry/model_catalog/conftest.py
📚 Learning: 2025-07-17T15:42:23.880Z
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:733-770
Timestamp: 2025-07-17T15:42:23.880Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 and model_registry_instance_2 fixtures do not need explicit database dependency fixtures (like db_deployment_1, db_secret_1, etc.) in their function signatures. Pytest's dependency injection automatically handles the fixture dependencies when they reference db_name_1 and db_name_2 parameters. This is the correct pattern for these Model Registry instance fixtures.

Applied to files:

  • tests/model_registry/model_catalog/conftest.py
📚 Learning: 2025-08-21T19:08:14.982Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#527
File: tests/model_registry/rbac/conftest.py:358-359
Timestamp: 2025-08-21T19:08:14.982Z
Learning: In tests/model_registry/rbac/conftest.py, the model catalog configmap only exists in one specific namespace (py_config["model_registry_namespace"]) regardless of where model registry instances are deployed, making it a shared resource that serves all model registry instances across namespaces.

Applied to files:

  • tests/model_registry/model_catalog/conftest.py
📚 Learning: 2025-07-17T15:43:04.876Z
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:579-591
Timestamp: 2025-07-17T15:43:04.876Z
Learning: In tests/model_registry/conftest.py, the db_service_1 and db_service_2 fixtures do not require the admin_client parameter for Service resource creation, despite the existing model_registry_db_service fixture using client=admin_client. This inconsistency was confirmed as intentional by user lugi0.

Applied to files:

  • tests/model_registry/model_catalog/conftest.py
🧬 Code graph analysis (1)
tests/model_registry/model_catalog/conftest.py (3)
tests/model_registry/utils.py (1)
  • wait_for_pods_running (249-282)
tests/conftest.py (1)
  • admin_client (68-69)
tests/model_registry/conftest.py (1)
  • model_registry_namespace (63-64)
🔇 Additional comments (1)
tests/model_registry/model_catalog/conftest.py (1)

28-30: Signature change approved — no external imports or direct calls of model_catalog_rest_url found.

sheltoncyril
sheltoncyril previously approved these changes Sep 3, 2025
@sheltoncyril
Copy link
Copy Markdown
Contributor

/lgtm

rhods-ci-bot
rhods-ci-bot previously approved these changes Sep 3, 2025
sheltoncyril
sheltoncyril previously approved these changes Sep 3, 2025
Copy link
Copy Markdown
Contributor

@sheltoncyril sheltoncyril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

sheltoncyril
sheltoncyril previously approved these changes Sep 3, 2025
Copy link
Copy Markdown
Contributor

@sheltoncyril sheltoncyril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/model_registry/model_catalog/utils.py (2)

57-57: Make pod deletion deterministic

Deletion is async; add wait=True so the follow-up creation check doesn’t race the terminating pod.

Apply:

-        pod.delete()
+        pod.delete(wait=True)

59-62: Optional: wait specifically for MC pods, not the whole namespace

To avoid coupling readiness to unrelated pods in the namespace, consider waiting by label (component=model-catalog) instead of namespace-wide. This can reduce flakiness and shorten waits.

If you choose this, import the helper and switch the call:

- from tests.model_registry.utils import get_model_catalog_pod, wait_for_pods_running
+ from tests.model_registry.utils import get_model_catalog_pod, wait_for_pods_running
+ from utilities.general import wait_for_pods_by_labels
-    wait_for_pods_running(
-        admin_client=client, namespace_name=model_registry_namespace, number_of_consecutive_checks=consecutive_try
-    )
+    wait_for_pods_by_labels(
+        admin_client=client,
+        namespace=model_registry_namespace,
+        label_selector="component=model-catalog",
+        expected_num_pods=1,
+    )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f976fa3 and c0f0b31.

📒 Files selected for processing (2)
  • tests/model_registry/conftest.py (1 hunks)
  • tests/model_registry/model_catalog/utils.py (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: fege
PR: opendatahub-io/opendatahub-tests#409
File: tests/model_registry/negative_tests/test_db_migration.py:47-57
Timestamp: 2025-07-09T08:50:22.172Z
Learning: In negative tests for model registry database migration failures (like test_db_migration_negative), the model registry pod is expected to be in a failed state (not Running) when the database is dirty. The test fetches logs from the failed pod to verify the expected error message about dirty database version, so waiting for Running status would be inappropriate.
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#527
File: tests/model_registry/rbac/conftest.py:358-359
Timestamp: 2025-08-21T19:08:14.982Z
Learning: In tests/model_registry/rbac/conftest.py, the model catalog configmap only exists in one specific namespace (py_config["model_registry_namespace"]) regardless of where model registry instances are deployed, making it a shared resource that serves all model registry instances across namespaces.
📚 Learning: 2025-08-11T12:14:34.102Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#504
File: tests/model_registry/upgrade/conftest.py:20-30
Timestamp: 2025-08-11T12:14:34.102Z
Learning: In tests/model_registry/upgrade/conftest.py, when using ResourceEditor to patch DataScienceCluster resources with a partial components dictionary like `{DscComponents.MODELREGISTRY: {"managementState": ...}}`, the patch is applied correctly without overwriting other components in spec.components. ResourceEditor performs the appropriate merge operation for this resource type.

Applied to files:

  • tests/model_registry/conftest.py
📚 Learning: 2025-08-21T19:08:14.982Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#527
File: tests/model_registry/rbac/conftest.py:358-359
Timestamp: 2025-08-21T19:08:14.982Z
Learning: In tests/model_registry/rbac/conftest.py, the model catalog configmap only exists in one specific namespace (py_config["model_registry_namespace"]) regardless of where model registry instances are deployed, making it a shared resource that serves all model registry instances across namespaces.

Applied to files:

  • tests/model_registry/model_catalog/utils.py
📚 Learning: 2025-07-09T08:50:22.172Z
Learnt from: fege
PR: opendatahub-io/opendatahub-tests#409
File: tests/model_registry/negative_tests/test_db_migration.py:47-57
Timestamp: 2025-07-09T08:50:22.172Z
Learning: In negative tests for model registry database migration failures (like test_db_migration_negative), the model registry pod is expected to be in a failed state (not Running) when the database is dirty. The test fetches logs from the failed pod to verify the expected error message about dirty database version, so waiting for Running status would be inappropriate.

Applied to files:

  • tests/model_registry/model_catalog/utils.py
🧬 Code graph analysis (2)
tests/model_registry/conftest.py (1)
tests/model_registry/utils.py (1)
  • wait_for_pods_running (249-282)
tests/model_registry/model_catalog/utils.py (1)
tests/model_registry/utils.py (2)
  • get_model_catalog_pod (728-731)
  • wait_for_pods_running (249-282)
🔇 Additional comments (3)
tests/model_registry/model_catalog/utils.py (3)

18-18: Import addition looks good

Importing wait_for_pods_running aligns with the new readiness flow.


52-52: Bumping consecutive_try to 6 is reasonable

Matches the stabilization approach used elsewhere in this PR.


66-71: Raise a retriable exception to trigger @Retry and extend timeout
The retry decorator won’t retry on a False return, so change the function to raise ResourceNotFoundError when no pod is found and increase wait_timeout to 60 s:

-@retry(wait_timeout=30, sleep=5)
-def wait_for_model_catalog_pod_created(client: DynamicClient, model_registry_namespace: str) -> bool:
-    pods = get_model_catalog_pod(client=client, model_registry_namespace=model_registry_namespace)
-    if pods:
-        return True
-    return False
+@retry(wait_timeout=60, sleep=5, exceptions_dict={ResourceNotFoundError: []})
+def wait_for_model_catalog_pod_created(client: DynamicClient, model_registry_namespace: str) -> bool:
+    pods = get_model_catalog_pod(client=client, model_registry_namespace=model_registry_namespace)
+    if not pods:
+        raise ResourceNotFoundError("model-catalog pod not created yet")
+    return True

Verify that your retry decorator supports the exceptions_dict parameter and will retry on ResourceNotFoundError.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/model_registry/model_catalog/utils.py (3)

52-52: Add explicit return annotation for clarity

This helper doesn’t return a value; add -> None.

-def is_model_catalog_ready(client: DynamicClient, model_registry_namespace: str, consecutive_try: int = 6):
+def is_model_catalog_ready(client: DynamicClient, model_registry_namespace: str, consecutive_try: int = 6) -> None:

65-69: Scope exception to MC pods or enrich context

Consider either renaming to ModelCatalogPodNotFound or ensure raised messages include namespace and label to aid triage.


30-33: Retry on transient network errors too

ConnectionRefused/Timeout from requests won’t be retried. Include RequestException to make wait_for_model_catalog_api robust while MC starts.

-@retry(wait_timeout=60, sleep=5, exceptions_dict={ResourceNotFoundError: []})
+@retry(wait_timeout=60, sleep=5, exceptions_dict={ResourceNotFoundError: [], requests.exceptions.RequestException: []})
 def wait_for_model_catalog_api(url: str, headers: dict[str, str], verify: bool | str = False) -> requests.Response:
     return _execute_get_call(url=f"{url}sources", headers=headers, verify=verify)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c0f0b31 and a9afd4a.

📒 Files selected for processing (2)
  • tests/model_registry/conftest.py (1 hunks)
  • tests/model_registry/model_catalog/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/model_registry/conftest.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: fege
PR: opendatahub-io/opendatahub-tests#409
File: tests/model_registry/negative_tests/test_db_migration.py:47-57
Timestamp: 2025-07-09T08:50:22.172Z
Learning: In negative tests for model registry database migration failures (like test_db_migration_negative), the model registry pod is expected to be in a failed state (not Running) when the database is dirty. The test fetches logs from the failed pod to verify the expected error message about dirty database version, so waiting for Running status would be inappropriate.
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#527
File: tests/model_registry/rbac/conftest.py:358-359
Timestamp: 2025-08-21T19:08:14.982Z
Learning: In tests/model_registry/rbac/conftest.py, the model catalog configmap only exists in one specific namespace (py_config["model_registry_namespace"]) regardless of where model registry instances are deployed, making it a shared resource that serves all model registry instances across namespaces.
📚 Learning: 2025-07-09T08:50:22.172Z
Learnt from: fege
PR: opendatahub-io/opendatahub-tests#409
File: tests/model_registry/negative_tests/test_db_migration.py:47-57
Timestamp: 2025-07-09T08:50:22.172Z
Learning: In negative tests for model registry database migration failures (like test_db_migration_negative), the model registry pod is expected to be in a failed state (not Running) when the database is dirty. The test fetches logs from the failed pod to verify the expected error message about dirty database version, so waiting for Running status would be inappropriate.

Applied to files:

  • tests/model_registry/model_catalog/utils.py
📚 Learning: 2025-08-21T19:08:14.982Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#527
File: tests/model_registry/rbac/conftest.py:358-359
Timestamp: 2025-08-21T19:08:14.982Z
Learning: In tests/model_registry/rbac/conftest.py, the model catalog configmap only exists in one specific namespace (py_config["model_registry_namespace"]) regardless of where model registry instances are deployed, making it a shared resource that serves all model registry instances across namespaces.

Applied to files:

  • tests/model_registry/model_catalog/utils.py
🧬 Code graph analysis (1)
tests/model_registry/model_catalog/utils.py (1)
tests/model_registry/utils.py (2)
  • get_model_catalog_pod (728-731)
  • wait_for_pods_running (249-282)
🔇 Additional comments (2)
tests/model_registry/model_catalog/utils.py (2)

18-18: LGTM: import of wait_for_pods_running

Import is used below; no issues.


52-63: Negative tests don’t use Running-state helpers
Verified no calls to is_model_catalog_ready or wait_for_pods_running in tests/model_registry/negative_tests; negative DB migration tests won’t hang waiting for pods.

Copy link
Copy Markdown
Contributor

@fege fege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@dbasunag dbasunag merged commit dcff182 into opendatahub-io:main Sep 4, 2025
9 checks passed
@dbasunag dbasunag deleted the wait_for_mc_pod branch September 4, 2025 13:56
@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 4, 2025

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

mwaykole pushed a commit to mwaykole/opendatahub-tests that referenced this pull request Jan 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants