feat: Separate out cluster, operator and component health checks#1133

Merged
dbasunag merged 4 commits into opendatahub-io:main from dbasunag:cluster_health_main
Feb 24, 2026

Conversation

@dbasunag
Collaborator

@dbasunag dbasunag commented Feb 19, 2026

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Tests

    • Added new test markers for better health check categorization across operators and components
    • Reorganized cluster health test suite for improved test structure and coverage refinement
  • Chores

    • Updated test framework configuration to support enhanced test organization

@dbasunag dbasunag requested review from a team, fege and lugi0 as code owners February 19, 2026 22:21
@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove the WIP mark, comment /wip cancel.
  • To block merging of a PR, add /hold in a comment. To unblock merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove the approval, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified. To un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name>. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR is created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image with the tag
    pr-<pr_number> in the quay repository. The image tag is deleted when the PR is merged or closed.
Supported labels

{'/hold', '/wip', '/verified', '/cherry-pick', '/build-push-pr-image', '/lgtm'}

@coderabbitai
Contributor

coderabbitai Bot commented Feb 19, 2026

Walkthrough

Tests are reorganized to separate cluster health checks from operator-specific health checks. Two new pytest markers (operator_health and component_health) are introduced in configuration. Several health check tests are moved from the general cluster health test module to a new operator health module, and component health test markers are updated accordingly. Type annotations are added to test methods.
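
For illustration, the two markers named above could also be registered from a `conftest.py` hook instead of `pytest.ini`. This is a minimal sketch and not the PR's actual change (the PR edits `pytest.ini`). The marker descriptions are paraphrased from the summary, and `ConfigLike` is a hypothetical stand-in type so the snippet does not need pytest imported; `addinivalue_line` is the method pytest's own config object provides for this purpose.

```python
from typing import Protocol


class ConfigLike(Protocol):
    """Hypothetical structural type covering the one pytest config method used here."""

    def addinivalue_line(self, name: str, line: str) -> None: ...


# Descriptions paraphrased from the PR summary; the exact pytest.ini wording
# is not shown in this thread.
HEALTH_MARKERS: dict[str, str] = {
    "operator_health": "OpenDataHub/RHOAI operator health checks",
    "component_health": "component health checks",
}


def pytest_configure(config: ConfigLike) -> None:
    """Register the health-check markers with pytest."""
    for name, description in HEALTH_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, a run such as `pytest -m operator_health` would select only the tests carrying that marker.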

Changes

| Cohort / File(s) | Summary |
|---|---|
| Pytest Configuration: pytest.ini | Added two new pytest markers: operator_health for OpenDataHub/RHOAI operator health checks and component_health for component health checks. |
| Cluster Health Test Reorganization: tests/cluster_health/test_cluster_health.py, tests/cluster_health/test_operator_health.py | Moved three health check tests (test_data_science_cluster_initialization_healthy, test_data_science_cluster_healthy, test_pods_cluster_healthy) from test_cluster_health.py to new file test_operator_health.py. Updated test_cluster_node_healthy signature with return type annotation and docstring in original file. |
| Component Health Tests: tests/model_registry/component_health/test_mr_health_check.py | Updated pytest marker from @pytest.mark.cluster_health to @pytest.mark.component_health. Added explicit -> None return type annotations to four test methods (test_mr_management_state, test_mr_namespace_exists_and_active, test_mr_condition_in_dsc, test_mr_pods_health). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective of the changeset: separating health checks into three distinct categories (cluster, operator, and component) across multiple test files and configuration. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/model_registry/ai_hub_health/test_mr_health_check.py (1)

44-48: ⚠️ Potential issue | 🟡 Minor

test_mr_pods_health is missing a mandatory docstring.

Per coding guidelines, every test MUST have a docstring explaining what it tests, using Given-When-Then format. Also, the -> None return type annotation is missing (mypy strict).

📝 Proposed fix
     @pytest.mark.component_health
-    def test_mr_pods_health(self, admin_client: DynamicClient):
+    def test_mr_pods_health(self, admin_client: DynamicClient) -> None:
+        """Verify Model Registry pods are healthy.
+
+        Given: A running OpenDataHub/RHOAI instance with Model Registry enabled.
+        When: Querying all pods in the model registry namespace.
+        Then: All pods should be in Running/Completed state.
+        """
         namespace = py_config["model_registry_namespace"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_registry/ai_hub_health/test_mr_health_check.py` around lines 44 -
48, Add a mandatory docstring and return type annotation to the
test_mr_pods_health function: update the function signature to include "-> None"
and add a triple-quoted docstring above the body following Given-When-Then that
states the initial state (Given the model registry namespace from py_config),
the action (When we check pods via wait_for_pods_running with admin_client), and
the expected outcome (Then all pods are in Running state), keeping references to
test_mr_pods_health, admin_client, py_config, and wait_for_pods_running so the
purpose and intent are clear.
🧹 Nitpick comments (1)
tests/model_registry/ai_hub_health/test_mr_health_check.py (1)

44-44: Redundant @pytest.mark.component_health on test_mr_pods_health.

The class TestMrDefault is already decorated with @pytest.mark.component_health at Line 16, which propagates to all methods. The method-level marker at Line 44 is a no-op.

♻️ Proposed fix
-    @pytest.mark.component_health
     def test_mr_pods_health(self, admin_client: DynamicClient):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_registry/ai_hub_health/test_mr_health_check.py` at line 44, The
method-level pytest.mark.component_health on test_mr_pods_health is redundant
because the TestMrDefault class is already decorated with
`@pytest.mark.component_health`; remove the duplicate marker on the test method
`test_mr_pods_health` so the class-level marker applies to all methods and
avoids noisy/no-op decorators.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/cluster_health/test_operator_health.py`:
- Around line 13-20: Add missing docstrings and explicit return type annotations
to the test functions (e.g., test_data_science_cluster_initialization_healthy
and test_data_science_cluster_healthy and any other test_* functions in this
file); each test must have a triple-quoted docstring in Given-When-Then format
describing the preconditions, action, and expected outcome, and the function
signature must include "-> None". Update the function definitions and ensure the
docstrings are placed immediately below the def line for linters and mypy strict
compliance.
- Around line 37-39: The test currently ignores the return value of
wait_for_pods_running (typed bool | None) so a None (e.g., from swallowed
TimeoutExpiredError) will silently pass; update test_pods_cluster_healthy to
capture the result from wait_for_pods_running and assert it is True (or
explicitly fail on None) so the test fails when pods aren't confirmed running —
reference wait_for_pods_running and test_pods_cluster_healthy and ensure the
assertion includes a clear message (e.g., "Pods not running" or
"wait_for_pods_running returned None/False").
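
The second inline comment can be sketched as follows. The real `wait_for_pods_running` is not shown in this thread, so `fake_wait_for_pods_running` is a hypothetical stand-in that only mimics the `bool | None` return type described above, and `check_pods_running` is likewise an illustrative helper. The point is the assertion pattern, not the helpers themselves.

```python
from typing import Optional


def fake_wait_for_pods_running(pods_ready: bool, timed_out: bool = False) -> Optional[bool]:
    """Hypothetical stand-in: returns None when a timeout was swallowed, else the readiness flag."""
    if timed_out:
        return None
    return pods_ready


def check_pods_running(result: Optional[bool]) -> None:
    """Fail unless the helper explicitly confirmed that the pods are running."""
    # `is True` rejects both False and None, so a swallowed timeout cannot pass silently.
    assert result is True, f"wait_for_pods_running returned {result!r}; pods not confirmed running"


# A confirmed-running result passes; None or False would raise AssertionError.
check_pods_running(fake_wait_for_pods_running(pods_ready=True))
```

Comparing with `is True` instead of relying on truthiness keeps the failure message explicit about which value came back.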


Comment thread tests/cluster_health/test_operator_health.py
Comment thread tests/cluster_health/test_operator_health.py Outdated
Comment thread tests/model_registry/component_health/test_mr_health_check.py
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/model_registry/component_health/test_mr_health_check.py (1)

46-50: ⚠️ Potential issue | 🟡 Minor

test_mr_pods_health has four issues introduced or exposed by this change

  1. Redundant marker (line 46): @pytest.mark.component_health is already applied at the class level (line 16); the method-level decorator is a no-op and should be removed.
  2. Missing -> None (line 47): Every other test method in this class was updated to include an explicit return type in this PR; test_mr_pods_health was missed.
  3. Missing docstring: The coding guideline requires every test to have a docstring; this method has none.
  4. Stale log message (line 49): "for cluster health" should read "for component health" after the rename.
🔧 Proposed fix
-    @pytest.mark.component_health
-    def test_mr_pods_health(self, admin_client: DynamicClient):
+    def test_mr_pods_health(self, admin_client: DynamicClient) -> None:
+        """
+        Given a DynamicClient,
+        When the ModelRegistry pods are retrieved from the configured namespace,
+        Then all pods should be in Running state.
+        """
         namespace = py_config["model_registry_namespace"]
-        LOGGER.info(f"Testing Pods in namespace {namespace} for cluster health")
+        LOGGER.info(f"Testing Pods in namespace {namespace} for component health")
         wait_for_pods_running(admin_client=admin_client, namespace_name=namespace)

As per coding guidelines: "tests/**/*.py: Every test MUST have a docstring explaining what it tests" and "Add type annotations to test code and fixtures (mypy strict enforced)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_registry/component_health/test_mr_health_check.py` around lines
46 - 50, Remove the redundant method-level pytest marker by deleting the
`@pytest.mark.component_health` decorator above test_mr_pods_health, add an
explicit return type annotation `-> None` to the `def test_mr_pods_health`
signature, add a short docstring at the start of `test_mr_pods_health`
describing what this test verifies (e.g., that pods in the model registry
namespace reach Running), and update the LOGGER.info message (the string passed
to LOGGER.info) to say "for component health" instead of "for cluster health";
the relevant symbols are the `test_mr_pods_health` function, the LOGGER.info
call, and the existing call to `wait_for_pods_running`.
🧹 Nitpick comments (1)
tests/model_registry/component_health/test_mr_health_check.py (1)

19-19: Docstrings don't follow Given-When-Then format

All three method docstrings (test_mr_management_state, test_mr_namespace_exists_and_active, test_mr_condition_in_dsc) are plain one-line descriptions. Since the signatures were updated in this PR, updating the docstrings to the required format is a natural fit.

Example for test_mr_management_state:

"""
Given a DataScienceCluster resource,
When the ModelRegistry component is configured,
Then its managementState should be MANAGED.
"""

As per coding guidelines: "tests/**/*.py: Use Given-When-Then format in test docstrings for behavioral clarity" and "Write Google-format docstrings for tests and fixtures."

Also applies to: 28-28, 38-38

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_registry/component_health/test_mr_health_check.py` at line 19,
Update the one-line docstrings for the test functions test_mr_management_state,
test_mr_namespace_exists_and_active, and test_mr_condition_in_dsc to use
Given-When-Then Google-style test docstrings: for each function replace the
single-line description with a three-line docstring that starts with "Given ..."
describing the precondition (DataScienceCluster resource or component state),
"When ..." describing the action/trigger (ModelRegistry configured or
inspected), and "Then ..." describing the expected outcome (managementState is
MANAGED, namespace exists and is active, specific condition present in DSC),
ensuring each docstring follows the Google-format multi-line convention used
across tests/**/*.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/cluster_health/test_cluster_health.py`:
- Around line 10-13: The test_cluster_node_healthy docstring is a plain
description; update it to a Given-When-Then formatted docstring describing the
precondition (Given a list of cluster nodes), the action (When the health check
is performed), and the expectation (Then all nodes should be reported healthy
and any failing assertions made). Edit the docstring in the function
test_cluster_node_healthy to follow that structure and mention the behavior
being asserted so reviewers and readers can quickly see the test's intent.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 345a47d and 9b53c92.

📒 Files selected for processing (6)
  • tests/cluster_health/test_cluster_health.py
  • tests/cluster_health/test_operator_health.py
  • tests/model_registry/component_health/__init__.py
  • tests/model_registry/component_health/conftest.py
  • tests/model_registry/component_health/test_mr_health_check.py
  • tests/model_registry/component_health/test_mr_operator_health.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/cluster_health/test_operator_health.py

Comment thread tests/cluster_health/test_cluster_health.py
Comment thread tests/model_registry/component_health/test_mr_health_check.py
@dbasunag dbasunag disabled auto-merge February 24, 2026 11:59
@dbasunag dbasunag merged commit a732785 into opendatahub-io:main Feb 24, 2026
9 checks passed
@dbasunag dbasunag deleted the cluster_health_main branch February 24, 2026 11:59
@github-actions

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

@dbasunag
Collaborator Author

/cherry-pick 3.3

@rhods-ci-bot
Contributor

Error cherry-picking.

Auto-merging pytest.ini
Auto-merging tests/model_registry/component_health/test_mr_health_check.py
CONFLICT (content): Merge conflict in tests/model_registry/component_health/test_mr_health_check.py
error: could not apply a732785... feat: Separate out cluster, operator and component health checks (#1133)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"

@rhods-ci-bot
Contributor

‼️ cherry pick action failed.
See: https://github.com/opendatahub-io/opendatahub-tests/actions/runs/22375322705

dbasunag added a commit to dbasunag/opendatahub-tests that referenced this pull request Feb 25, 2026
…ndatahub-io#1133)

* feat: Separate out cluster, operator and component health checks

* fix: Addressed review comments
dbasunag added a commit that referenced this pull request Feb 25, 2026
…) (#1142)

* feat: Separate out cluster, operator and component health checks

* fix: Addressed review comments