
feat: add option to run tests even if cluster sanity checks failed#289

Closed
lugi0 wants to merge 0 commits into opendatahub-io:main from lugi0:main

Conversation

@lugi0
Contributor

@lugi0 lugi0 commented May 5, 2025

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • New Features
    • Added command-line options to customize cluster sanity checks during test runs, including skipping checks, skipping specific resource checks, and continuing tests with warnings on failures.
  • Bug Fixes
    • Fixed a typo in the help text for the upgrade deployment modes option.
  • Refactor
    • Enhanced logging and error handling for cluster sanity checks to improve test execution feedback.

@lugi0 lugi0 self-assigned this May 5, 2025
@lugi0 lugi0 requested a review from a team as a code owner May 5, 2025 15:32
@coderabbitai
Contributor

coderabbitai bot commented May 5, 2025

Walkthrough

The changes introduce a new pytest command-line option, --cluster-sanity-continue-on-failure, allowing tests to continue even if cluster sanity checks fail, with appropriate warnings and JUnit XML property updates. The help text for the --upgrade-deployment-modes option is corrected from "Coma-separated" to "Comma-separated." The verify_cluster_sanity function is refactored to handle three cluster sanity-related options: skipping all checks, skipping only RHOAI-specific checks, and continuing on failure. Logging and JUnit XML reporting are enhanced to reflect these behaviors, while the default behavior of aborting on failure remains unchanged.
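As an illustrative sketch of what the walkthrough describes — not the repository's exact code — the new flag could be registered in conftest.py through pytest's standard pytest_addoption hook:

```python
# Sketch only; the actual conftest.py in this PR may differ in wording and defaults.
def pytest_addoption(parser):
    parser.addoption(
        "--cluster-sanity-continue-on-failure",
        action="store_true",
        default=False,
        help="Continue the test run with a warning when cluster sanity checks fail.",
    )
```

With `action="store_true"` the option is a boolean switch, so the default behavior (abort on sanity failure) is preserved unless the flag is passed explicitly.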

Changes

File(s) Change Summary
conftest.py Fixed typo in help text for --upgrade-deployment-modes. Added new CLI option --cluster-sanity-continue-on-failure to pytest.
utilities/infra.py Refactored verify_cluster_sanity to handle new CLI options: --cluster-sanity-skip-check, --cluster-sanity-skip-rhoai-check, and --cluster-sanity-continue-on-failure. Enhanced logging, error handling, and JUnit XML reporting to support skipping and continuation behaviors on cluster sanity check failures.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Pytest
    participant verify_cluster_sanity
    participant Logger
    participant JUnitXML

    User->>Pytest: Run tests with CLI options
    Pytest->>verify_cluster_sanity: Call with options
    verify_cluster_sanity->>Logger: Log check start
    alt --cluster-sanity-skip-check
        verify_cluster_sanity->>Logger: Log skipping all checks
        verify_cluster_sanity->>JUnitXML: Set skipped property
        verify_cluster_sanity-->>Pytest: Return
    else --cluster-sanity-skip-rhoai-check
        verify_cluster_sanity->>Logger: Log skipping RHOAI checks
        verify_cluster_sanity->>JUnitXML: Set skipped property
        verify_cluster_sanity->>Logger: Run node health and schedulability checks
    else Checks run
        verify_cluster_sanity->>Logger: Run all checks including RHOAI
    end
    alt Checks fail
        alt --cluster-sanity-continue-on-failure
            verify_cluster_sanity->>Logger: Log warning, continue tests
            verify_cluster_sanity->>JUnitXML: Set failure with continue property
            verify_cluster_sanity-->>Pytest: Return
        else Default behavior
            verify_cluster_sanity->>Logger: Log error, exit tests
            verify_cluster_sanity->>JUnitXML: Set failure with exit code 99
            verify_cluster_sanity->>Pytest: Exit test run
        end
    else Checks pass
        verify_cluster_sanity->>Logger: Log success
        verify_cluster_sanity-->>Pytest: Return
    end
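The branching shown in the diagram can be sketched in plain Python. The function shape, option keys, and check callables below are assumptions for illustration, not the actual utilities/infra.py code; the JUnit property names and the exit code 99 are taken from this PR.

```python
import warnings

def verify_cluster_sanity(options, run_node_checks, run_rhoai_checks,
                          junitxml_property=None):
    """Sketch of the flow above. `options` holds the three CLI flags;
    the check callables (hypothetical names) raise on failure."""
    if options.get("skip_check"):
        # --cluster-sanity-skip-check: skip everything
        if junitxml_property:
            junitxml_property(name="cluster_sanity_check_skipped", value=True)
        return "skipped"
    try:
        run_node_checks()  # node health and schedulability
        if options.get("skip_rhoai_check"):
            # --cluster-sanity-skip-rhoai-check: node checks only
            if junitxml_property:
                junitxml_property(name="rhoai_check_skipped", value=True)
        else:
            run_rhoai_checks()  # RHOAI-specific resource checks
    except Exception as exc:
        if options.get("continue_on_failure"):
            warnings.warn(f"Cluster sanity check failed, continuing: {exc}")
            if junitxml_property:
                junitxml_property(name="cluster_sanity_check_failed", value=True)
                junitxml_property(name="cluster_sanity_forced_continue", value=True)
            return "continued"
        raise SystemExit(99)  # default: abort the run, as in the diagram
    return "passed"
```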

Poem

A rabbit hopped through code so neat,
Tweaked a typo, made it sweet.
Cluster checks now heed your say—
Skip or warn, or halt the day.
With options new, the tests can flow,
As carrots in the garden grow! 🥕


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aba9503 and e6c1d83.

📒 Files selected for processing (1)
  • utilities/infra.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • utilities/infra.py
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
utilities/infra.py (1)

914-931: Well-implemented continue-on-failure behavior.

The implementation correctly handles the different behaviors based on the --cluster-sanity-continue-on-failure flag, with appropriate logging and JUnit XML property updates.

There's a TODO comment at line 930. Consider either implementing the file output for Jenkins reporting or creating a separate ticket to track this item:

- # TODO: Write to file to easily report the failure in jenkins
+ # TODO: Write to file to easily report the failure in jenkins (tracked in separate ticket #XXX)
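If the TODO is eventually implemented, one minimal approach would be a small helper along these lines (the helper name and file name are hypothetical, not part of this PR):

```python
from pathlib import Path

def record_sanity_failure(message, out_file="cluster-sanity-failure.log"):
    # Hypothetical helper: persist the failure reason so a Jenkins
    # post-build step can pick up the file and surface it in the job summary.
    Path(out_file).write_text(f"{message}\n")
```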
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eda4aeb and aba9503.

📒 Files selected for processing (2)
  • conftest.py (2 hunks)
  • utilities/infra.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
utilities/infra.py (2)
tests/conftest.py (3)
  • nodes (496-497)
  • dsci_resource (334-335)
  • dsc_resource (339-340)
utilities/exceptions.py (1)
  • ResourceNotReadyError (99-100)
🔇 Additional comments (8)
conftest.py (2)

112-112: Fixed typo in help text.

Corrected "Coma-separated" to "Comma-separated" in the help text.


133-139: Well-structured new CLI option for continuing tests on cluster sanity failure.

This new flag provides a clean way to optionally continue test execution even when cluster sanity checks fail, enhancing flexibility for test scenarios while maintaining the default behavior of failing fast.
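Taken together with the two existing skip flags, the precedence the review and walkthrough describe (skip everything, else skip RHOAI checks only, else warn-and-continue, else fail fast) could be expressed as a small helper. The option names come from this PR; the function itself is a hypothetical sketch:

```python
def resolve_sanity_behavior(config):
    """Map the three sanity-related CLI flags to one behavior keyword.
    `config` is anything exposing pytest's getoption(name) interface."""
    if config.getoption("--cluster-sanity-skip-check"):
        return "skip-all"
    if config.getoption("--cluster-sanity-skip-rhoai-check"):
        return "skip-rhoai"
    if config.getoption("--cluster-sanity-continue-on-failure"):
        return "warn-and-continue"
    return "fail-fast"
```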

utilities/infra.py (6)

874-877: Clear documentation of the new behavior.

The updated docstring accurately describes the cluster sanity check behavior and the effect of the new flag.


887-892: Good use of constants for CLI options.

Using constants for the command-line options improves code readability and maintainability, reducing the risk of typos when referencing these options throughout the code.


894-897: Clean implementation of the skip-check functionality.

The code correctly handles the case when --cluster-sanity-skip-check is specified, logging a warning and returning early to skip all checks.


900-900: Added helpful log message to indicate check start.

Adding this log message improves debuggability by clearly showing when the cluster sanity check begins.


903-905: Good implementation of selective RHOAI check skipping.

The code effectively implements the ability to skip only RHOAI-specific resource checks while still performing the basic node checks.


908-909: Added success log message.

This log message helps with debugging by clearly indicating when the sanity check has successfully completed.

@github-actions

github-actions bot commented May 5, 2025

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To un-block merging, comment /hold cancel on the PR.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR; to un-verify, comment /verified cancel on the PR.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
Supported labels

{'/verified', '/lgtm', '/wip', '/hold'}

try:
    LOGGER.info("Check cluster sanity.")

    LOGGER.info("Running cluster sanity check...")
Collaborator


I think we should not skip here. If the nodes are not schedulable or healthy, it is a cluster problem.

Contributor Author


This is different from what was being discussed in slack. I do agree with you, but can we get an ack to proceed before implementing the change?

Comment on lines +918 to +921
if junitxml_property:
    junitxml_property(name="cluster_sanity_check_failed", value=True)  # type: ignore[call-arg]
    junitxml_property(name="cluster_sanity_forced_continue", value=True)  # type: ignore[call-arg]
# Return normally from the fixture setup
Collaborator


I believe this is for test result reporting. If cluster sanity fails and we are continuing with tests, we should not mark the tests as failure because of sanity failure. They should be marked failed only if the test fails.

Contributor Author


This is only marking those two specific properties, not the tests as failures

Collaborator


Then the associated test would be marked as failed though (e.g. first test of model registry). Do you want that?


3 participants