feat: add option to run tests even if cluster sanity checks failed#289
feat: add option to run tests even if cluster sanity checks failed#289lugi0 wants to merge 0 commit intoopendatahub-io:mainfrom
Conversation
WalkthroughThe changes introduce a new pytest command-line option, Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Pytest
participant verify_cluster_sanity
participant Logger
participant JUnitXML
User->>Pytest: Run tests with CLI options
Pytest->>verify_cluster_sanity: Call with options
verify_cluster_sanity->>Logger: Log check start
alt --cluster-sanity-skip-check
verify_cluster_sanity->>Logger: Log skipping all checks
verify_cluster_sanity->>JUnitXML: Set skipped property
verify_cluster_sanity-->>Pytest: Return
else --cluster-sanity-skip-rhoai-check
verify_cluster_sanity->>Logger: Log skipping RHOAI checks
verify_cluster_sanity->>JUnitXML: Set skipped property
verify_cluster_sanity->>Logger: Run node health and schedulability checks
else Checks run
verify_cluster_sanity->>Logger: Run all checks including RHOAI
end
alt Checks fail
alt --cluster-sanity-continue-on-failure
verify_cluster_sanity->>Logger: Log warning, continue tests
verify_cluster_sanity->>JUnitXML: Set failure with continue property
verify_cluster_sanity-->>Pytest: Return
else Default behavior
verify_cluster_sanity->>Logger: Log error, exit tests
verify_cluster_sanity->>JUnitXML: Set failure with exit code 99
verify_cluster_sanity->>Pytest: Exit test run
end
else Checks pass
verify_cluster_sanity->>Logger: Log success
verify_cluster_sanity-->>Pytest: Return
end
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
utilities/infra.py (1)
914-931: Well-implemented continue-on-failure behavior.The implementation correctly handles the different behaviors based on the
--cluster-sanity-continue-on-failureflag, with appropriate logging and JUnit XML property updates.There's a TODO comment at line 930. Consider either implementing the file output for Jenkins reporting or creating a separate ticket to track this item:
- # TODO: Write to file to easily report the failure in jenkins + # TODO: Write to file to easily report the failure in jenkins (tracked in separate ticket #XXX)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
conftest.py(2 hunks)utilities/infra.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
utilities/infra.py (2)
tests/conftest.py (3)
nodes(496-497)dsci_resource(334-335)dsc_resource(339-340)utilities/exceptions.py (1)
ResourceNotReadyError(99-100)
🔇 Additional comments (8)
conftest.py (2)
112-112: Fixed typo in help text.Corrected "Coma-separated" to "Comma-separated" in the help text.
133-139: Well-structured new CLI option for continuing tests on cluster sanity failure.This new flag provides a clean way to optionally continue test execution even when cluster sanity checks fail, enhancing flexibility for test scenarios while maintaining the default behavior of failing fast.
utilities/infra.py (6)
874-877: Clear documentation of the new behavior.The updated docstring accurately describes the cluster sanity check behavior and the effect of the new flag.
887-892: Good use of constants for CLI options.Using constants for the command-line options improves code readability and maintainability, reducing the risk of typos when referencing these options throughout the code.
894-897: Clean implementation of the skip-check functionality.The code correctly handles the case when
--cluster-sanity-skip-checkis specified, logging a warning and returning early to skip all checks.
900-900: Added helpful log message to indicate check start.Adding this log message improves debuggability by clearly showing when the cluster sanity check begins.
903-905: Good implementation of selective RHOAI check skipping.The code effectively implements the ability to skip only RHOAI-specific resource checks while still performing the basic node checks.
908-909: Added success log message.This log message helps with debugging by clearly indicating when the sanity check has successfully completed.
|
The following are automatically added/executed:
Available user actions:
Supported labels{'/verified', '/lgtm', '/wip', '/hold'} |
utilities/infra.py
Outdated
| try: | ||
| LOGGER.info("Check cluster sanity.") | ||
|
|
||
| LOGGER.info("Running cluster sanity check...") |
There was a problem hiding this comment.
I think we should not skip here. If the nodes are not schedulable or healthy. It is a cluster problem.
There was a problem hiding this comment.
This is different from what was being discussed in slack. I do agree with you, but can we get an ack to proceed before implementing the change?
utilities/infra.py
Outdated
| if junitxml_property: | ||
| junitxml_property(name="cluster_sanity_check_failed", value=True) # type: ignore[call-arg] | ||
| junitxml_property(name="cluster_sanity_forced_continue", value=True) # type: ignore[call-arg] | ||
| # Return normally from the fixture setup |
There was a problem hiding this comment.
I believe this is for test result reporting. If cluster sanity fails and we are continuing with tests, we should not mark the tests as failure because of sanity failure. They should be marked failed only if the test fails.
There was a problem hiding this comment.
This is only marking those two specific properties, not the tests as failures
There was a problem hiding this comment.
Then the associated test would be marked as failed though (e.g. first test of model registry). Do you want that?
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit