Wait for dsc and dsci ready state in cluster_sanity check#293
Wait for dsc and dsci ready state in cluster_sanity check#293dbasunag merged 1 commit intoopendatahub-io:mainfrom
Conversation
WalkthroughThe code replaces immediate status verification functions for DSC initialization and cluster resources with new retry-enabled functions that wait for readiness, retrying every 5 seconds for up to 120 seconds. The cluster sanity check now uses these new wait functions, allowing for retries instead of failing immediately if resources are not ready. Changes
Sequence Diagram(s)sequenceDiagram
participant ClusterSanityChecker
participant DSCInitialization
participant DataScienceCluster
ClusterSanityChecker->>DSCInitialization: wait_for_dsci_status_ready()
alt Resource not ready
DSCInitialization-->>ClusterSanityChecker: Raise ResourceNotReadyError
ClusterSanityChecker->>DSCInitialization: Retry (every 5s, up to 120s)
else Resource ready
DSCInitialization-->>ClusterSanityChecker: Return True
end
ClusterSanityChecker->>DataScienceCluster: wait_for_dsc_status_ready()
alt Resource not ready
DataScienceCluster-->>ClusterSanityChecker: Raise ResourceNotReadyError
ClusterSanityChecker->>DataScienceCluster: Retry (every 5s, up to 120s)
else Resource ready
DataScienceCluster-->>ClusterSanityChecker: Return True
end
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
utilities/infra.py (2)
854-867: Robust implementation with retry mechanism.The new function
wait_for_dsci_status_readyenhances resilience by implementing a retry mechanism for DSCI readiness checks. This is a good improvement over immediate verification.Consider using a constant from the
Timeoutclass for the timeout value instead of the hard-coded 120:@retry( - wait_timeout=120, + wait_timeout=Timeout.TIMEOUT_2MIN, sleep=5, exceptions_dict={ResourceNotReadyError: []}, )
869-880: Fix grammar in log message.The implementation for DSC readiness checks with retry mechanism is good, but there's a grammar issue in the log message.
Fix the grammar in the log message to match the style of the DSCI function:
@retry( wait_timeout=120, sleep=5, exceptions_dict={ResourceNotReadyError: []}, ) def wait_for_dsc_status_ready(dsc_resource: DataScienceCluster) -> bool: - LOGGER.info(f"Wait for DSC {dsc_resource.name} are {dsc_resource.Status.READY}.") + LOGGER.info(f"Wait for DSC {dsc_resource.name} to be in {dsc_resource.Status.READY} status.") if dsc_resource.status == dsc_resource.Status.READY: return True raise ResourceNotReadyError( f"DSC {dsc_resource.name} is not ready.\nCurrent status: {dsc_resource.instance.status}" )Also consider using
Timeout.TIMEOUT_2MINfor consistency with other timeout values in the codebase.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
utilities/infra.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
utilities/infra.py (2)
utilities/exceptions.py (1)
ResourceNotReadyError(99-100)tests/conftest.py (2)
dsci_resource(334-335)dsc_resource(339-340)
🔇 Additional comments (1)
utilities/infra.py (1)
917-920: Improved cluster sanity check with retry mechanisms.The updated
verify_cluster_sanityfunction now uses the new wait functions that include retry mechanisms, making the sanity check more resilient to transient resource readiness issues.
|
The following are automatically added/executed:
Available user actions:
Supported labels{'/verified', '/wip', '/lgtm', '/hold'} |
|
/verified |
|
Status of building tag latest: success. |
* updates to test_registering_model() based on previous review comments * [do-not-review]must-gather collection at failure point updates! 1176505 updates! 12d9c08 updates! 12d9c08 updates! 65e0213 * [ModelRegistry] ensure RunAsUser and RunAsGroup are not set explicitly (#226) updates! 4813f2b updates! 20cd457 updates! b126825 updates! 809cca7 * Lock file maintenance (#241) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * RHOAIENG-22058: chore(workbenches): add test_create_simple_notebook to smoke (#238) * Remove uv cache from dockerfile to support running in envs like openshift-ci (#239) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * fix: remove uv cache from dockerfile * `is_managed_cluster` fix condition (#243) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * fix: replace iter with list * fix: add logger info * RHOAIENG-22057: fix(workbenches): correct the check for spawned workbench (#242) There can only ever be a single workbench pod started. Co-authored-by: Luca Giorgi <lgiorgi@redhat.com> * RHOAIENG-22057: fix(workbenches): check for internal image registry and adjust the image path accordingly (#244) * now yielding TimeoutSampler get_pods_by_isvc_label func output and handling raised ResourceNotFoundError (#237) Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [model server] add auth test to upgrade (#245) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * feat: add auth test to upgrade * feat: add auth test to upgrade feat: add auth test to upgrade * fix: dsci name in func * [pre-commit.ci] pre-commit autoupdate (#246) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.4 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.4...v0.11.5) - [github.com/gitleaks/gitleaks: v8.24.2 → v8.24.3](gitleaks/gitleaks@v8.24.2...v8.24.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * Fix add-remove-labels workflow (#249) * Add Cluster sanity checks before test execution (#235) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * feat: cluster sanity * feat: cluster sanity * feat: cluster sanity * feat: cluster sanity add readme * fix: tix str typo * fix: address comments * fix: address review comments * fix: address comment * fix: use dsci from global config * fix: remove duplicate fixture * add labeler to add labels to prs based on areas impacted (#248) * on rebase clean commented-by- labels (#251) * [model registry] update namespace code and rearrange tests (#247) * updates to test_registering_model() based on previous review comments * update namespace code and rearrange tests * remove unnecessary argument from function call (#255) * on rebase clean commented-by- labels * remove unnecessary argument from function call * feat: add ocp_interop marker (#260) * Lock file maintenance (#259) * Lock file maintenance * fix: add marshmallow version --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: rnetser <rnetser@redhat.com> * [pre-commit.ci] pre-commit autoupdate (#263) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.6](astral-sh/ruff-pre-commit@v0.11.5...v0.11.6) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * feat: add upgrade tests (#258) * Remove flake8 ignore list (#265) * fix: remove flake8 ignore * fix: remove flake8 ignore * [model server] Remove pod pre-checks for image pull and fix `TestServerlessScaleToZero` (#256) * fix: update tests * fix: update tests * fix: update tests * fix: save test dep name * fix: minio mm external route * fix: address comemnt * fix: address comemnt * fix: address comemnt * Update python-dependencies (major) (#267) * Update python-dependencies * fix: marshmellow version --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: rnetser <rnetser@redhat.com> * Adding Test For InferenceService Zero Initial Scale (#262) * adding test for zero initial scale Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing precommit error Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * using label_selectors when getting deployment Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding argument names to func call and running pre-commit on all files Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * fixing bug in ovms_kserve_inference_service function that was preventing isvcs from being created with 0 min-replicas Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> --------- Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * feat: move interop marker (#268) * feat: Add upgrade tests for TrustyAIService (#250) * feat: Add upgrade tests for TrustyAIService * Move upgrade README.md to docs/UPGRADE.md * fix: reuse kwargs in TrustyAIService fixture * fix: address comments, reuse kwargs, add docstrings --------- Co-authored-by: Ruth Netser <rnetser@redhat.com> * Fix ns deletion logic (#272) * fix: fix resource deletion fixture logic * fix: fix resource deletion fixture logic * feat: fail on missing operators (#257) * fix: update tests * fix: update tests * feat: fail on missing operators * fix: rename to dependent * fix: address comment * fix: add log on failure * fix: type in raise * fix: remove MR check * fix: remove MR check * fix: use package scope * Add basic InferenceGraph deployment check (#233) * Add basic InferenceGraph deployment check This adds a test that deploys an InferenceGraph (IG), sends an inference request to the IG and verifies that the request succeeds. The deployed InferenceGraph is based on the example on the KServe documentation available in the following URL: https://kserve.github.io/website/0.15/modelserving/inference_graph/image_pipeline/. The example was adapted to run in openvino (which is a supported server in ODH), rather than TorchServe. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use cloud storage in InferenceGraph test Use cloud storage for the models, instead of OCI * Feedback: Ruth * Feedback: Ruth * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply Ruth suggestions Acknowledgement to @rnester for these changes. * More feedback: Ruth * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * fix: address 503 (#274) * [model server] Move to using unprivileged_client in tests (#273) * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * fix: unpri selection * Update MinIo pod privileges to run on ocp 4.19 (#277) * fix: add securityContext for minio pod * fix: minio on 4.19 * [model server] add multi node args check (#276) * feat: add multi node args * feat: add multi node args * fix: add wait on delete * fix: update new test * [pre-commit.ci] pre-commit autoupdate (#279) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.6 → v0.11.7](astral-sh/ruff-pre-commit@v0.11.6...v0.11.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * `verify_no_failed_pods` - exclude container failures when model mesh deployment (#278) * fix: mm container * fix: update condition * feat: add test for incorrect DB TLS config in Trusty AI (#221) * feat: add test for incorrect DB TLS config in Trusty AI * refactor: remove unused method from utils * feat: move TrustyAI test to own file * refactor: change name of db fixtures and deduplicate code * TrustyAI Service creation code refactor into own method * Move db secret setter to utils * Remove test from test_fairness as test moved to own file * docs: add description to TrustyAI invalid DB TLS config test * fix: check TrustyAIService container for Terminated status in lastStatus * fix: change name of terminal_state getter function * fix: change to a valid certificate and check for service failure * fix: address PR 221 reviewer feedback * revert wait_for_pods to wait_for_mariadb_pods * improve error checking logic * remove un-necessary wrapper function * docs: add docstring to create_trustyai_service method * docs: add docstring to trustyai_service_with_invalid_db_cert * fix: fix invalid return type for trustyai_db_ca_secret * feat: use retry decorator in validate trustyai_service_db_conn_failure method * fix: remove unnecessary return from validate db_conn_failure method * docs: add spacing between lines of docstring * refactor: create constants trustyai metrics and db storage config * refactor: address reviewer feedback - change docstring to correct formatting - remove len(0) check - no templating for error text * fix: use regex instead of in operator to check for error condition * docs: add correct formatting to docstrings * fix: use namespace.name instead of namespace in Pod.get * fix: remove \s from regex to check for spaces * refactor: add Raises section in docstring and use single string for pytest.fail * feat: use raise instead of pytest.fail - create new exception TooManyPodsError - create new exception UnexpectedFailureError - replace pytest.fail with raise and handle exceptions in retry - * fix: change default of teardown to True in TrustyAIService * docs: correct typo in trustyai docstring * docs: fix raises in docs and fix formatting * fix: fix create_trustyai_service namespace args issue * docs: add default for name arg in create tai svc func * [model server] Fix runtime request.param name to use external route (#280) * fix: fix param name * fix: fix param name * feat: add certs when sending requests to TrustyAIService (#266) * Wait for pods to be in running state before attempting to create ModelRegistry (#270) * on rebase clean commented-by- labels * Wait for pods to be in running state before attempting to create ModelRegistry * Address Exception in thread Thread-1 (_monitor) error (#286) * chore(deps): lock file maintenance (#287) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (#292) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.7 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.7...v0.11.8) - [github.com/gitleaks/gitleaks: v8.24.3 → v8.25.1](gitleaks/gitleaks@v8.24.3...v8.25.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Wait for dsc and dsci ready state in cluster_sanity check (#293) * fix(workbenches): implement get_username for OpenShift <=4.14 (#275) Turns out SelfSubjectReview is only available starting OpenShift 4.15. fixup incorporate User resource * RedHatQE/openshift-python-wrapper#2387 fixup incorporate SelfSubjectReview resource * RedHatQE/openshift-python-wrapper#2389 Co-authored-by: Debarati Basu-Nag <dbasunag@redhat.com> * replace the bot account with one owned by testdevops (#291) * Fix for post upgarde operator check (#297) Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> * Add test for Model Registry RBAC for SA token (#296) * feat: add RBAC test for SA token Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: address review comments Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: incorporate coderabbit suggestions Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: remove unneeded variable Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: remove excessive logs Signed-off-by: lugi0 <lgiorgi@redhat.com> --------- Signed-off-by: lugi0 <lgiorgi@redhat.com> * Support /build-push-pr-image comment to push image to quay for testing via jenkins (#290) updates! 678b389 * Add tests for model_artifact update validations (#284) * Add tests for model_artifact update validations * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates fixing pre-commit * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update package * minor updates * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments updates! 50ec24b updates! f3a6c3e updates! 792156f updates! 399aa10 updates! 5080e3b updates! c34f4e7 updates! a1d7baa --------- Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> Signed-off-by: lugi0 <lgiorgi@redhat.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Jiri Daněk <jdanek@redhat.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> Co-authored-by: Luca Giorgi <lgiorgi@redhat.com> Co-authored-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adolfo Aguirrezabal <aaguirre@redhat.com> Co-authored-by: Edgar Hernández <ehernand@redhat.com> Co-authored-by: Shelton Cyril <sheltoncyril@gmail.com> Co-authored-by: Milind Waykole <mwaykole@redhat.com> Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb>
* updates to test_registering_model() based on previous review comments * [do-not-review]must-gather collection at failure point updates! 1176505 updates! 12d9c08 updates! 12d9c08 updates! 65e0213 * [ModelRegistry] ensure RunAsUser and RunAsGroup are not set explicitly (#226) updates! 4813f2b updates! 20cd457 updates! b126825 updates! 809cca7 * Lock file maintenance (#241) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * RHOAIENG-22058: chore(workbenches): add test_create_simple_notebook to smoke (#238) * Remove uv cache from dockerfile to support running in envs like openshift-ci (#239) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * fix: remove uv cache from dockerfile * `is_managed_cluster` fix condition (#243) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * fix: replace iter with list * fix: add logger info * RHOAIENG-22057: fix(workbenches): correct the check for spawned workbench (#242) There can only ever be a single workbench pod started. Co-authored-by: Luca Giorgi <lgiorgi@redhat.com> * RHOAIENG-22057: fix(workbenches): check for internal image registry and adjust the image path accordingly (#244) * now yielding TimeoutSampler get_pods_by_isvc_label func output and handling raised ResourceNotFoundError (#237) Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [model server] add auth test to upgrade (#245) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * feat: add auth test to upgrade * feat: add auth test to upgrade feat: add auth test to upgrade * fix: dsci name in func * [pre-commit.ci] pre-commit autoupdate (#246) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.4 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.4...v0.11.5) - [github.com/gitleaks/gitleaks: v8.24.2 → v8.24.3](gitleaks/gitleaks@v8.24.2...v8.24.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * Fix add-remove-labels workflow (#249) * Add Cluster sanity checks before test execution (#235) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * feat: cluster sanity * feat: cluster sanity * feat: cluster sanity * feat: cluster sanity add readme * fix: tix str typo * fix: address comments * fix: address review comments * fix: address comment * fix: use dsci from global config * fix: remove duplicate fixture * add labeler to add labels to prs based on areas impacted (#248) * on rebase clean commented-by- labels (#251) * [model registry] update namespace code and rearrange tests (#247) * updates to test_registering_model() based on previous review comments * update namespace code and rearrange tests * remove unnecessary argument from function call (#255) * on rebase clean commented-by- labels * remove unnecessary argument from function call * feat: add ocp_interop marker (#260) * Lock file maintenance (#259) * Lock file maintenance * fix: add marshmallow version --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: rnetser <rnetser@redhat.com> * [pre-commit.ci] pre-commit autoupdate (#263) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.6](astral-sh/ruff-pre-commit@v0.11.5...v0.11.6) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * feat: add upgrade tests (#258) * Remove flake8 ignore list (#265) * fix: remove flake8 ignore * fix: remove flake8 ignore * [model server] Remove pod pre-checks for image pull and fix `TestServerlessScaleToZero` (#256) * fix: update tests * fix: update tests * fix: update tests * fix: save test dep name * fix: minio mm external route * fix: address comemnt * fix: address comemnt * fix: address comemnt * Update python-dependencies (major) (#267) * Update python-dependencies * fix: marshmellow version --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: rnetser <rnetser@redhat.com> * Adding Test For InferenceService Zero Initial Scale (#262) * adding test for zero initial scale Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing precommit error Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * using label_selectors when getting deployment Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding argument names to func call and running pre-commit on all files Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * fixing bug in ovms_kserve_inference_service function that was preventing isvcs from being created with 0 min-replicas Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> --------- Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * feat: move interop marker (#268) * feat: Add upgrade tests for TrustyAIService (#250) * feat: Add upgrade tests for TrustyAIService * Move upgrade README.md to docs/UPGRADE.md * fix: reuse kwargs in TrustyAIService fixture * fix: address comments, reuse kwargs, add docstrings --------- Co-authored-by: Ruth Netser <rnetser@redhat.com> * Fix ns deletion logic (#272) * fix: fix resource deletion fixture logic * fix: fix resource deletion fixture logic * feat: fail on missing operators (#257) * fix: update tests * fix: update tests * feat: fail on missing operators * fix: rename to dependent * fix: address comment * fix: add log on failure * fix: type in raise * fix: remove MR check * fix: remove MR check * fix: use package scope * Add basic InferenceGraph deployment check (#233) * Add basic InferenceGraph deployment check This adds a test that deploys an InferenceGraph (IG), sends an inference request to the IG and verifies that the request succeeds. The deployed InferenceGraph is based on the example on the KServe documentation available in the following URL: https://kserve.github.io/website/0.15/modelserving/inference_graph/image_pipeline/. The example was adapted to run in openvino (which is a supported server in ODH), rather than TorchServe. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use cloud storage in InferenceGraph test Use cloud storage for the models, instead of OCI * Feedback: Ruth * Feedback: Ruth * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply Ruth suggestions Acknowledgement to @rnester for these changes. * More feedback: Ruth * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * fix: address 503 (#274) * [model server] Move to using unprivileged_client in tests (#273) * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * fix: unpri selection * Update MinIo pod privileges to run on ocp 4.19 (#277) * fix: add securityContext for minio pod * fix: minio on 4.19 * [model server] add multi node args check (#276) * feat: add multi node args * feat: add multi node args * fix: add wait on delete * fix: update new test * [pre-commit.ci] pre-commit autoupdate (#279) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.6 → v0.11.7](astral-sh/ruff-pre-commit@v0.11.6...v0.11.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * `verify_no_failed_pods` - exclude container failures when model mesh deployment (#278) * fix: mm container * fix: update condition * feat: add test for incorrect DB TLS config in Trusty AI (#221) * feat: add test for incorrect DB TLS config in Trusty AI * refactor: remove unused method from utils * feat: move TrustyAI test to own file * refactor: change name of db fixtures and deduplicate code * TrustyAI Service creation code refactor into own method * Move db secret setter to utils * Remove test from test_fairness as test moved to own file * docs: add description to TrustyAI invalid DB TLS config test * fix: check TrustyAIService container for Terminated status in lastStatus * fix: change name of terminal_state getter function * fix: change to a valid certificate and check for service failure * fix: address PR 221 reviewer feedback * revert wait_for_pods to wait_for_mariadb_pods * improve error checking logic * remove un-necessary wrapper function * docs: add docstring to create_trustyai_service method * docs: add docstring to trustyai_service_with_invalid_db_cert * fix: fix invalid return type for trustyai_db_ca_secret * feat: use retry decorator in validate trustyai_service_db_conn_failure method * fix: remove unnecessary return from validate db_conn_failure method * docs: add spacing between lines of docstring * refactor: create constants trustyai metrics and db storage config * refactor: address reviewer feedback - change docstring to correct formatting - remove len(0) check - no templating for error text * fix: use regex instead of in operator to check for error condition * docs: add correct formatting to docstrings * fix: use namespace.name instead of namespace in Pod.get * fix: remove \s from regex to check for spaces * refactor: add Raises section in docstring and use single string for pytest.fail * feat: use raise instead of pytest.fail - create new exception TooManyPodsError - create new exception UnexpectedFailureError - replace pytest.fail with raise and handle exceptions in retry - * fix: change default of teardown to True in TrustyAIService * docs: correct typo in trustyai docstring * docs: fix raises in docs and fix formatting * fix: fix create_trustyai_service namespace args issue * docs: add default for name arg in create tai svc func * [model server] Fix runtime request.param name to use external route (#280) * fix: fix param name * fix: fix param name * feat: add certs when sending requests to TrustyAIService (#266) * Wait for pods to be in running state before attempting to create ModelRegistry (#270) * on rebase clean commented-by- labels * Wait for pods to be in running state before attempting to create ModelRegistry * Address Exception in thread Thread-1 (_monitor) error (opendatahub-io#286) * chore(deps): lock file maintenance (opendatahub-io#287) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (opendatahub-io#292) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.7 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.7...v0.11.8) - [github.com/gitleaks/gitleaks: v8.24.3 → v8.25.1](gitleaks/gitleaks@v8.24.3...v8.25.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Wait for dsc and dsci ready state in cluster_sanity check (opendatahub-io#293) * fix(workbenches): implement get_username for OpenShift <=4.14 (#275) Turns out SelfSubjectReview is only available starting OpenShift 4.15. fixup incorporate User resource * RedHatQE/openshift-python-wrapper#2387 fixup incorporate SelfSubjectReview resource * RedHatQE/openshift-python-wrapper#2389 Co-authored-by: Debarati Basu-Nag <dbasunag@redhat.com> * replace the bot account with one owned by testdevops (opendatahub-io#291) * Fix for post upgarde operator check (opendatahub-io#297) Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> * Add test for Model Registry RBAC for SA token (opendatahub-io#296) * feat: add RBAC test for SA token Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: address review comments Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: incorporate coderabbit suggestions Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: remove unneeded variable Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: remove excessive logs Signed-off-by: lugi0 <lgiorgi@redhat.com> --------- Signed-off-by: lugi0 <lgiorgi@redhat.com> * Support /build-push-pr-image comment to push image to quay for testing via jenkins (opendatahub-io#290) updates! 678b389 * Add tests for model_artifact update validations (#284) * Add tests for model_artifact update validations * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates fixing pre-commit * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update package * minor updates * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments updates! 50ec24b updates! f3a6c3e updates! 792156f updates! 399aa10 updates! 5080e3b updates! c34f4e7 updates! a1d7baa --------- Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> Signed-off-by: lugi0 <lgiorgi@redhat.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Jiri Daněk <jdanek@redhat.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> Co-authored-by: Luca Giorgi <lgiorgi@redhat.com> Co-authored-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adolfo Aguirrezabal <aaguirre@redhat.com> Co-authored-by: Edgar Hernández <ehernand@redhat.com> Co-authored-by: Shelton Cyril <sheltoncyril@gmail.com> Co-authored-by: Milind Waykole <mwaykole@redhat.com> Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb>
* updates to test_registering_model() based on previous review comments * [do-not-review]must-gather collection at failure point updates! 1176505 updates! 12d9c08 updates! 12d9c08 updates! 65e0213 * [ModelRegistry] ensure RunAsUser and RunAsGroup are not set explicitly (opendatahub-io#226) updates! 4813f2b updates! 20cd457 updates! b126825 updates! 809cca7 * Lock file maintenance (opendatahub-io#241) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * RHOAIENG-22058: chore(workbenches): add test_create_simple_notebook to smoke (opendatahub-io#238) * Remove uv cache from dockerfile to support running in envs like openshift-ci (opendatahub-io#239) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * fix: remove uv cache from dockerfile * `is_managed_cluster` fix condition (opendatahub-io#243) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * fix: replace iter with list * fix: add logger info * RHOAIENG-22057: fix(workbenches): correct the check for spawned workbench (opendatahub-io#242) There can only ever be a single workbench pod started. Co-authored-by: Luca Giorgi <lgiorgi@redhat.com> * RHOAIENG-22057: fix(workbenches): check for internal image registry and adjust the image path accordingly (opendatahub-io#244) * now yielding TimeoutSampler get_pods_by_isvc_label func output and handling raised ResourceNotFoundError (opendatahub-io#237) Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [model server] add auth test to upgrade (opendatahub-io#245) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * feat: add auth test to upgrade * feat: add auth test to upgrade feat: add auth test to upgrade * fix: dsci name in func * [pre-commit.ci] pre-commit autoupdate (opendatahub-io#246) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.4 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.4...v0.11.5) - [github.com/gitleaks/gitleaks: v8.24.2 → v8.24.3](gitleaks/gitleaks@v8.24.2...v8.24.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * Fix add-remove-labels workflow (opendatahub-io#249) * Add Cluster sanity checks before test execution (opendatahub-io#235) * Create size-labeler.yml * Delete .github/workflows/size-labeler.yml * model mesh - add auth tests * xx * feat: cluster sanity * feat: cluster sanity * feat: cluster sanity * feat: cluster sanity add readme * fix: tix str typo * fix: address comments * fix: address review comments * fix: address comment * fix: use dsci from global config * fix: remove duplicate fixture * add labeler to add labels to prs based on areas impacted (opendatahub-io#248) * on rebase clean commented-by- labels (opendatahub-io#251) * [model registry] update namespace code and rearrange tests (opendatahub-io#247) * updates to test_registering_model() based on previous review comments * update namespace code and rearrange tests * remove unnecessary argument from function call (opendatahub-io#255) * on rebase clean commented-by- labels * remove unnecessary argument from function call * feat: add ocp_interop marker (opendatahub-io#260) * Lock file maintenance (opendatahub-io#259) * Lock file maintenance * fix: add marshmallow version --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: rnetser <rnetser@redhat.com> * [pre-commit.ci] pre-commit autoupdate (opendatahub-io#263) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.6](astral-sh/ruff-pre-commit@v0.11.5...v0.11.6) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * feat: add upgrade tests (opendatahub-io#258) * Remove flake8 ignore list (opendatahub-io#265) * fix: remove flake8 ignore * fix: remove flake8 ignore * [model server] Remove pod pre-checks for image pull and fix `TestServerlessScaleToZero` (opendatahub-io#256) * fix: update tests * fix: update tests * fix: update tests * fix: save test dep name * fix: minio mm external route * fix: address comemnt * fix: address comemnt * fix: address comemnt * Update python-dependencies (major) (opendatahub-io#267) * Update python-dependencies * fix: marshmellow version --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: rnetser <rnetser@redhat.com> * Adding Test For InferenceService Zero Initial Scale (opendatahub-io#262) * adding test for zero initial scale Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing precommit error Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * using label_selectors when getting deployment Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding argument names to func call and running pre-commit on all files Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> * fixing bug in ovms_kserve_inference_service function that was preventing isvcs from being created with 0 min-replicas Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> --------- Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * feat: move interop marker (opendatahub-io#268) * feat: Add upgrade tests for TrustyAIService (opendatahub-io#250) * feat: Add upgrade tests for TrustyAIService * Move upgrade README.md to docs/UPGRADE.md * fix: reuse kwargs in TrustyAIService fixture * fix: address comments, reuse kwargs, add docstrings --------- Co-authored-by: Ruth Netser <rnetser@redhat.com> * Fix ns deletion logic (opendatahub-io#272) * fix: fix resource deletion fixture logic * fix: fix resource deletion fixture logic * feat: fail on missing operators (opendatahub-io#257) * fix: update tests * fix: update tests * feat: fail on missing operators * fix: rename to dependent * fix: address comment * fix: add log on failure * fix: type in raise * fix: remove MR check * fix: remove MR check * fix: use package scope * Add basic InferenceGraph deployment check (opendatahub-io#233) * Add basic InferenceGraph deployment check This adds a test that deploys an InferenceGraph (IG), sends an inference request to the IG and verifies that the request succeeds. The deployed InferenceGraph is based on the example on the KServe documentation available in the following URL: https://kserve.github.io/website/0.15/modelserving/inference_graph/image_pipeline/. The example was adapted to run in openvino (which is a supported server in ODH), rather than TorchServe. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use cloud storage in InferenceGraph test Use cloud storage for the models, instead of OCI * Feedback: Ruth * Feedback: Ruth * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply Ruth suggestions Acknowledgement to @rnester for these changes. * More feedback: Ruth * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * fix: address 503 (opendatahub-io#274) * [model server] Move to using unprivileged_client in tests (opendatahub-io#273) * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * feat: use unprivileged_client * fix: unpri selection * Update MinIo pod privileges to run on ocp 4.19 (opendatahub-io#277) * fix: add securityContext for minio pod * fix: minio on 4.19 * [model server] add multi node args check (opendatahub-io#276) * feat: add multi node args * feat: add multi node args * fix: add wait on delete * fix: update new test * [pre-commit.ci] pre-commit autoupdate (opendatahub-io#279) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.6 → v0.11.7](astral-sh/ruff-pre-commit@v0.11.6...v0.11.7) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> * `verify_no_failed_pods` - exclude container failures when model mesh deployment (opendatahub-io#278) * fix: mm container * fix: update condition * feat: add test for incorrect DB TLS config in Trusty AI (opendatahub-io#221) * feat: add test for incorrect DB TLS config in Trusty AI * refactor: remove unused method from utils * feat: move TrustyAI test to own file * refactor: change name of db fixtures and deduplicate code * TrustyAI Service creation code refactor into own method * Move db secret setter to utils * Remove test from test_fairness as test moved to own file * docs: add description to TrustyAI invalid DB TLS config test * fix: check TrustyAIService container for Terminated status in lastStatus * fix: change name of terminal_state getter function * fix: change to a valid certificate and check for service failure * fix: address PR 221 reviewer feedback * revert wait_for_pods to wait_for_mariadb_pods * improve error checking logic * remove un-necessary wrapper function * docs: add docstring to create_trustyai_service method * docs: add docstring to trustyai_service_with_invalid_db_cert * fix: fix invalid return type for trustyai_db_ca_secret * feat: use retry decorator in validate trustyai_service_db_conn_failure method * fix: remove unnecessary return from validate db_conn_failure method * docs: add spacing between lines of docstring * refactor: create constants trustyai metrics and db storage config * refactor: address reviewer feedback - change docstring to correct formatting - remove len(0) check - no templating for error text * fix: use regex instead of in operator to check for error condition * docs: add correct formatting to docstrings * fix: use namespace.name instead of namespace in Pod.get * fix: remove \s from regex to check for spaces * refactor: add Raises section in docstring and use single string for pytest.fail * feat: use raise instead of pytest.fail - create new exception TooManyPodsError - create new exception UnexpectedFailureError - replace pytest.fail with raise and handle exceptions in retry - * fix: change default of teardown to True in TrustyAIService * docs: correct typo in trustyai docstring * docs: fix raises in docs and fix formatting * fix: fix create_trustyai_service namespace args issue * docs: add default for name arg in create tai svc func * [model server] Fix runtime request.param name to use external route (opendatahub-io#280) * fix: fix param name * fix: fix param name * feat: add certs when sending requests to TrustyAIService (opendatahub-io#266) * Wait for pods to be in running state before attempting to create ModelRegistry (opendatahub-io#270) * on rebase clean commented-by- labels * Wait for pods to be in running state before attempting to create ModelRegistry * Address Exception in thread Thread-1 (_monitor) error (opendatahub-io#286) * chore(deps): lock file maintenance (opendatahub-io#287) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (opendatahub-io#292) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.7 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.7...v0.11.8) - [github.com/gitleaks/gitleaks: v8.24.3 → v8.25.1](gitleaks/gitleaks@v8.24.3...v8.25.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Wait for dsc and dsci ready state in cluster_sanity check (opendatahub-io#293) * fix(workbenches): implement get_username for OpenShift <=4.14 (opendatahub-io#275) Turns out SelfSubjectReview is only available starting OpenShift 4.15. fixup incorporate User resource * RedHatQE/openshift-python-wrapper#2387 fixup incorporate SelfSubjectReview resource * RedHatQE/openshift-python-wrapper#2389 Co-authored-by: Debarati Basu-Nag <dbasunag@redhat.com> * replace the bot account with one owned by testdevops (opendatahub-io#291) * Fix for post upgarde operator check (opendatahub-io#297) Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> * Add test for Model Registry RBAC for SA token (opendatahub-io#296) * feat: add RBAC test for SA token Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: address review comments Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: incorporate coderabbit suggestions Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: remove unneeded variable Signed-off-by: lugi0 <lgiorgi@redhat.com> * fix: remove excessive logs Signed-off-by: lugi0 <lgiorgi@redhat.com> --------- Signed-off-by: lugi0 <lgiorgi@redhat.com> * Support /build-push-pr-image comment to push image to quay for testing via jenkins (opendatahub-io#290) updates! 678b389 * Add tests for model_artifact update validations (opendatahub-io#284) * Add tests for model_artifact update validations * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates fixing pre-commit * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update package * minor updates * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments updates! 50ec24b updates! f3a6c3e updates! 792156f updates! 399aa10 updates! 5080e3b updates! c34f4e7 updates! a1d7baa --------- Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb> Signed-off-by: lugi0 <lgiorgi@redhat.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Jiri Daněk <jdanek@redhat.com> Co-authored-by: Ruth Netser <rnetser@redhat.com> Co-authored-by: Luca Giorgi <lgiorgi@redhat.com> Co-authored-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adolfo Aguirrezabal <aaguirre@redhat.com> Co-authored-by: Edgar Hernández <ehernand@redhat.com> Co-authored-by: Shelton Cyril <sheltoncyril@gmail.com> Co-authored-by: Milind Waykole <mwaykole@redhat.com> Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb>
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit