
Commit c02d473

Committed by: dbasunag, renovate[bot], jiridanek, rnetser, lugi0
Collect must-gather at the failure point (opendatahub-io#240)
* updates to test_registering_model() based on previous review comments
* [do-not-review] must-gather collection at failure point
  updates! 1176505 updates! 12d9c08 updates! 12d9c08 updates! 65e0213
* [ModelRegistry] ensure RunAsUser and RunAsGroup are not set explicitly (opendatahub-io#226)
  updates! 4813f2b updates! 20cd457 updates! b126825 updates! 809cca7
* Lock file maintenance (opendatahub-io#241)
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* RHOAIENG-22058: chore(workbenches): add test_create_simple_notebook to smoke (opendatahub-io#238)
* Remove uv cache from dockerfile to support running in envs like openshift-ci (opendatahub-io#239)
* Create size-labeler.yml
* Delete .github/workflows/size-labeler.yml
* model mesh - add auth tests
* xx
* fix: remove uv cache from dockerfile
* `is_managed_cluster` fix condition (opendatahub-io#243)
* Create size-labeler.yml
* Delete .github/workflows/size-labeler.yml
* model mesh - add auth tests
* xx
* fix: replace iter with list
* fix: add logger info
* RHOAIENG-22057: fix(workbenches): correct the check for spawned workbench (opendatahub-io#242)
  There can only ever be a single workbench pod started.
  Co-authored-by: Luca Giorgi <lgiorgi@redhat.com>
* RHOAIENG-22057: fix(workbenches): check for internal image registry and adjust the image path accordingly (opendatahub-io#244)
* now yielding TimeoutSampler get_pods_by_isvc_label func output and handling raised ResourceNotFoundError (opendatahub-io#237)
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
* [model server] add auth test to upgrade (opendatahub-io#245)
* Create size-labeler.yml
* Delete .github/workflows/size-labeler.yml
* model mesh - add auth tests
* xx
* feat: add auth test to upgrade
* feat: add auth test to upgrade
  feat: add auth test to upgrade
* fix: dsci name in func
* [pre-commit.ci] pre-commit autoupdate (opendatahub-io#246)
  updates:
  - [github.com/astral-sh/ruff-pre-commit: v0.11.4 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.4...v0.11.5)
  - [github.com/gitleaks/gitleaks: v8.24.2 → v8.24.3](gitleaks/gitleaks@v8.24.2...v8.24.3)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Ruth Netser <rnetser@redhat.com>
* Fix add-remove-labels workflow (opendatahub-io#249)
* Add Cluster sanity checks before test execution (opendatahub-io#235)
* Create size-labeler.yml
* Delete .github/workflows/size-labeler.yml
* model mesh - add auth tests
* xx
* feat: cluster sanity
* feat: cluster sanity
* feat: cluster sanity
* feat: cluster sanity add readme
* fix: tix str typo
* fix: address comments
* fix: address review comments
* fix: address comment
* fix: use dsci from global config
* fix: remove duplicate fixture
* add labeler to add labels to prs based on areas impacted (opendatahub-io#248)
* on rebase clean commented-by- labels (opendatahub-io#251)
* [model registry] update namespace code and rearrange tests (opendatahub-io#247)
* updates to test_registering_model() based on previous review comments
* update namespace code and rearrange tests
* remove unnecessary argument from function call (opendatahub-io#255)
* on rebase clean commented-by- labels
* remove unnecessary argument from function call
* feat: add ocp_interop marker (opendatahub-io#260)
* Lock file maintenance (opendatahub-io#259)
* Lock file maintenance
* fix: add marshmallow version
  ---------
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
  Co-authored-by: rnetser <rnetser@redhat.com>
* [pre-commit.ci] pre-commit autoupdate (opendatahub-io#263)
  updates:
  - [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.6](astral-sh/ruff-pre-commit@v0.11.5...v0.11.6)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Ruth Netser <rnetser@redhat.com>
* feat: add upgrade tests (opendatahub-io#258)
* Remove flake8 ignore list (opendatahub-io#265)
* fix: remove flake8 ignore
* fix: remove flake8 ignore
* [model server] Remove pod pre-checks for image pull and fix `TestServerlessScaleToZero` (opendatahub-io#256)
* fix: update tests
* fix: update tests
* fix: update tests
* fix: save test dep name
* fix: minio mm external route
* fix: address comemnt
* fix: address comemnt
* fix: address comemnt
* Update python-dependencies (major) (opendatahub-io#267)
* Update python-dependencies
* fix: marshmellow version
  ---------
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
  Co-authored-by: rnetser <rnetser@redhat.com>
* Adding Test For InferenceService Zero Initial Scale (opendatahub-io#262)
* adding test for zero initial scale
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fixing precommit error
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
* using label_selectors when getting deployment
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* adding argument names to func call and running pre-commit on all files
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
* fixing bug in ovms_kserve_inference_service function that was preventing isvcs from being created with 0 min-replicas
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
  ---------
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* feat: move interop marker (opendatahub-io#268)
* feat: Add upgrade tests for TrustyAIService (opendatahub-io#250)
* feat: Add upgrade tests for TrustyAIService
* Move upgrade README.md to docs/UPGRADE.md
* fix: reuse kwargs in TrustyAIService fixture
* fix: address comments, reuse kwargs, add docstrings
  ---------
  Co-authored-by: Ruth Netser <rnetser@redhat.com>
* Fix ns deletion logic (opendatahub-io#272)
* fix: fix resource deletion fixture logic
* fix: fix resource deletion fixture logic
* feat: fail on missing operators (opendatahub-io#257)
* fix: update tests
* fix: update tests
* feat: fail on missing operators
* fix: rename to dependent
* fix: address comment
* fix: add log on failure
* fix: type in raise
* fix: remove MR check
* fix: remove MR check
* fix: use package scope
* Add basic InferenceGraph deployment check (opendatahub-io#233)
* Add basic InferenceGraph deployment check
  This adds a test that deploys an InferenceGraph (IG), sends an inference request to the IG and verifies that the request succeeds. The deployed InferenceGraph is based on the example in the KServe documentation at https://kserve.github.io/website/0.15/modelserving/inference_graph/image_pipeline/. The example was adapted to run in openvino (which is a supported server in ODH), rather than TorchServe.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Use cloud storage in InferenceGraph test
  Use cloud storage for the models, instead of OCI
* Feedback: Ruth
* Feedback: Ruth
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Apply Ruth suggestions
  Acknowledgement to @rnester for these changes.
* More feedback: Ruth
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  ---------
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Ruth Netser <rnetser@redhat.com>
* fix: address 503 (opendatahub-io#274)
* [model server] Move to using unprivileged_client in tests (opendatahub-io#273)
* feat: use unprivileged_client
* feat: use unprivileged_client
* feat: use unprivileged_client
* feat: use unprivileged_client
* feat: use unprivileged_client
* feat: use unprivileged_client
* fix: unpri selection
* Update MinIo pod privileges to run on ocp 4.19 (opendatahub-io#277)
* fix: add securityContext for minio pod
* fix: minio on 4.19
* [model server] add multi node args check (opendatahub-io#276)
* feat: add multi node args
* feat: add multi node args
* fix: add wait on delete
* fix: update new test
* [pre-commit.ci] pre-commit autoupdate (opendatahub-io#279)
  updates:
  - [github.com/astral-sh/ruff-pre-commit: v0.11.6 → v0.11.7](astral-sh/ruff-pre-commit@v0.11.6...v0.11.7)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Ruth Netser <rnetser@redhat.com>
* `verify_no_failed_pods` - exclude container failures when model mesh deployment (opendatahub-io#278)
* fix: mm container
* fix: update condition
* feat: add test for incorrect DB TLS config in Trusty AI (opendatahub-io#221)
* feat: add test for incorrect DB TLS config in Trusty AI
* refactor: remove unused method from utils
* feat: move TrustyAI test to own file
* refactor: change name of db fixtures and deduplicate code
* TrustyAI Service creation code refactor into own method
* Move db secret setter to utils
* Remove test from test_fairness as test moved to own file
* docs: add description to TrustyAI invalid DB TLS config test
* fix: check TrustyAIService container for Terminated status in lastStatus
* fix: change name of terminal_state getter function
* fix: change to a valid certificate and check for service failure
* fix: address PR 221 reviewer feedback
* revert wait_for_pods to wait_for_mariadb_pods
* improve error checking logic
* remove un-necessary wrapper function
* docs: add docstring to create_trustyai_service method
* docs: add docstring to trustyai_service_with_invalid_db_cert
* fix: fix invalid return type for trustyai_db_ca_secret
* feat: use retry decorator in validate trustyai_service_db_conn_failure method
* fix: remove unnecessary return from validate db_conn_failure method
* docs: add spacing between lines of docstring
* refactor: create constants trustyai metrics and db storage config
* refactor: address reviewer feedback
  - change docstring to correct formatting
  - remove len(0) check
  - no templating for error text
* fix: use regex instead of in operator to check for error condition
* docs: add correct formatting to docstrings
* fix: use namespace.name instead of namespace in Pod.get
* fix: remove \s from regex to check for spaces
* refactor: add Raises section in docstring and use single string for pytest.fail
* feat: use raise instead of pytest.fail
  - create new exception TooManyPodsError
  - create new exception UnexpectedFailureError
  - replace pytest.fail with raise and handle exceptions in retry
  -
* fix: change default of teardown to True in TrustyAIService
* docs: correct typo in trustyai docstring
* docs: fix raises in docs and fix formatting
* fix: fix create_trustyai_service namespace args issue
* docs: add default for name arg in create tai svc func
* [model server] Fix runtime request.param name to use external route (opendatahub-io#280)
* fix: fix param name
* fix: fix param name
* feat: add certs when sending requests to TrustyAIService (opendatahub-io#266)
* Wait for pods to be in running state before attempting to create ModelRegistry (opendatahub-io#270)
* on rebase clean commented-by- labels
* Wait for pods to be in running state before attempting to create ModelRegistry
* Address Exception in thread Thread-1 (_monitor) error (opendatahub-io#286)
* chore(deps): lock file maintenance (opendatahub-io#287)
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* [pre-commit.ci] pre-commit autoupdate (opendatahub-io#292)
  updates:
  - [github.com/astral-sh/ruff-pre-commit: v0.11.7 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.7...v0.11.8)
  - [github.com/gitleaks/gitleaks: v8.24.3 → v8.25.1](gitleaks/gitleaks@v8.24.3...v8.25.1)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Wait for dsc and dsci ready state in cluster_sanity check (opendatahub-io#293)
* fix(workbenches): implement get_username for OpenShift <=4.14 (opendatahub-io#275)
  Turns out SelfSubjectReview is only available starting OpenShift 4.15.
  fixup incorporate User resource
  * RedHatQE/openshift-python-wrapper#2387
  fixup incorporate SelfSubjectReview resource
  * RedHatQE/openshift-python-wrapper#2389
  Co-authored-by: Debarati Basu-Nag <dbasunag@redhat.com>
* replace the bot account with one owned by testdevops (opendatahub-io#291)
* Fix for post upgarde operator check (opendatahub-io#297)
  Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb>
  Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb>
* Add test for Model Registry RBAC for SA token (opendatahub-io#296)
* feat: add RBAC test for SA token
  Signed-off-by: lugi0 <lgiorgi@redhat.com>
* fix: address review comments
  Signed-off-by: lugi0 <lgiorgi@redhat.com>
* fix: incorporate coderabbit suggestions
  Signed-off-by: lugi0 <lgiorgi@redhat.com>
* fix: remove unneeded variable
  Signed-off-by: lugi0 <lgiorgi@redhat.com>
* fix: remove excessive logs
  Signed-off-by: lugi0 <lgiorgi@redhat.com>
  ---------
  Signed-off-by: lugi0 <lgiorgi@redhat.com>
* Support /build-push-pr-image comment to push image to quay for testing via jenkins (opendatahub-io#290)
  updates! 678b389
* Add tests for model_artifact update validations (opendatahub-io#284)
* Add tests for model_artifact update validations
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  ---------
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* updates fixing pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* update package
* minor updates
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* address review comments
  updates! 50ec24b updates! f3a6c3e updates! 792156f updates! 399aa10 updates! 5080e3b updates! c34f4e7 updates! a1d7baa
  ---------
  Signed-off-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
  Signed-off-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb>
  Signed-off-by: lugi0 <lgiorgi@redhat.com>
  Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
  Co-authored-by: Jiri Daněk <jdanek@redhat.com>
  Co-authored-by: Ruth Netser <rnetser@redhat.com>
  Co-authored-by: Luca Giorgi <lgiorgi@redhat.com>
  Co-authored-by: Brett Thompson <196701379+brettmthompson@users.noreply.github.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Adolfo Aguirrezabal <aaguirre@redhat.com>
  Co-authored-by: Edgar Hernández <ehernand@redhat.com>
  Co-authored-by: Shelton Cyril <sheltoncyril@gmail.com>
  Co-authored-by: Milind Waykole <mwaykole@redhat.com>
  Co-authored-by: Milind Waykole <mwaykole@mwaykole-thinkpadp1gen4i.bengluru.csb>
1 parent e41c535 commit c02d473

File tree: 10 files changed (+1029, -571 lines)

.flake8

Lines changed: 1 addition & 0 deletions

```diff
@@ -19,6 +19,7 @@ fcn_exclude_functions =
     re,
     logging,
     LOGGER,
+    BASIC_LOGGER,
     os,
     json,
     pytest,
```

conftest.py

Lines changed: 98 additions & 7 deletions

```diff
@@ -2,14 +2,19 @@
 import os
 import pathlib
 import shutil
+import datetime
+import traceback

 import shortuuid
+from _pytest.runner import CallInfo
+from _pytest.reports import TestReport
 from pytest import (
     Parser,
     Session,
     FixtureRequest,
     FixtureDef,
     Item,
+    Collector,
     Config,
     CollectReport,
 )
@@ -18,8 +23,15 @@
 from pytest_testconfig import config as py_config

 from utilities.constants import KServeDeploymentType
+from utilities.database import Database
 from utilities.logger import separator, setup_logging
-
+from utilities.must_gather_collector import (
+    set_must_gather_collector_directory,
+    set_must_gather_collector_values,
+    get_must_gather_collector_dir,
+    collect_rhoai_must_gather,
+    get_base_dir,
+)

 LOGGER = logging.getLogger(name=__name__)
 BASIC_LOGGER = logging.getLogger(name="basic")
@@ -31,6 +43,7 @@ def pytest_addoption(parser: Parser) -> None:
     runtime_group = parser.getgroup(name="Runtime details")
     upgrade_group = parser.getgroup(name="Upgrade options")
     platform_group = parser.getgroup(name="Platform")
+    must_gather_group = parser.getgroup(name="MustGather")
     cluster_sanity_group = parser.getgroup(name="ClusterSanity")

     # AWS config and credentials options
@@ -118,6 +131,12 @@ def pytest_addoption(parser: Parser) -> None:
         "--applications-namespace",
         help="RHOAI/ODH applications namespace",
     )
+    must_gather_group.addoption(
+        "--collect-must-gather",
+        help="Indicate if must-gather should be collected on failure.",
+        action="store_true",
+        default=False,
+    )

     # Cluster sanity options
     cluster_sanity_group.addoption(
@@ -205,14 +224,22 @@ def _add_upgrade_test(_item: Item, _upgrade_deployment_modes: list[str]) -> bool


 def pytest_sessionstart(session: Session) -> None:
-    tests_log_file = session.config.getoption("log_file") or "pytest-tests.log"
+    log_file = session.config.getoption("log_file") or "pytest-tests.log"
+    tests_log_file = os.path.join(get_base_dir(), log_file)
+    LOGGER.info(f"Writing tests log to {tests_log_file}")
     if os.path.exists(tests_log_file):
         pathlib.Path(tests_log_file).unlink()
-
+    if session.config.getoption("--collect-must-gather"):
+        session.config.option.must_gather_db = Database()
     session.config.option.log_listener = setup_logging(
         log_file=tests_log_file,
         log_level=session.config.getoption("log_cli_level") or logging.INFO,
     )
+    must_gather_dict = set_must_gather_collector_values()
+    shutil.rmtree(
+        path=must_gather_dict["must_gather_base_directory"],
+        ignore_errors=True,
+    )


 def pytest_fixture_setup(fixturedef: FixtureDef[Any], request: FixtureRequest) -> None:
@@ -226,9 +253,23 @@ def pytest_runtest_setup(item: Item) -> None:
     2. Adds `fail_if_missing_dependent_operators` fixture for Serverless tests.
     3. Adds fixtures to enable KServe/model mesh in DSC for model server tests.
     """
-
     BASIC_LOGGER.info(f"\n{separator(symbol_='-', val=item.name)}")
     BASIC_LOGGER.info(f"{separator(symbol_='-', val='SETUP')}")
+    if item.config.getoption("--collect-must-gather"):
+        # set must-gather collection directory:
+        set_must_gather_collector_directory(item=item, directory_path=get_must_gather_collector_dir())
+
+        # At the beginning of setup work, insert current epoch time into the database to indicate test
+        # start time
+
+        try:
+            db = item.config.option.must_gather_db
+            db.insert_test_start_time(
+                test_name=f"{item.fspath}::{item.name}",
+                start_time=int(datetime.datetime.now().timestamp()),
+            )
+        except Exception as db_exception:
+            LOGGER.error(f"Database error: {db_exception}. Must-gather collection may not be accurate")

     if KServeDeploymentType.SERVERLESS.lower() in item.keywords:
         item.fixturenames.insert(0, "fail_if_missing_dependent_operators")
@@ -252,6 +293,10 @@ def pytest_runtest_call(item: Item) -> None:

 def pytest_runtest_teardown(item: Item) -> None:
     BASIC_LOGGER.info(f"{separator(symbol_='-', val='TEARDOWN')}")
+    # reset must-gather collector after each test
+    py_config["must_gather_collector"]["collector_directory"] = py_config["must_gather_collector"][
+        "must_gather_base_directory"
+    ]


 def pytest_report_teststatus(report: CollectReport, config: Config) -> None:
@@ -276,10 +321,56 @@ def pytest_sessionfinish(session: Session, exitstatus: int) -> None:
     session.config.option.log_listener.stop()
     if session.config.option.setupplan or session.config.option.collectonly:
         return
-    base_dir = py_config["tmp_base_dir"]
-    LOGGER.info(f"Deleting pytest base dir {base_dir}")
-    shutil.rmtree(path=base_dir, ignore_errors=True)
+    if session.config.getoption("--collect-must-gather"):
+        db = session.config.option.must_gather_db
+        file_path = db.database_file_path
+        LOGGER.info(f"Removing database file path {file_path}")
+        if os.path.exists(file_path):
+            os.remove(file_path)
+        # clean up the empty folders
+        collector_directory = py_config["must_gather_collector"]["must_gather_base_directory"]
+        if os.path.exists(collector_directory):
+            for root, dirs, files in os.walk(collector_directory, topdown=False):
+                for _dir in dirs:
+                    dir_path = os.path.join(root, _dir)
+                    if not os.listdir(dir_path):
+                        shutil.rmtree(path=dir_path, ignore_errors=True)
+    LOGGER.info(f"Deleting pytest base dir {session.config.option.basetemp}")
+    shutil.rmtree(path=session.config.option.basetemp, ignore_errors=True)

     reporter: Optional[TerminalReporter] = session.config.pluginmanager.get_plugin("terminalreporter")
     if reporter:
         reporter.summary_stats()
+
+
+def calculate_must_gather_timer(test_start_time: int) -> int:
+    default_duration = 300
+    if test_start_time > 0:
+        duration = int(datetime.datetime.now().timestamp()) - test_start_time
+        return duration if duration > 60 else default_duration
+    else:
+        LOGGER.warning(f"Could not get start time of test. Collecting must-gather for last {default_duration}s")
+        return default_duration
+
+
+def pytest_exception_interact(node: Item | Collector, call: CallInfo[Any], report: TestReport | CollectReport) -> None:
+    LOGGER.error(report.longreprtext)
+    if node.config.getoption("--collect-must-gather"):
+        test_name = f"{node.fspath}::{node.name}"
+        LOGGER.info(f"Must-gather collection is enabled for {test_name}.")
+
+        try:
+            db = node.config.option.must_gather_db
+            test_start_time = db.get_test_start_time(test_name=test_name)
+        except Exception as db_exception:
+            test_start_time = 0
+            LOGGER.warning(f"Error: {db_exception} in accessing database.")
+
+        try:
+            collect_rhoai_must_gather(
+                since=calculate_must_gather_timer(test_start_time=test_start_time),
+                target_dir=os.path.join(get_must_gather_collector_dir(), "pytest_exception_interact"),
+            )
+        except Exception as current_exception:
+            LOGGER.warning(f"Failed to collect logs: {test_name}: {current_exception} {traceback.format_exc()}")
```
pyproject.toml

Lines changed: 1 addition & 0 deletions

```diff
@@ -65,6 +65,7 @@ dependencies = [
     "jira>=3.8.0",
     "openshift-python-wrapper>=11.0.50",
     "semver>=3.0.4",
+    "sqlalchemy>=2.0.40",
     "pytest-order>=1.3.0",
     "marshmallow==3.26.1,<4", # this version is needed for pytest-jira
 ]
```

tests/global_config.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -3,9 +3,9 @@
 distribution: str = "downstream"
 applications_namespace: str = "redhat-ods-applications"  # overwritten in conftest.py if distribution is upstream
 dsc_name: str = "default-dsc"
+must_gather_base_dir: str = "must-gather-base-dir"
 dsci_name: str = "default-dsci"
 dependent_operators: str = "servicemeshoperator,authorino-operator,serverless-operator"
-
 use_unprivileged_client: bool = True

 for _dir in dir():
```

utilities/constants.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -272,3 +272,5 @@ class RunTimeConfig:
     },
     "commands": {"GRPC": "vllm_tgis_adapter"},
 }
+
+RHOAI_OPERATOR_NAMESPACE = "redhat-ods-operator"
```

utilities/database.py

Lines changed: 53 additions & 0 deletions (new file)

```python
import logging
import os

from sqlalchemy import Integer, String, create_engine
from sqlalchemy.orm import Mapped, Session, mapped_column
from sqlalchemy.orm import DeclarativeBase
from utilities.must_gather_collector import get_base_dir

LOGGER = logging.getLogger(__name__)

TEST_DB = "opendatahub-tests.db"


class Base(DeclarativeBase):
    pass


class OpenDataHubTestTable(Base):
    __tablename__ = "OpenDataHubTestTable"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True, nullable=False)
    test_name: Mapped[str] = mapped_column(String(500))
    start_time: Mapped[int] = mapped_column(Integer, nullable=False)


class Database:
    def __init__(self, database_file_name: str = TEST_DB, verbose: bool = True) -> None:
        self.database_file_path = os.path.join(get_base_dir(), database_file_name)
        self.connection_string = f"sqlite:///{self.database_file_path}"
        self.verbose = verbose
        self.engine = create_engine(url=self.connection_string, echo=self.verbose)
        Base.metadata.create_all(bind=self.engine)

    def insert_test_start_time(self, test_name: str, start_time: int) -> None:
        with Session(bind=self.engine) as db_session:
            new_table_entry = OpenDataHubTestTable(test_name=test_name, start_time=start_time)
            db_session.add(new_table_entry)
            db_session.commit()

    def get_test_start_time(self, test_name: str) -> int:
        with Session(bind=self.engine) as db_session:
            result_row = (
                db_session.query(OpenDataHubTestTable)
                .with_entities(OpenDataHubTestTable.start_time)
                .filter_by(test_name=test_name)
                .first()
            )
            if result_row:
                start_time_value = result_row[0]
            else:
                start_time_value = 0
                LOGGER.warning(f"No test found with name: {test_name}")
            return start_time_value
```

utilities/exceptions.py

Lines changed: 6 additions & 0 deletions

```diff
@@ -96,6 +96,12 @@ def __str__(self) -> str:
         return f"Failed to log in as user {self.user}."


+class InvalidArgumentsError(Exception):
+    """Raised when mutually exclusive or invalid argument combinations are passed."""
+
+    pass
+
+
 class ResourceNotReadyError(Exception):
     pass
```

utilities/infra.py

Lines changed: 87 additions & 2 deletions

```diff
@@ -1,9 +1,13 @@
+import base64
 import json
+import os
 import re
 import shlex
+import tempfile
 from contextlib import contextmanager
 from functools import cache
-from typing import Any, Callable, Generator, Optional, Set
+from typing import Any, Generator, Optional, Set, Callable
+from json import JSONDecodeError

 import kubernetes
 import pytest
@@ -45,7 +49,8 @@
 from semver import Version
 from simple_logger.logger import get_logger

-from utilities.constants import ApiGroups, Labels, Timeout
+from ocp_resources.subscription import Subscription
+from utilities.constants import ApiGroups, Labels, Timeout, RHOAI_OPERATOR_NAMESPACE
 from utilities.constants import KServeDeploymentType
 from utilities.constants import Annotations
 from utilities.exceptions import (
@@ -851,6 +856,36 @@ def wait_for_isvc_pods(client: DynamicClient, isvc: InferenceService, runtime_na
     return get_pods_by_isvc_label(client=client, isvc=isvc, runtime_name=runtime_name)


+def get_rhods_subscription() -> Subscription | None:
+    subscriptions = Subscription.get(dyn_client=get_client(), namespace=RHOAI_OPERATOR_NAMESPACE)
+    if subscriptions:
+        for subscription in subscriptions:
+            LOGGER.info(f"Checking subscription {subscription.name}")
+            if subscription.name.startswith(tuple(["rhods-operator", "rhoai-operator"])):
+                return subscription
+
+    LOGGER.warning("No RHOAI subscription found. Potentially ODH cluster")
+    return None
+
+
+def get_rhods_operator_installed_csv() -> ClusterServiceVersion | None:
+    subscription = get_rhods_subscription()
+    if subscription:
+        csv_name = subscription.instance.status.installedCSV
+        LOGGER.info(f"Expected CSV: {csv_name}")
+        return ClusterServiceVersion(name=csv_name, namespace=RHOAI_OPERATOR_NAMESPACE, ensure_exists=True)
+    return None
+
+
+def get_rhods_csv_version() -> Version | None:
+    rhoai_csv = get_rhods_operator_installed_csv()
+    if rhoai_csv:
+        LOGGER.info(f"RHOAI CSV version: {rhoai_csv.instance.spec.version}")
+        return Version.parse(version=rhoai_csv.instance.spec.version)
+    LOGGER.warning("No RHOAI CSV found. Potentially ODH cluster")
+    return None
+
+
 @retry(
     wait_timeout=120,
     sleep=5,
@@ -930,3 +965,53 @@ def verify_cluster_sanity(

     # TODO: Write to file to easily report the failure in jenkins
     pytest.exit(reason=error_msg, returncode=return_code)
+
+
+def get_openshift_pull_secret(client: DynamicClient = None) -> Secret:
+    openshift_config_namespace = "openshift-config"
+    pull_secret_name = "pull-secret"  # pragma: allowlist secret
+    secret = Secret(
+        client=client or get_client(),
+        name=pull_secret_name,
+        namespace=openshift_config_namespace,
+    )
+    assert secret.exists, f"Pull-secret {pull_secret_name} not found in namespace {openshift_config_namespace}"
+    return secret
+
+
+def generate_openshift_pull_secret_file(client: DynamicClient = None) -> str:
+    pull_secret = get_openshift_pull_secret(client=client)
+    pull_secret_path = tempfile.mkdtemp(suffix="odh-pull-secret")
+    json_file = os.path.join(pull_secret_path, "pull-secrets.json")
+    secret = base64.b64decode(pull_secret.instance.data[".dockerconfigjson"]).decode(encoding="utf-8")
+    with open(file=json_file, mode="w") as outfile:
+        outfile.write(secret)
+    return json_file
+
+
+def get_oc_image_info(
+    image: str,
+    architecture: str,
+    pull_secret: str | None = None,
+) -> Any:
+    def _get_image_json(cmd: str) -> Any:
+        return json.loads(run_command(command=shlex.split(cmd), check=False)[1])
+
+    base_command = f"oc image -o json info {image} --filter-by-os {architecture}"
+    if pull_secret:
+        base_command = f"{base_command} --registry-config={pull_secret}"
+
+    sample = None
+    try:
+        for sample in TimeoutSampler(
+            wait_timeout=10,
+            sleep=5,
+            exceptions_dict={JSONDecodeError: [], TypeError: []},
+            func=_get_image_json,
+            cmd=base_command,
+        ):
+            if sample:
+                return sample
+    except TimeoutExpiredError:
+        LOGGER.error(f"Failed to parse {base_command}")
+        raise
```
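`get_oc_image_info` above polls `oc image info` until its output parses as JSON, swallowing `JSONDecodeError`/`TypeError` between attempts. The same poll-until-valid-JSON pattern can be sketched without a cluster (a generic stand-in for `TimeoutSampler`; the simulated command below fails twice before producing valid output):

```python
import json
import time
from typing import Any, Callable


def sample_until_json(func: Callable[[], str], wait_timeout: float = 10, sleep: float = 0.01) -> Any:
    """Poll func until its output parses as non-empty JSON, mirroring the
    TimeoutSampler usage in get_oc_image_info (illustrative stand-in only)."""
    deadline = time.monotonic() + wait_timeout
    while time.monotonic() < deadline:
        try:
            result = json.loads(func())
            if result:
                return result
        except (json.JSONDecodeError, TypeError):
            pass  # registry not ready yet or partial output: retry
        time.sleep(sleep)
    raise TimeoutError(f"No valid JSON within {wait_timeout}s")


# Simulated command output: empty, then garbage, then valid JSON.
attempts = iter(["", "not json", '{"digest": "sha256:abc"}'])
info = sample_until_json(func=lambda: next(attempts))
```

Retrying on parse errors rather than checking the process exit code tolerates transient registry hiccups where `oc` exits successfully but emits truncated output.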
