
test(evalhub): add multi-tenancy integration tests #1333

Open
ruivieira wants to merge 3 commits into opendatahub-io:main from ruivieira:evalhub-tests

Conversation

Member

@ruivieira ruivieira commented Mar 31, 2026

Pull Request

Summary

Adds comprehensive EvalHub integration tests covering deployment topology, multi-tenancy authorization, job lifecycle, and operator RBAC provisioning.

These tests validate the full multi-tenancy contract: authorised access, cross-tenant denial, and missing tenant rejection across all API endpoints. They also verify the operator correctly provisions RBAC in tenant namespaces and that evaluation jobs complete end-to-end using the vLLM emulator.

(Note: EvalHub CRD has changed and needs to be regenerated for this PR)

Summary of tests added

Deployment topology (smoke)

| Test | What it verifies |
| --- | --- |
| test_evalhub_single_pod_single_container | 1 pod, 1 container named evalhub, correct labels (app, instance, component) |

Health (smoke)

| Test | What it verifies |
| --- | --- |
| test_evalhub_health_endpoint | Health returns healthy status (existing) |
| test_evalhub_health_is_tenant_agnostic | Health works without X-Tenant; ignores it when present |

Multi-tenancy — providers (3 scenarios)

| Test | Expected |
| --- | --- |
| test_providers_authorized_tenant | 200 + non-empty list |
| test_providers_cross_tenant_forbidden | denied (400/403) |
| test_providers_missing_tenant_rejected | 400 |

Multi-tenancy — collections listing (3 scenarios)

| Test | Expected |
| --- | --- |
| test_collections_authorized_tenant | 200 + valid items list |
| test_collections_cross_tenant_denied | denied (400/403) |
| test_collections_missing_tenant_rejected | 400 |

Multi-tenancy — collection creation (3 scenarios)

| Test | Expected |
| --- | --- |
| test_create_collection_authorized_tenant | 201 + resource ID |
| test_create_collection_cross_tenant_denied | denied (400/403) |
| test_create_collection_missing_tenant_rejected | 400 |

Multi-tenancy — job submission and lifecycle (8 tests)

| Test | Expected |
| --- | --- |
| test_job_submit_authorized_tenant | 202 + job ID + pending state |
| test_job_completion_and_results | completed state + benchmark metrics |
| test_job_pod_reaches_succeeded | K8s pod phase Succeeded |
| test_job_submit_cross_tenant_denied | denied (400/403) |
| test_job_submit_missing_tenant_rejected | 400 |
| test_job_list_authorized_tenant | submitted job visible in list |
| test_job_list_cross_tenant_denied | denied (400/403) |
| test_job_list_missing_tenant_rejected | 400 |
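test_job_completion_and_results hinges on polling until the job reports a terminal state. The repository does this with TimeoutSampler; the standalone sketch below substitutes a plain deadline loop, and the terminal-state names are assumptions rather than the actual EVALHUB_JOB_TERMINAL_STATES values:

```python
import time
from collections.abc import Callable
from typing import Any

# Assumed terminal states; the real set lives in EVALHUB_JOB_TERMINAL_STATES.
TERMINAL_STATES = frozenset({"completed", "failed", "cancelled"})


def wait_for_terminal_state(
    get_status: Callable[[], dict[str, Any]],
    timeout: float = 600.0,
    sleep: float = 5.0,
) -> dict[str, Any]:
    """Poll get_status until the job reports a terminal state, else time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status", {}).get("state", "") in TERMINAL_STATES:
            return status
        time.sleep(sleep)
    raise TimeoutError(f"job did not reach a terminal state within {timeout}s")
```

Raising on timeout (rather than returning a sentinel) matches the review feedback below about the unreachable `return {}` after the sampler loop.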

Multi-tenancy — job deletion (3 scenarios)

| Test | Expected |
| --- | --- |
| test_job_delete_authorized_tenant | 200/204 |
| test_job_delete_cross_tenant_denied | denied (400/403) |
| test_job_delete_missing_tenant_rejected | 400 |

Operator RBAC provisioning (4 tests)

| Test | What it checks |
| --- | --- |
| test_job_service_account_created | Job SA exists in tenant namespace |
| test_jobs_writer_role_binding | RoleBinding references correct ClusterRole, subject is API SA from instance namespace |
| test_job_config_role_binding | Same verification for job-config ClusterRole |
| test_service_ca_configmap_created | CA ConfigMap exists with inject-cabundle annotation |
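The two RoleBinding checks reduce to reading roleRef and subjects off the manifest. A hypothetical extraction helper, using the standard rbac.authorization.k8s.io/v1 field names (the helper itself is not part of the PR):

```python
from typing import Any


def rolebinding_targets(rb: dict[str, Any]) -> tuple[str, set[tuple[str, str]]]:
    """Return (referenced ClusterRole name, {(subject namespace, SA name)})."""
    assert rb["roleRef"]["kind"] == "ClusterRole", "expected a ClusterRole reference"
    subjects = {
        (s.get("namespace", ""), s["name"])
        for s in rb.get("subjects", [])
        if s.get("kind") == "ServiceAccount"
    }
    return rb["roleRef"]["name"], subjects
```

A test can then assert the returned role name matches the expected ClusterRole and that the API ServiceAccount from the instance namespace appears among the subjects.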

Please review and indicate how it has been tested

  • Locally
  • Jenkins

Additional Requirements

  • If this PR introduces a new test image, did you create a PR to mirror it in the disconnected environment?
  • If this PR introduces new marker(s) or adds a new component, was a relevant ticket created to update the relevant Jenkins job?

Summary by CodeRabbit

  • Tests

    • Added comprehensive multi-tenancy test suite covering tenant isolation, RBAC, authorized/denied flows, job lifecycle (submit, complete, delete), collections and providers endpoints, deployment topology, and health checks.
  • Chores

    • Enhanced test fixtures and utilities for multi-tenant scenarios (tokens, namespaces, emulator service, polling helpers).
    • Added optional EvalHub database configuration support.

@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment; to remove it, comment /wip cancel.
  • To block merging of a PR, add /hold in a comment; to unblock, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment; to remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified; to un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name>. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR is created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image tagged
    pr-<pr_number> in the quay repository; the tag is deleted when the PR is merged or closed.
Supported labels

{'/hold', '/verified', '/wip', '/cherry-pick', '/lgtm', '/build-push-pr-image'}

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

Warning

Rate limit exceeded

@ruivieira has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 4 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 6 minutes and 4 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: acc06b74-ca6a-4fe1-b70f-31cb8fd31e03

📥 Commits

Reviewing files that changed from the base of the PR and between add0f4c and 48b90ca.

📒 Files selected for processing (1)
  • tests/model_explainability/evalhub/utils.py
📝 Walkthrough

Walkthrough

Adds EvalHub multi-tenancy test infrastructure and utilities: new multi-tenant fixtures, RBAC provisioning, vLLM emulator deployment, tenant-scoped tests for jobs/collections/providers, expanded request/job helper APIs, and a small EvalHub resource change to accept a database spec.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| EvalHub resource<br>utilities/resources/evalhub.py | Added optional database ctor parameter; to_dict() now includes spec.database when provided. |
| Core test constants<br>tests/model_explainability/evalhub/constants.py | Added endpoint paths, container/component identifiers, tenant label key, multi-tenant CR name, emulator port, and ClusterRole name constants. |
| Global evalhub fixture tweak<br>tests/model_explainability/evalhub/conftest.py | evalhub_cr fixture now passes database={"type": "sqlite"} when creating the EvalHub CR. |
| Multi-tenancy fixtures<br>tests/model_explainability/evalhub/multitenancy/conftest.py | New class-scoped fixtures provisioning the evalhub-mt CR, tenant namespaces, ServiceAccounts/Roles/RoleBindings, RBAC readiness polling, a bearer token helper, the vLLM emulator Deployment and Service, and CA bundle extraction. |
| Multi-tenant tests — endpoints & flows<br>tests/model_explainability/evalhub/multitenancy/test_evalhub_collections_mt.py, .../test_evalhub_create_collection_mt.py, .../test_evalhub_jobs_mt.py, .../test_evalhub_job_delete_mt.py, .../test_evalhub_providers_mt.py, .../test_evalhub_tenant_rbac.py | New test suites covering authorized access, cross-tenant denial, and missing-tenant rejection for collections, creation, jobs (submit/wait/delete), providers, and RBAC artifact verification. |
| Deployment & health checks<br>tests/model_explainability/evalhub/test_evalhub_deployment.py, tests/model_explainability/evalhub/test_evalhub_health.py | New deployment topology test (single pod/single container/labels); health test refactored to assert tenant-agnostic health endpoint behavior. |
| HTTP/job utilities<br>tests/model_explainability/evalhub/utils.py | Renamed header builder to build_headers; added request validators for denied/missing-tenant GET/POST/DELETE, job submission/polling/completion helpers, list/delete APIs, a collection submit helper, and EVALHUB_JOB_TERMINAL_STATES. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Actionable security findings (with CWE references):

  • CWE-20 — EvalHub database parameter accepted without validation. Validate schema/keys/types and restrict allowed values to prevent misconfiguration or injection of unexpected fields into CR spec.
  • CWE-942 / CWE-269 — RBAC artifacts and ClusterRole bindings are created programmatically in tests; ensure tests cannot inadvertently grant elevated privileges in shared clusters and that created namespaced RoleBindings cannot be escalated to ClusterRole abuse. Consider scoping to minimal privileges and tearing down reliably.
  • CWE-613 — Job polling with long default timeout (600s) may hide hangs or be used to prolong test runtime; make timeout configurable and fail fast in CI if appropriate.
  • Header injection risk (related to CWE-74): tenant names used directly in X-Tenant header — sanitize/validate against an allowed pattern to avoid header injection or proxy routing surprises.
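The header-injection point can be addressed with a guard that restricts tenant names to DNS-1123 label syntax before they ever reach X-Tenant. A minimal hypothetical sketch (not code from the PR):

```python
import re

# DNS-1123 label: lowercase alphanumerics and hyphens, at most 63 characters,
# starting and ending with an alphanumeric.
_TENANT_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")


def validate_tenant_name(tenant: str) -> str:
    """Reject tenant names that could smuggle header or routing metacharacters."""
    if not _TENANT_LABEL.fullmatch(tenant):
        raise ValueError(f"invalid tenant name: {tenant!r}")
    return tenant
```

Because Kubernetes namespace names already follow this syntax, the guard rejects CR/LF and other header metacharacters without constraining legitimate tenants.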
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test(evalhub): add multi-tenancy integration tests' directly and clearly describes the primary change, adding multi-tenancy integration tests for EvalHub, matching the substantial test additions across all file changes. |
| Description check | ✅ Passed | The PR description comprehensively covers the summary, test organization by feature area, expected behaviors for each scenario, and mentions local testing verification. While the 'Related Issues' and test image sections are incomplete, the core content fulfills the primary documentation requirement. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (5)
tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py (2)

210-210: Same dict access pattern issue.

Apply consistent defensive access here as well.

Proposed fix
-        job_id = submitted["resource"]["id"]
+        job_id = submitted.get("resource", {}).get("id")
+        assert job_id, f"Submitted job missing 'resource.id': {submitted}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py` at
line 210, Replace the direct dict indexing job_id = submitted["resource"]["id"]
with a defensive access pattern to avoid KeyError—e.g., extract job_id =
submitted.get("resource", {}).get("id") or check/assert that "resource" in
submitted and "id" in submitted["resource"] before assigning; ensure any
subsequent assertions or usage of job_id handle a missing value appropriately.

86-86: Inconsistent dict access pattern.

Line 61 uses defensive .get() but line 86 uses direct key access. If the API response schema changes, this raises an unclear KeyError instead of a descriptive assertion failure.

Proposed fix
-        job_id = data["resource"]["id"]
+        job_id = data.get("resource", {}).get("id")
+        assert job_id, f"Job response missing 'resource.id': {data}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py` at
line 86, The test inconsistently accesses the API response: instead of directly
using data["resource"]["id"] (variable job_id), change it to a defensive access
or explicit assertion—e.g., assert that "resource" and "id" exist on data (with
a clear message) and then use data.get("resource", {}).get("id") to set job_id;
update use sites around job_id and the test function name (test_evalhub_jobs_mt)
to rely on this defensive pattern so a schema change yields a clear assertion
error instead of a KeyError.
tests/model_explainability/evalhub/multitenancy/conftest.py (1)

154-163: TimeoutSampler loop does not raise on timeout path.

If _tenant_rbac_ready never returns True, TimeoutSampler raises TimeoutExpiredError. The function signature returns None, which is correct for the happy path, but the docstring should clarify the exception behavior for callers.

Proposed docstring addition
 def tenant_a_rbac_ready(
     admin_client: DynamicClient,
     tenant_a_namespace: Namespace,
     evalhub_mt_deployment: Deployment,
 ) -> None:
     """Wait for the operator to provision job RBAC in tenant-a.

     The operator watches for namespaces with the tenant label and
     creates jobs-writer + job-config RoleBindings. This fixture
     blocks until those RoleBindings exist.
+
+    Raises:
+        TimeoutExpiredError: If RBAC is not provisioned within 120s.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/multitenancy/conftest.py` around lines 154
- 163, The docstring for the function using TimeoutSampler should be updated to
document the timeout/exception behavior: mention that TimeoutSampler will raise
TimeoutExpiredError if _tenant_rbac_ready never returns True within wait_timeout
(120s in this usage), and that the function returns None on the happy path but
will propagate/raise TimeoutExpiredError on timeout; update the docstring where
TimeoutSampler is used (referencing TimeoutSampler and _tenant_rbac_ready) to
explicitly state this exception and its conditions so callers know to handle it.
tests/model_explainability/evalhub/utils.py (2)

352-355: Unreachable code after TimeoutSampler loop.

TimeoutSampler raises TimeoutExpiredError when the timeout is exceeded without finding a match. The return {} on line 355 is unreachable. While harmless, it's misleading about the function's behavior.

Proposed fix
     for sample in TimeoutSampler(
         wait_timeout=timeout,
         sleep=sleep,
         func=_get_job_status,
         host=host,
         token=token,
         ca_bundle_file=ca_bundle_file,
         tenant=tenant,
         job_id=job_id,
     ):
         state = sample.get("status", {}).get("state", "")
         LOGGER.info(f"Job {job_id} state: {state}")
         if state in EVALHUB_JOB_TERMINAL_STATES:
             return sample
-
-    return {}
+    # TimeoutSampler raises TimeoutExpiredError if we never return above
+    raise AssertionError("TimeoutSampler exited without timeout or match")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/utils.py` around lines 352 - 355, The
trailing "return {}" after the TimeoutSampler loop is unreachable because
TimeoutSampler raises TimeoutExpiredError on timeout; remove the misleading
return and instead either let the TimeoutExpiredError propagate or explicitly
catch TimeoutExpiredError where this function is called; update the code around
the TimeoutSampler usage (referencing TimeoutSampler, TimeoutExpiredError and
EVALHUB_JOB_TERMINAL_STATES) to eliminate the dead "return {}" and ensure
callers handle the exception if needed.

174-174: Missing generic type parameters on dict.

Per pyproject.toml mypy settings (disallow_any_generics), dict requires explicit type parameters.

Proposed fix
+from typing import Any
+
 def submit_evalhub_job(
     host: str,
     token: str,
     ca_bundle_file: str,
     tenant: str,
-    payload: dict,
+    payload: dict[str, Any],
 ) -> dict[str, Any]:

Apply the same pattern to validate_evalhub_post_denied (line 214), validate_evalhub_post_no_tenant (line 249), build_evalhub_job_payload (line 491), and submit_evalhub_collection (line 527).

As per coding guidelines, mypy options require disallow_any_generics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/utils.py` at line 174, Change
unparameterized use of dict to explicit typing to satisfy disallow_any_generics:
update the function signatures for validate_evalhub_post,
validate_evalhub_post_denied, validate_evalhub_post_no_tenant,
build_evalhub_job_payload, and submit_evalhub_collection to use typed dicts such
as Dict[str, Any] (or a more specific Mapping/TypedDict if appropriate) for
parameters named payload (and any other dict-typed params/returns), and add the
necessary imports from typing (e.g., Dict, Any) at the top of
tests/model_explainability/evalhub/utils.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py`:
- Around line 12-22: The variable COLLECTION_PAYLOAD is annotated as bare dict
which violates disallow_any_generics; update its type annotation to include
parameters (for example use Dict[str, Any] or a more specific type like
Dict[str, object]) and add the necessary import (e.g., from typing import Any,
Dict) so the declaration becomes COLLECTION_PAYLOAD: Dict[str, Any] = { ... }
(or replace with a precise typed structure if preferred) to satisfy the linter;
reference COLLECTION_PAYLOAD to locate the change and ensure imports are
added/adjusted accordingly.

In
`@tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py`:
- Around line 76-99: The test creates a job via submit_evalhub_job (using
build_evalhub_job_payload) and then calls validate_evalhub_delete_denied, but
never cleans up the created job; wrap the submission and denial assertion in a
try/finally (or ensure a teardown step) and in the finally call the
service-level deletion helper (e.g., delete_evalhub_job) using tenant-a
credentials (token=tenant_a_token, tenant=tenant_a_namespace.name,
host=evalhub_mt_route.host, ca_bundle_file=evalhub_mt_ca_bundle_file,
job_id=job_id) and ignore non-fatal errors so the job is always removed after
the negative-path check.
- Around line 46-47: Replace the fixed job_name="evalhub-delete-test-job" with a
unique name generator in the test (e.g., append a short UUID or timestamp) so
each test run creates distinct jobs; update the job_name parameter used in the
test functions (the job_name argument passed when creating the job) and apply
the same change to the other occurrences noted so tests at lines ~80-81 and
~113-114 also use unique names to avoid collisions in reruns/parallel execution
and to prevent masking auth failures.

In `@tests/model_explainability/evalhub/test_evalhub_deployment.py`:
- Around line 28-33: The test function test_evalhub_single_pod_single_container
has an untyped fixture parameter admin_client; add a type annotation for
admin_client (e.g., admin_client: Any or the specific fixture type) and import
the type (from typing import Any) or the concrete AdminClient type so the
signature def test_evalhub_single_pod_single_container(self, admin_client: Any,
model_namespace: Namespace, evalhub_deployment: Deployment) -> None satisfies
disallow_untyped_defs.

In `@tests/model_explainability/evalhub/utils.py`:
- Around line 203-204: The log is pulling the wrong key from the HTTP response:
change the LOGGER.info call that currently uses data.get('id') to reference the
nested id at data["resource"]["id"] (locate the response -> data assignment and
the LOGGER.info line that logs "Job submitted: ..."); ensure you safely access
the nested key (handle missing resource or id) and log a sensible fallback like
"unknown" if the nested id is absent.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: bb9f5549-6910-4d2c-a8b4-cbc29f319a66

📥 Commits

Reviewing files that changed from the base of the PR and between 8968196 and f3366a1.

📒 Files selected for processing (14)
  • tests/model_explainability/evalhub/conftest.py
  • tests/model_explainability/evalhub/constants.py
  • tests/model_explainability/evalhub/multitenancy/__init__.py
  • tests/model_explainability/evalhub/multitenancy/conftest.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_collections_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_providers_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_tenant_rbac.py
  • tests/model_explainability/evalhub/test_evalhub_deployment.py
  • tests/model_explainability/evalhub/test_evalhub_health.py
  • tests/model_explainability/evalhub/utils.py
  • utilities/resources/evalhub.py

Comment on lines +12 to +22
COLLECTION_PAYLOAD: dict = {
    "name": "MT Test Collection",
    "category": "test",
    "description": "Collection created by multi-tenancy integration test",
    "benchmarks": [
        {
            "id": "arc_easy",
            "provider_id": "lm_evaluation_harness",
        }
    ],
}
Contributor

@coderabbitai coderabbitai bot Apr 1, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify bare dict annotation in this file.
rg -nP '^\s*COLLECTION_PAYLOAD\s*:\s*dict\s*=' tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py

Repository: opendatahub-io/opendatahub-tests

Length of output: 107


🏁 Script executed:

cat -n pyproject.toml | head -100

Repository: opendatahub-io/opendatahub-tests

Length of output: 2579


🏁 Script executed:

rg -n 'disallow_any_generics' pyproject.toml

Repository: opendatahub-io/opendatahub-tests

Length of output: 107


🏁 Script executed:

cat -n tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py

Repository: opendatahub-io/opendatahub-tests

Length of output: 3963


Add type parameters to COLLECTION_PAYLOAD to satisfy disallow_any_generics.

Line 12 uses bare dict without type parameters, violating the disallow_any_generics setting enabled in pyproject.toml.

Fix
+from typing import TypedDict
+
+class BenchmarkPayload(TypedDict):
+    id: str
+    provider_id: str
+
+class CollectionPayload(TypedDict):
+    name: str
+    category: str
+    description: str
+    benchmarks: list[BenchmarkPayload]
+
-COLLECTION_PAYLOAD: dict = {
+COLLECTION_PAYLOAD: CollectionPayload = {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py`
around lines 12 - 22, The variable COLLECTION_PAYLOAD is annotated as bare dict
which violates disallow_any_generics; update its type annotation to include
parameters (for example use Dict[str, Any] or a more specific type like
Dict[str, object]) and add the necessary import (e.g., from typing import Any,
Dict) so the declaration becomes COLLECTION_PAYLOAD: Dict[str, Any] = { ... }
(or replace with a precise typed structure if preferred) to satisfy the linter;
reference COLLECTION_PAYLOAD to locate the change and ensure imports are
added/adjusted accordingly.

Member Author


@coderabbitai this won't trigger a CI failure since mypy explicitly excludes all test files (exclude: ^(docs/|.*test.*\.py$|...)), so disallow_any_generics doesn't apply here. Pre-commit mypy passes cleanly

Contributor


@ruivieira, thanks for the clarification! You're right — since mypy excludes all test files via exclude: ^(docs/|.*test.*\.py$|...), the disallow_any_generics constraint doesn't apply to test files, and the bare dict annotation on COLLECTION_PAYLOAD is perfectly fine. I'll disregard this finding.


📝 Learnings were identified and not saved (knowledge base disabled).

Comment on lines +46 to +47
job_name="evalhub-delete-test-job",
)
Contributor

@coderabbitai coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Stop reusing a fixed job name across tests.

Using job_name="evalhub-delete-test-job" in multiple tests creates collision risk on reruns/parallel execution and can mask auth checks behind name-conflict failures.

Proposed fix
+from uuid import uuid4
@@
-            job_name="evalhub-delete-test-job",
+            job_name=f"evalhub-delete-test-job-{uuid4().hex[:8]}",
@@
-            job_name="evalhub-delete-test-job",
+            job_name=f"evalhub-delete-test-job-{uuid4().hex[:8]}",
@@
-            job_name="evalhub-delete-test-job",
+            job_name=f"evalhub-delete-test-job-{uuid4().hex[:8]}",

As per coding guidelines: REVIEW PRIORITIES: ... Bug-prone patterns and error handling gaps.

Also applies to: 80-81, 113-114

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py`
around lines 46 - 47, Replace the fixed job_name="evalhub-delete-test-job" with
a unique name generator in the test (e.g., append a short UUID or timestamp) so
each test run creates distinct jobs; update the job_name parameter used in the
test functions (the job_name argument passed when creating the job) and apply
the same change to the other occurrences noted so tests at lines ~80-81 and
~113-114 also use unique names to avoid collisions in reruns/parallel execution
and to prevent masking auth failures.

Member Author


@coderabbitai Each test submits a new job and gets back a unique server-generated job_id (UUID) from the response. The job_name is just a human-readable label in the EvalHub API, not a unique key. The actual identifier used for deletion is data["resource"]["id"] which is a UUID like 96e73ffe-49ba-4b39-9053-3dce1de6f0c5, guaranteed unique per submission.

Contributor


@ruivieira, thanks for the clarification — that makes sense. Since the EvalHub API uses a server-generated UUID (data["resource"]["id"]) as the actual job identifier and job_name is purely a human-readable label with no uniqueness enforcement, there's no collision risk here. My concern was unfounded.


📝 Learnings were identified and not saved (knowledge base disabled).

  Add 24 integration tests for EvalHub covering deployment topology,
  health, multi-tenancy authorisation (providers, collections, jobs),
  job lifecycle with vLLM emulator, and operator RBAC provisioning.

Signed-off-by: Rui Vieira <ruidevieira@googlemail.com>
Signed-off-by: Rui Vieira <ruidevieira@googlemail.com>
Signed-off-by: Rui Vieira <ruidevieira@googlemail.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
tests/model_explainability/evalhub/utils.py (1)

203-204: ⚠️ Potential issue | 🟡 Minor

Use the nested job ID in the submission log.

Callers read data["resource"]["id"], so this line logs "unknown" on successful submissions. That removes the only stable handle from later wait/delete failures.

Suggested fix
-    LOGGER.info(f"Job submitted: {data.get('id', 'unknown')}")
+    LOGGER.info(f"Job submitted: {data.get('resource', {}).get('id', 'unknown')}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/utils.py` around lines 203 - 204, The
submission log uses data.get('id') but callers expect the nested job id at
data["resource"]["id"], so update the LOGGER.info call that follows data =
response.json() to extract the nested id (e.g., data.get('resource',
{}).get('id', 'unknown')) and log that value instead; locate the LOGGER.info
invocation in tests/model_explainability/evalhub/utils.py and replace the
referenced key accordingly so the stable job handle used by wait/delete matches
the logged id.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/model_explainability/evalhub/multitenancy/conftest.py`:
- Around line 131-163: Extend the readiness check so the tenant_a_rbac_ready
fixture waits for the operator-created ServiceAccount and service CA ConfigMap
as well as the two RoleBindings: update _tenant_rbac_ready to also list/check
for a ServiceAccount whose name startswith EVALHUB_MT_CR_NAME and contains "job"
(or the operator's job-SA naming) and a ConfigMap whose name startswith
EVALHUB_MT_CR_NAME and contains "service-ca"; return true only when
has_job_config, has_job_writer, the ServiceAccount exists, and the ConfigMap
exists so tenant_a_rbac_ready only unblocks once all RBAC-related resources are
present.
- Around line 100-123: The fixtures tenant_a_namespace and tenant_b_namespace
currently hardcode namespace names causing collisions; update both fixtures to
generate unique names per test class (e.g., incorporate the pytest request
identifier or request.cls.__name__ plus a short random suffix/uuid) before
calling create_ns so each class gets a distinct namespace; ensure you still pass
the generated name into create_ns(name=...) and keep the
EVALHUB_TENANT_LABEL_KEY label usage unchanged.

In `@tests/model_explainability/evalhub/utils.py`:
- Around line 232-240: The helper currently only checks status codes which lets
malformed payloads mask broken auth; update both
validate_evalhub_post_cross_tenant() and validate_evalhub_post_no_tenant() to
parse response.json() and assert the body contains the expected auth-specific
error: for cross-tenant POST verify the JSON error indicates an authorization
failure (e.g., contains "unauthorized", "forbidden", or an auth error
code/field), and for the no-tenant path verify the JSON error specifically
complains about a missing/invalid tenant header (e.g., contains "tenant",
"missing tenant header", or a tenant-header error code/field); keep the
status-code assertion but add these body-level assertions to make the checks
deterministic.
- Around line 438-444: The helpers accept any 400/403 because
delete_evalhub_job() currently returns only status_code; modify
delete_evalhub_job() to return the full HTTP response (or both status_code and
body) and then update validate_evalhub_delete_denied() and
validate_evalhub_delete_no_tenant() to call that function, assert the expected
auth status (e.g. 403 or 400 as appropriate) AND inspect the response body for
an auth-specific error (e.g. response.json().get('error') or response.text
contains authorization/tenant-related messages) so the tests fail when a
non-auth-related 400 is returned.

---

Duplicate comments:
In `@tests/model_explainability/evalhub/utils.py`:
- Around line 203-204: The submission log uses data.get('id') but callers expect
the nested job id at data["resource"]["id"], so update the LOGGER.info call that
follows data = response.json() to extract the nested id (e.g.,
data.get('resource', {}).get('id', 'unknown')) and log that value instead;
locate the LOGGER.info invocation in tests/model_explainability/evalhub/utils.py
and replace the referenced key accordingly so the stable job handle used by
wait/delete matches the logged id.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 7c5b3639-f3af-4c6c-9b89-729a09a3f252

📥 Commits

Reviewing files that changed from the base of the PR and between f3366a1 and add0f4c.

📒 Files selected for processing (14)
  • tests/model_explainability/evalhub/conftest.py
  • tests/model_explainability/evalhub/constants.py
  • tests/model_explainability/evalhub/multitenancy/__init__.py
  • tests/model_explainability/evalhub/multitenancy/conftest.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_collections_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_providers_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_tenant_rbac.py
  • tests/model_explainability/evalhub/test_evalhub_deployment.py
  • tests/model_explainability/evalhub/test_evalhub_health.py
  • tests/model_explainability/evalhub/utils.py
  • utilities/resources/evalhub.py
✅ Files skipped from review due to trivial changes (3)
  • tests/model_explainability/evalhub/conftest.py
  • tests/model_explainability/evalhub/test_evalhub_deployment.py
  • tests/model_explainability/evalhub/constants.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_collections_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
  • utilities/resources/evalhub.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_providers_mt.py
  • tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py

Comment on lines +100 to +123
@pytest.fixture(scope="class")
def tenant_a_namespace(
admin_client: DynamicClient,
) -> Generator[Namespace, Any, Any]:
"""Tenant namespace where the test user HAS access."""
with create_ns(
admin_client=admin_client,
name="test-evalhub-tenant-a",
labels={EVALHUB_TENANT_LABEL_KEY: "true"},
) as ns:
yield ns


@pytest.fixture(scope="class")
def tenant_b_namespace(
admin_client: DynamicClient,
) -> Generator[Namespace, Any, Any]:
"""Tenant namespace where the test user does NOT have access."""
with create_ns(
admin_client=admin_client,
name="test-evalhub-tenant-b",
labels={EVALHUB_TENANT_LABEL_KEY: "true"},
) as ns:
yield ns
Contributor

⚠️ Potential issue | 🟠 Major

Generate tenant namespaces per test class.

These fixtures are class-scoped but hardcode test-evalhub-tenant-a and test-evalhub-tenant-b. This conftest.py is shared by multiple new multitenancy modules, so parallel classes, or a class that starts before the previous namespace has finished Terminating, will contend for the same namespaces and for all the fixed resource names inside them.

Suggested fix
 @pytest.fixture(scope="class")
 def tenant_a_namespace(
     admin_client: DynamicClient,
+    model_namespace: Namespace,
 ) -> Generator[Namespace, Any, Any]:
@@
-        name="test-evalhub-tenant-a",
+        name=f"{model_namespace.name}-tenant-a",
         labels={EVALHUB_TENANT_LABEL_KEY: "true"},
     ) as ns:
         yield ns
@@
 @pytest.fixture(scope="class")
 def tenant_b_namespace(
     admin_client: DynamicClient,
+    model_namespace: Namespace,
 ) -> Generator[Namespace, Any, Any]:
@@
-        name="test-evalhub-tenant-b",
+        name=f"{model_namespace.name}-tenant-b",
         labels={EVALHUB_TENANT_LABEL_KEY: "true"},
     ) as ns:
         yield ns
As per coding guidelines: `**`: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/multitenancy/conftest.py` around lines 100
- 123, The fixtures tenant_a_namespace and tenant_b_namespace currently hardcode
namespace names causing collisions; update both fixtures to generate unique
names per test class (e.g., incorporate the pytest request identifier or
request.cls.__name__ plus a short random suffix/uuid) before calling create_ns
so each class gets a distinct namespace; ensure you still pass the generated
name into create_ns(name=...) and keep the EVALHUB_TENANT_LABEL_KEY label usage
unchanged.
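One way to implement the suggested per-class uniqueness is a short random suffix; `unique_tenant_namespace` is a hypothetical helper name, and the suffix is kept short because Kubernetes namespace names are limited to 63 characters:

```python
import uuid


def unique_tenant_namespace(base: str, suffix_len: int = 8) -> str:
    """Generate a collision-resistant namespace name per test class.

    `base` is e.g. "test-evalhub-tenant-a"; the random hex suffix avoids
    contention when a prior run's namespace is still Terminating or when
    classes run in parallel."""
    name = f"{base}-{uuid.uuid4().hex[:suffix_len]}"
    if len(name) > 63:  # Kubernetes DNS-1123 label limit
        raise ValueError(f"namespace name too long: {name}")
    return name
```

The fixture would then call `create_ns(name=unique_tenant_namespace("test-evalhub-tenant-a"), ...)` while keeping the EVALHUB_TENANT_LABEL_KEY label unchanged.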

Comment on lines +131 to +163
def _tenant_rbac_ready(admin_client: DynamicClient, namespace: str) -> bool:
"""Check if the operator has provisioned job RBAC for the test EvalHub instance."""
rbs = list(RoleBinding.get(client=admin_client, namespace=namespace))
rb_names = [rb.name for rb in rbs]
# Look for RoleBindings prefixed with the test instance name to avoid
# matching RoleBindings from the production EvalHub instance.
has_job_config = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-config" in name for name in rb_names)
has_job_writer = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-writer" in name for name in rb_names)
return has_job_config and has_job_writer


@pytest.fixture(scope="class")
def tenant_a_rbac_ready(
admin_client: DynamicClient,
tenant_a_namespace: Namespace,
evalhub_mt_deployment: Deployment,
) -> None:
"""Wait for the operator to provision job RBAC in tenant-a.

The operator watches for namespaces with the tenant label and
creates jobs-writer + job-config RoleBindings. This fixture
blocks until those RoleBindings exist.
"""
for ready in TimeoutSampler(
wait_timeout=120,
sleep=5,
func=_tenant_rbac_ready,
admin_client=admin_client,
namespace=tenant_a_namespace.name,
):
if ready:
LOGGER.info(f"Operator RBAC provisioned in {tenant_a_namespace.name}")
return
Contributor

⚠️ Potential issue | 🟠 Major

Wait for every resource the RBAC tests assert.

tenant_a_rbac_ready returns once the two RoleBindings exist, but tests/model_explainability/evalhub/multitenancy/test_evalhub_tenant_rbac.py uses it before checking the operator-created job ServiceAccount and service CA ConfigMap. If those arrive later in the reconcile loop, the suite becomes timing-dependent.

Suggested fix
+from ocp_resources.config_map import ConfigMap
 def _tenant_rbac_ready(admin_client: DynamicClient, namespace: str) -> bool:
     """Check if the operator has provisioned job RBAC for the test EvalHub instance."""
     rbs = list(RoleBinding.get(client=admin_client, namespace=namespace))
     rb_names = [rb.name for rb in rbs]
+    sa_names = [sa.name for sa in ServiceAccount.get(client=admin_client, namespace=namespace)]
     # Look for RoleBindings prefixed with the test instance name to avoid
     # matching RoleBindings from the production EvalHub instance.
     has_job_config = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-config" in name for name in rb_names)
     has_job_writer = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-writer" in name for name in rb_names)
-    return has_job_config and has_job_writer
+    has_job_sa = any(
+        name.startswith(f"{EVALHUB_MT_CR_NAME}-") and name.endswith("-job")
+        for name in sa_names
+    )
+    has_service_ca = ConfigMap(
+        client=admin_client,
+        name=f"{EVALHUB_MT_CR_NAME}-service-ca",
+        namespace=namespace,
+    ).exists
+    return has_job_config and has_job_writer and has_job_sa and has_service_ca
As per coding guidelines: `**/*.py`: Code reuse, test parameterization, and proper test dependencies should also be encouraged.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/multitenancy/conftest.py` around lines 131
- 163, Extend the readiness check so the tenant_a_rbac_ready fixture waits for
the operator-created ServiceAccount and service CA ConfigMap as well as the two
RoleBindings: update _tenant_rbac_ready to also list/check for a ServiceAccount
whose name startswith EVALHUB_MT_CR_NAME and contains "job" (or the operator's
job-SA naming) and a ConfigMap whose name startswith EVALHUB_MT_CR_NAME and
contains "service-ca"; return true only when has_job_config, has_job_writer, the
ServiceAccount exists, and the ConfigMap exists so tenant_a_rbac_ready only
unblocks once all RBAC-related resources are present.
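The extended readiness check reduces to a pure predicate over resource names, which keeps it unit-testable without a cluster; the naming patterns (`job-config`, `job-writer`, `service-ca`) follow the suggestion above and are assumptions about the operator's conventions, not verified behaviour:

```python
from typing import Iterable


def rbac_ready(
    rb_names: Iterable[str],
    sa_names: Iterable[str],
    configmap_names: Iterable[str],
    cr_name: str,
) -> bool:
    """Return True only when every operator-created resource the RBAC
    tests assert is present: both RoleBindings, the job ServiceAccount,
    and the service CA ConfigMap. Names are matched against the test
    instance prefix to avoid catching the production EvalHub instance."""
    has_job_config = any(n.startswith(cr_name) and "job-config" in n for n in rb_names)
    has_job_writer = any(n.startswith(cr_name) and "job-writer" in n for n in rb_names)
    has_job_sa = any(n.startswith(cr_name) and "job" in n for n in sa_names)
    has_service_ca = any(n.startswith(cr_name) and "service-ca" in n for n in configmap_names)
    return has_job_config and has_job_writer and has_job_sa and has_service_ca
```

The fixture's `_tenant_rbac_ready` would list the live resources with the admin client and feed their names into this predicate, so the TimeoutSampler loop only unblocks once all four resources exist.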

Comment on lines +232 to +240
response = requests.post(
url=url,
headers=build_headers(token=token, tenant=tenant),
json=payload,
verify=ca_bundle_file,
timeout=30,
)
assert response.status_code in (400, 403), (
f"Expected 400 or 403 for cross-tenant POST, got {response.status_code}: {response.text}"
Contributor

⚠️ Potential issue | 🟠 Major

Make POST negative-path helpers verify the auth contract.

A malformed collection/job payload can also return 400, so these helpers can pass while tenant authorization is broken. Parse the body and assert the auth-specific error for cross-tenant requests, and the tenant-header-specific error for the no-tenant path. This weakens the suite's security coverage (CWE-693).

Suggested fix
     assert response.status_code in (400, 403), (
         f"Expected 400 or 403 for cross-tenant POST, got {response.status_code}: {response.text}"
     )
+    data = response.json()
+    assert data.get("message_code") in ("unable_to_authorize_request", "forbidden"), (
+        f"Expected authorization denial, got message_code: {data.get('message_code')}"
+    )

Mirror the same pattern in validate_evalhub_post_no_tenant() with a tenant-header-specific assertion instead of accepting any 400.

As per coding guidelines: `**/*.py`: Focus on security, test structure and coding style adherence in new code introduced.

Also applies to: 266-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/utils.py` around lines 232 - 240, The
helper currently only checks status codes which lets malformed payloads mask
broken auth; update both validate_evalhub_post_cross_tenant() and
validate_evalhub_post_no_tenant() to parse response.json() and assert the body
contains the expected auth-specific error: for cross-tenant POST verify the JSON
error indicates an authorization failure (e.g., contains "unauthorized",
"forbidden", or an auth error code/field), and for the no-tenant path verify the
JSON error specifically complains about a missing/invalid tenant header (e.g.,
contains "tenant", "missing tenant header", or a tenant-header error
code/field); keep the status-code assertion but add these body-level assertions
to make the checks deterministic.
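A shared body-level assertion could keep both negative-path helpers consistent; the `message_code` values here are taken from the suggested fix and are assumptions about EvalHub's error contract, not verified constants:

```python
def assert_auth_denial(status_code: int, body: dict) -> None:
    """Assert a response represents an authorization denial rather than a
    generic validation error, so a malformed payload's 400 cannot mask
    broken tenant authorization. The accepted message_code values are
    assumptions drawn from the review suggestion above."""
    assert status_code in (400, 403), f"Expected 400 or 403, got {status_code}"
    code = body.get("message_code")
    assert code in ("unable_to_authorize_request", "forbidden"), (
        f"Expected authorization denial, got message_code: {code}"
    )
```

validate_evalhub_post_cross_tenant() would call `assert_auth_denial(response.status_code, response.json())`, and validate_evalhub_post_no_tenant() would use the same shape with a tenant-header-specific code instead.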

Comment on lines +438 to +444
response = requests.delete(
url=url,
headers=build_headers(token=token, tenant=tenant),
verify=ca_bundle_file,
timeout=10,
)
return response.status_code
Contributor

⚠️ Potential issue | 🟠 Major

Make DELETE negative-path helpers verify the auth contract.

validate_evalhub_delete_denied() and validate_evalhub_delete_no_tenant() accept any 400/403 today. A fast job can hit a non-cancellable lifecycle state and return a generic 400, which lets the auth tests pass for the wrong reason. Keep the response body from delete_evalhub_job() and assert an auth-specific error in both helpers. This weakens the suite's security coverage (CWE-693).

Suggested fix
 def delete_evalhub_job(
     host: str,
     token: str,
     ca_bundle_file: str,
     tenant: str,
     job_id: str,
-) -> int:
-    """Delete (cancel) an evaluation job. Returns the HTTP status code."""
+) -> requests.Response:
+    """Delete (cancel) an evaluation job."""
@@
-    return response.status_code
+    return response
-    status = delete_evalhub_job(
+    response = delete_evalhub_job(
         host=host,
         token=token,
         ca_bundle_file=ca_bundle_file,
         tenant=tenant,
         job_id=job_id,
     )
-    assert status in (400, 403), f"Expected 400 or 403 for cross-tenant DELETE, got {status}"
+    assert response.status_code in (400, 403), (
+        f"Expected 400 or 403 for cross-tenant DELETE, got {response.status_code}: {response.text}"
+    )
+    data = response.json()
+    assert data.get("message_code") in ("unable_to_authorize_request", "forbidden"), (
+        f"Expected authorization denial, got message_code: {data.get('message_code')}"
+    )

Do the equivalent body assertion in validate_evalhub_delete_no_tenant().

As per coding guidelines: `**/*.py`: Focus on security, test structure and coding style adherence in new code introduced.

Also applies to: 455-479

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_explainability/evalhub/utils.py` around lines 438 - 444, The
helpers accept any 400/403 because delete_evalhub_job() currently returns only
status_code; modify delete_evalhub_job() to return the full HTTP response (or
both status_code and body) and then update validate_evalhub_delete_denied() and
validate_evalhub_delete_no_tenant() to call that function, assert the expected
auth status (e.g. 403 or 400 as appropriate) AND inspect the response body for
an auth-specific error (e.g. response.json().get('error') or response.text
contains authorization/tenant-related messages) so the tests fail when a
non-auth-related 400 is returned.
