test(evalhub): add multi-tenancy integration tests #1333
ruivieira wants to merge 3 commits into opendatahub-io:main
📝 Walkthrough

Adds EvalHub multi-tenancy test infrastructure and utilities: new multi-tenant fixtures, RBAC provisioning, vLLM emulator deployment, tenant-scoped tests for jobs/collections/providers, expanded request/job helper APIs, and a small EvalHub resource change to accept a database spec.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 5
🧹 Nitpick comments (5)
tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py (2)
210-210: Same dict access pattern issue. Apply consistent defensive access here as well.

Proposed fix

```diff
- job_id = submitted["resource"]["id"]
+ job_id = submitted.get("resource", {}).get("id")
+ assert job_id, f"Submitted job missing 'resource.id': {submitted}"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py` at line 210, replace the direct dict indexing job_id = submitted["resource"]["id"] with a defensive access pattern to avoid KeyError—e.g., extract job_id = submitted.get("resource", {}).get("id") or check/assert that "resource" in submitted and "id" in submitted["resource"] before assigning; ensure any subsequent assertions or usage of job_id handle a missing value appropriately.
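The defensive-access pattern the reviewer suggests can be sketched as follows (the helper name is illustrative, not the actual test code):

```python
from typing import Any


def extract_job_id(submitted: dict[str, Any]) -> str:
    """Fetch the nested job id defensively, failing with a readable message
    instead of an opaque KeyError when the response schema changes."""
    job_id = submitted.get("resource", {}).get("id")
    assert job_id, f"Submitted job missing 'resource.id': {submitted}"
    return job_id


# A well-formed response yields the id; a malformed one fails the assertion
# with the full payload in the message.
print(extract_job_id({"resource": {"id": "abc-123"}}))  # prints abc-123
```

The advantage over `submitted["resource"]["id"]` is purely diagnostic: the test fails at the same point, but the assertion message shows the whole unexpected payload.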
86-86: Inconsistent dict access pattern. Line 61 uses defensive `.get()` but line 86 uses direct key access. If the API response schema changes, this raises an unclear `KeyError` instead of a descriptive assertion failure.

Proposed fix

```diff
- job_id = data["resource"]["id"]
+ job_id = data.get("resource", {}).get("id")
+ assert job_id, f"Job response missing 'resource.id': {data}"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py` at line 86, The test inconsistently accesses the API response: instead of directly using data["resource"]["id"] (variable job_id), change it to a defensive access or explicit assertion—e.g., assert that "resource" and "id" exist on data (with a clear message) and then use data.get("resource", {}).get("id") to set job_id; update use sites around job_id and the test function name (test_evalhub_jobs_mt) to rely on this defensive pattern so a schema change yields a clear assertion error instead of a KeyError.

tests/model_explainability/evalhub/multitenancy/conftest.py (1)
154-163: TimeoutSampler loop does not raise on timeout path. If `_tenant_rbac_ready` never returns `True`, `TimeoutSampler` raises `TimeoutExpiredError`. The function signature returns `None`, which is correct for the happy path, but the docstring should clarify the exception behavior for callers.

Proposed docstring addition

```diff
 def tenant_a_rbac_ready(
     admin_client: DynamicClient,
     tenant_a_namespace: Namespace,
     evalhub_mt_deployment: Deployment,
 ) -> None:
     """Wait for the operator to provision job RBAC in tenant-a.

     The operator watches for namespaces with the tenant label and
     creates jobs-writer + job-config RoleBindings. This fixture
     blocks until those RoleBindings exist.
+
+    Raises:
+        TimeoutExpiredError: If RBAC is not provisioned within 120s.
     """
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/evalhub/multitenancy/conftest.py` around lines 154 - 163, The docstring for the function using TimeoutSampler should be updated to document the timeout/exception behavior: mention that TimeoutSampler will raise TimeoutExpiredError if _tenant_rbac_ready never returns True within wait_timeout (120s in this usage), and that the function returns None on the happy path but will propagate/raise TimeoutExpiredError on timeout; update the docstring where TimeoutSampler is used (referencing TimeoutSampler and _tenant_rbac_ready) to explicitly state this exception and its conditions so callers know to handle it.

tests/model_explainability/evalhub/utils.py (2)
352-355: Unreachable code after `TimeoutSampler` loop. `TimeoutSampler` raises `TimeoutExpiredError` when the timeout is exceeded without finding a match. The `return {}` on line 355 is unreachable. While harmless, it's misleading about the function's behavior.

Proposed fix

```diff
 for sample in TimeoutSampler(
     wait_timeout=timeout,
     sleep=sleep,
     func=_get_job_status,
     host=host,
     token=token,
     ca_bundle_file=ca_bundle_file,
     tenant=tenant,
     job_id=job_id,
 ):
     state = sample.get("status", {}).get("state", "")
     LOGGER.info(f"Job {job_id} state: {state}")
     if state in EVALHUB_JOB_TERMINAL_STATES:
         return sample
-
- return {}
+ # TimeoutSampler raises TimeoutExpiredError if we never return above
+ raise AssertionError("TimeoutSampler exited without timeout or match")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/evalhub/utils.py` around lines 352 - 355, The trailing "return {}" after the TimeoutSampler loop is unreachable because TimeoutSampler raises TimeoutExpiredError on timeout; remove the misleading return and instead either let the TimeoutExpiredError propagate or explicitly catch TimeoutExpiredError where this function is called; update the code around the TimeoutSampler usage (referencing TimeoutSampler, TimeoutExpiredError and EVALHUB_JOB_TERMINAL_STATES) to eliminate the dead "return {}" and ensure callers handle the exception if needed.
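The control flow behind this finding can be illustrated with a minimal stand-in for the sampler (a hypothetical polling loop, not the real `timeout_sampler` library): the `for` loop either returns a matching sample or the generator raises, so any code after the loop never runs.

```python
import time
from collections.abc import Callable, Iterator


class TimeoutExpiredError(Exception):
    """Raised when polling exceeds the allowed wait time."""


def sampler(wait_timeout: float, sleep: float, func: Callable[[], str]) -> Iterator[str]:
    """Yield func() results until wait_timeout elapses, then raise."""
    deadline = time.monotonic() + wait_timeout
    while time.monotonic() < deadline:
        yield func()
        time.sleep(sleep)
    raise TimeoutExpiredError(f"no match within {wait_timeout}s")


def wait_for_terminal(get_state: Callable[[], str], timeout: float = 1.0) -> str:
    for state in sampler(wait_timeout=timeout, sleep=0.01, func=get_state):
        if state in {"completed", "failed", "cancelled"}:
            return state
    # Unreachable: sampler() raises instead of exhausting normally.
    raise AssertionError("sampler exited without timeout or match")
```

With a terminal state the loop returns immediately; with a non-terminal state the call eventually raises `TimeoutExpiredError`, which is exactly why a trailing `return {}` after such a loop is dead code.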
174-174: Missing generic type parameters on `dict`. Per `pyproject.toml` mypy settings (`disallow_any_generics`), `dict` requires explicit type parameters.

Proposed fix

```diff
+from typing import Any
+
 def submit_evalhub_job(
     host: str,
     token: str,
     ca_bundle_file: str,
     tenant: str,
-    payload: dict,
+    payload: dict[str, Any],
 ) -> dict[str, Any]:
```

Apply the same pattern to `validate_evalhub_post_denied` (line 214), `validate_evalhub_post_no_tenant` (line 249), `build_evalhub_job_payload` (line 491), and `submit_evalhub_collection` (line 527).

As per coding guidelines, mypy options require `disallow_any_generics`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/evalhub/utils.py` at line 174, Change unparameterized use of dict to explicit typing to satisfy disallow_any_generics: update the function signatures for validate_evalhub_post, validate_evalhub_post_denied, validate_evalhub_post_no_tenant, build_evalhub_job_payload, and submit_evalhub_collection to use typed dicts such as Dict[str, Any] (or a more specific Mapping/TypedDict if appropriate) for parameters named payload (and any other dict-typed params/returns), and add the necessary imports from typing (e.g., Dict, Any) at the top of tests/model_explainability/evalhub/utils.py.
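What `disallow_any_generics` asks for can be sketched like this (a hypothetical payload helper, not the repository's actual signature): a bare `dict` annotation is rejected, while `dict[str, Any]` or a `TypedDict` passes.

```python
from typing import Any, TypedDict


class BenchmarkPayload(TypedDict):
    id: str
    provider_id: str


def build_payload(name: str, benchmarks: list[BenchmarkPayload]) -> dict[str, Any]:
    # `dict[str, Any]` (not bare `dict`) is the minimum that satisfies
    # disallow_any_generics; the TypedDict parameter is stricter still.
    return {"name": name, "benchmarks": benchmarks}


payload = build_payload("MT Test", [{"id": "arc_easy", "provider_id": "lm_evaluation_harness"}])
```

A `TypedDict` buys real key checking at type-check time; `dict[str, Any]` merely parameterizes the container, which is often enough for test payloads.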
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py`:
- Around line 76-99: The test creates a job via submit_evalhub_job (using build_evalhub_job_payload) and then calls validate_evalhub_delete_denied, but never cleans up the created job; wrap the submission and denial assertion in a try/finally (or ensure a teardown step) and in the finally call the service-level deletion helper (e.g., delete_evalhub_job) using tenant-a credentials (token=tenant_a_token, tenant=tenant_a_namespace.name, host=evalhub_mt_route.host, ca_bundle_file=evalhub_mt_ca_bundle_file, job_id=job_id) and ignore non-fatal errors so the job is always removed after the negative-path check.

In `@tests/model_explainability/evalhub/test_evalhub_deployment.py`:
- Around line 28-33: The test function test_evalhub_single_pod_single_container has an untyped fixture parameter admin_client; add a type annotation for admin_client (e.g., admin_client: Any or the specific fixture type) and import the type (from typing import Any) or the concrete AdminClient type so the signature def test_evalhub_single_pod_single_container(self, admin_client: Any, model_namespace: Namespace, evalhub_deployment: Deployment) -> None satisfies disallow_untyped_defs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: bb9f5549-6910-4d2c-a8b4-cbc29f319a66
📒 Files selected for processing (14)

- tests/model_explainability/evalhub/conftest.py
- tests/model_explainability/evalhub/constants.py
- tests/model_explainability/evalhub/multitenancy/__init__.py
- tests/model_explainability/evalhub/multitenancy/conftest.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_collections_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_providers_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_tenant_rbac.py
- tests/model_explainability/evalhub/test_evalhub_deployment.py
- tests/model_explainability/evalhub/test_evalhub_health.py
- tests/model_explainability/evalhub/utils.py
- utilities/resources/evalhub.py
```python
COLLECTION_PAYLOAD: dict = {
    "name": "MT Test Collection",
    "category": "test",
    "description": "Collection created by multi-tenancy integration test",
    "benchmarks": [
        {
            "id": "arc_easy",
            "provider_id": "lm_evaluation_harness",
        }
    ],
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify bare dict annotation in this file.
rg -nP '^\s*COLLECTION_PAYLOAD\s*:\s*dict\s*=' tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
```

Repository: opendatahub-io/opendatahub-tests
Length of output: 107

🏁 Script executed:

```bash
cat -n pyproject.toml | head -100
```

Repository: opendatahub-io/opendatahub-tests
Length of output: 2579

🏁 Script executed:

```bash
rg -n 'disallow_any_generics' pyproject.toml
```

Repository: opendatahub-io/opendatahub-tests
Length of output: 107

🏁 Script executed:

```bash
cat -n tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
```

Repository: opendatahub-io/opendatahub-tests
Length of output: 3963
Add type parameters to COLLECTION_PAYLOAD to satisfy disallow_any_generics.
Line 12 uses bare dict without type parameters, violating the disallow_any_generics setting enabled in pyproject.toml.
Fix

```diff
+from typing import TypedDict
+
+class BenchmarkPayload(TypedDict):
+    id: str
+    provider_id: str
+
+class CollectionPayload(TypedDict):
+    name: str
+    category: str
+    description: str
+    benchmarks: list[BenchmarkPayload]
+
-COLLECTION_PAYLOAD: dict = {
+COLLECTION_PAYLOAD: CollectionPayload = {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py`
around lines 12 - 22, The variable COLLECTION_PAYLOAD is annotated as bare dict
which violates disallow_any_generics; update its type annotation to include
parameters (for example use Dict[str, Any] or a more specific type like
Dict[str, object]) and add the necessary import (e.g., from typing import Any,
Dict) so the declaration becomes COLLECTION_PAYLOAD: Dict[str, Any] = { ... }
(or replace with a precise typed structure if preferred) to satisfy the linter;
reference COLLECTION_PAYLOAD to locate the change and ensure imports are
added/adjusted accordingly.
@coderabbitai this won't trigger a CI failure since mypy explicitly excludes all test files (exclude: ^(docs/|.*test.*\.py$|...)), so disallow_any_generics doesn't apply here. Pre-commit mypy passes cleanly.
@ruivieira, thanks for the clarification! You're right — since mypy excludes all test files via exclude: ^(docs/|.*test.*\.py$|...), the disallow_any_generics constraint doesn't apply to test files, and the bare dict annotation on COLLECTION_PAYLOAD is perfectly fine. I'll disregard this finding.
```python
    job_name="evalhub-delete-test-job",
)
```
Stop reusing a fixed job name across tests.
Using job_name="evalhub-delete-test-job" in multiple tests creates collision risk on reruns/parallel execution and can mask auth checks behind name-conflict failures.
Proposed fix

```diff
+from uuid import uuid4
@@
-    job_name="evalhub-delete-test-job",
+    job_name=f"evalhub-delete-test-job-{uuid4().hex[:8]}",
@@
-    job_name="evalhub-delete-test-job",
+    job_name=f"evalhub-delete-test-job-{uuid4().hex[:8]}",
@@
-    job_name="evalhub-delete-test-job",
+    job_name=f"evalhub-delete-test-job-{uuid4().hex[:8]}",
```

As per coding guidelines: REVIEW PRIORITIES: ... Bug-prone patterns and error handling gaps.
Also applies to: 80-81, 113-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py`
around lines 46 - 47, Replace the fixed job_name="evalhub-delete-test-job" with
a unique name generator in the test (e.g., append a short UUID or timestamp) so
each test run creates distinct jobs; update the job_name parameter used in the
test functions (the job_name argument passed when creating the job) and apply
the same change to the other occurrences noted so tests at lines ~80-81 and
~113-114 also use unique names to avoid collisions in reruns/parallel execution
and to prevent masking auth failures.
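The unique-name pattern proposed above amounts to appending a short random suffix; a minimal sketch (the base name here is illustrative):

```python
from uuid import uuid4


def unique_job_name(base: str = "evalhub-delete-test-job") -> str:
    """Append an 8-char hex suffix so reruns and parallel tests never collide."""
    return f"{base}-{uuid4().hex[:8]}"


# Two calls always produce distinct names.
print(unique_job_name())
```

A timestamp suffix would also work, but a UUID fragment stays unique even when two tests start within the same clock tick.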
@coderabbitai Each test submits a new job and gets back a unique server-generated job_id (UUID) from the response. The job_name is just a human-readable label in the EvalHub API, not a unique key. The actual identifier used for deletion is data["resource"]["id"] which is a UUID like 96e73ffe-49ba-4b39-9053-3dce1de6f0c5, guaranteed unique per submission.
@ruivieira, thanks for the clarification — that makes sense. Since the EvalHub API uses a server-generated UUID (data["resource"]["id"]) as the actual job identifier and job_name is purely a human-readable label with no uniqueness enforcement, there's no collision risk here. My concern was unfounded.
Add 24 integration tests for EvalHub covering deployment topology, health, multi-tenancy authorisation (providers, collections, jobs), job lifecycle with vLLM emulator, and operator RBAC provisioning.

Signed-off-by: Rui Vieira <ruidevieira@googlemail.com>
Actionable comments posted: 4
♻️ Duplicate comments (1)
tests/model_explainability/evalhub/utils.py (1)
203-204: ⚠️ Potential issue | 🟡 Minor

Use the nested job ID in the submission log.

Callers read `data["resource"]["id"]`, so this line logs `"unknown"` on successful submissions. That removes the only stable handle from later wait/delete failures.

Suggested fix

```diff
- LOGGER.info(f"Job submitted: {data.get('id', 'unknown')}")
+ LOGGER.info(f"Job submitted: {data.get('resource', {}).get('id', 'unknown')}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/evalhub/utils.py` around lines 203 - 204, The submission log uses data.get('id') but callers expect the nested job id at data["resource"]["id"], so update the LOGGER.info call that follows data = response.json() to extract the nested id (e.g., data.get('resource', {}).get('id', 'unknown')) and log that value instead; locate the LOGGER.info invocation in tests/model_explainability/evalhub/utils.py and replace the referenced key accordingly so the stable job handle used by wait/delete matches the logged id.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/model_explainability/evalhub/utils.py`:
- Around line 232-240: The helper currently only checks status codes which lets
malformed payloads mask broken auth; update both
validate_evalhub_post_cross_tenant() and validate_evalhub_post_no_tenant() to
parse response.json() and assert the body contains the expected auth-specific
error: for cross-tenant POST verify the JSON error indicates an authorization
failure (e.g., contains "unauthorized", "forbidden", or an auth error
code/field), and for the no-tenant path verify the JSON error specifically
complains about a missing/invalid tenant header (e.g., contains "tenant",
"missing tenant header", or a tenant-header error code/field); keep the
status-code assertion but add these body-level assertions to make the checks
deterministic.
- Around line 438-444: The helpers accept any 400/403 because
delete_evalhub_job() currently returns only status_code; modify
delete_evalhub_job() to return the full HTTP response (or both status_code and
body) and then update validate_evalhub_delete_denied() and
validate_evalhub_delete_no_tenant() to call that function, assert the expected
auth status (e.g. 403 or 400 as appropriate) AND inspect the response body for
an auth-specific error (e.g. response.json().get('error') or response.text
contains authorization/tenant-related messages) so the tests fail when a
non-auth-related 400 is returned.
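A body-level check like the one requested can be sketched as follows (the error-message substrings and the `error` field name are assumptions about the API, not confirmed by the source):

```python
from typing import Any


def assert_denied(status_code: int, body: dict[str, Any]) -> None:
    """Require both an auth status code AND an auth-specific error body,
    so a malformed-payload 400 cannot masquerade as a denial."""
    assert status_code in (400, 403), f"unexpected status {status_code}"
    # Field name "error" and the substrings below are illustrative assumptions.
    message = str(body.get("error", "")).lower()
    assert any(k in message for k in ("unauthorized", "forbidden", "tenant")), (
        f"status {status_code} did not look auth-related: {body}"
    )


assert_denied(403, {"error": "Forbidden: cross-tenant access denied"})
```

With this shape, a 400 caused by a bad payload fails the body assertion and surfaces the actual response, which is the determinism the comment asks for.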
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 7c5b3639-f3af-4c6c-9b89-729a09a3f252
📒 Files selected for processing (14)

- tests/model_explainability/evalhub/conftest.py
- tests/model_explainability/evalhub/constants.py
- tests/model_explainability/evalhub/multitenancy/__init__.py
- tests/model_explainability/evalhub/multitenancy/conftest.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_collections_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_job_delete_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_providers_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_tenant_rbac.py
- tests/model_explainability/evalhub/test_evalhub_deployment.py
- tests/model_explainability/evalhub/test_evalhub_health.py
- tests/model_explainability/evalhub/utils.py
- utilities/resources/evalhub.py
✅ Files skipped from review due to trivial changes (3)
- tests/model_explainability/evalhub/conftest.py
- tests/model_explainability/evalhub/test_evalhub_deployment.py
- tests/model_explainability/evalhub/constants.py
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/model_explainability/evalhub/multitenancy/test_evalhub_collections_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_create_collection_mt.py
- utilities/resources/evalhub.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_providers_mt.py
- tests/model_explainability/evalhub/multitenancy/test_evalhub_jobs_mt.py
```python
@pytest.fixture(scope="class")
def tenant_a_namespace(
    admin_client: DynamicClient,
) -> Generator[Namespace, Any, Any]:
    """Tenant namespace where the test user HAS access."""
    with create_ns(
        admin_client=admin_client,
        name="test-evalhub-tenant-a",
        labels={EVALHUB_TENANT_LABEL_KEY: "true"},
    ) as ns:
        yield ns


@pytest.fixture(scope="class")
def tenant_b_namespace(
    admin_client: DynamicClient,
) -> Generator[Namespace, Any, Any]:
    """Tenant namespace where the test user does NOT have access."""
    with create_ns(
        admin_client=admin_client,
        name="test-evalhub-tenant-b",
        labels={EVALHUB_TENANT_LABEL_KEY: "true"},
    ) as ns:
        yield ns
```
Generate tenant namespaces per test class.
These fixtures are class-scoped but hardcode test-evalhub-tenant-a and test-evalhub-tenant-b. This conftest.py is shared by multiple new multitenancy modules, so parallel classes—or a class starting before the previous namespace fully leaves Terminating—will contend on the same namespaces and all fixed resource names inside them.
Suggested fix

```diff
 @pytest.fixture(scope="class")
 def tenant_a_namespace(
     admin_client: DynamicClient,
+    model_namespace: Namespace,
 ) -> Generator[Namespace, Any, Any]:
@@
-        name="test-evalhub-tenant-a",
+        name=f"{model_namespace.name}-tenant-a",
         labels={EVALHUB_TENANT_LABEL_KEY: "true"},
     ) as ns:
         yield ns
@@
 @pytest.fixture(scope="class")
 def tenant_b_namespace(
     admin_client: DynamicClient,
+    model_namespace: Namespace,
 ) -> Generator[Namespace, Any, Any]:
@@
-        name="test-evalhub-tenant-b",
+        name=f"{model_namespace.name}-tenant-b",
         labels={EVALHUB_TENANT_LABEL_KEY: "true"},
     ) as ns:
         yield ns
```

🤖 Prompt for AI Agents
yield ns🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/evalhub/multitenancy/conftest.py` around lines 100
- 123, The fixtures tenant_a_namespace and tenant_b_namespace currently hardcode
namespace names causing collisions; update both fixtures to generate unique
names per test class (e.g., incorporate the pytest request identifier or
request.cls.__name__ plus a short random suffix/uuid) before calling create_ns
so each class gets a distinct namespace; ensure you still pass the generated
name into create_ns(name=...) and keep the EVALHUB_TENANT_LABEL_KEY label usage
unchanged.
```python
def _tenant_rbac_ready(admin_client: DynamicClient, namespace: str) -> bool:
    """Check if the operator has provisioned job RBAC for the test EvalHub instance."""
    rbs = list(RoleBinding.get(client=admin_client, namespace=namespace))
    rb_names = [rb.name for rb in rbs]
    # Look for RoleBindings prefixed with the test instance name to avoid
    # matching RoleBindings from the production EvalHub instance.
    has_job_config = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-config" in name for name in rb_names)
    has_job_writer = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-writer" in name for name in rb_names)
    return has_job_config and has_job_writer


@pytest.fixture(scope="class")
def tenant_a_rbac_ready(
    admin_client: DynamicClient,
    tenant_a_namespace: Namespace,
    evalhub_mt_deployment: Deployment,
) -> None:
    """Wait for the operator to provision job RBAC in tenant-a.

    The operator watches for namespaces with the tenant label and
    creates jobs-writer + job-config RoleBindings. This fixture
    blocks until those RoleBindings exist.
    """
    for ready in TimeoutSampler(
        wait_timeout=120,
        sleep=5,
        func=_tenant_rbac_ready,
        admin_client=admin_client,
        namespace=tenant_a_namespace.name,
    ):
        if ready:
            LOGGER.info(f"Operator RBAC provisioned in {tenant_a_namespace.name}")
            return
```
Wait for every resource the RBAC tests assert.
tenant_a_rbac_ready returns once the two RoleBindings exist, but tests/model_explainability/evalhub/multitenancy/test_evalhub_tenant_rbac.py uses it before checking the operator-created job ServiceAccount and service CA ConfigMap. If those arrive later in the reconcile loop, the suite becomes timing-dependent.
Suggested fix (the `ServiceAccount` import is added here, since the suggestion uses it)

```diff
+from ocp_resources.config_map import ConfigMap
+from ocp_resources.service_account import ServiceAccount

 def _tenant_rbac_ready(admin_client: DynamicClient, namespace: str) -> bool:
     """Check if the operator has provisioned job RBAC for the test EvalHub instance."""
     rbs = list(RoleBinding.get(client=admin_client, namespace=namespace))
     rb_names = [rb.name for rb in rbs]
+    sa_names = [sa.name for sa in ServiceAccount.get(client=admin_client, namespace=namespace)]
     # Look for RoleBindings prefixed with the test instance name to avoid
     # matching RoleBindings from the production EvalHub instance.
     has_job_config = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-config" in name for name in rb_names)
     has_job_writer = any(name.startswith(EVALHUB_MT_CR_NAME) and "job-writer" in name for name in rb_names)
-    return has_job_config and has_job_writer
+    has_job_sa = any(
+        name.startswith(f"{EVALHUB_MT_CR_NAME}-") and name.endswith("-job")
+        for name in sa_names
+    )
+    has_service_ca = ConfigMap(
+        client=admin_client,
+        name=f"{EVALHUB_MT_CR_NAME}-service-ca",
+        namespace=namespace,
+    ).exists
+    return has_job_config and has_job_writer and has_job_sa and has_service_ca
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/evalhub/multitenancy/conftest.py` around lines
131-163, extend the readiness check so the tenant_a_rbac_ready fixture waits for
the operator-created ServiceAccount and service CA ConfigMap as well as the two
RoleBindings: update _tenant_rbac_ready to also list/check for a ServiceAccount
whose name startswith EVALHUB_MT_CR_NAME and contains "job" (or the operator's
job-SA naming) and a ConfigMap whose name startswith EVALHUB_MT_CR_NAME and
contains "service-ca"; return true only when has_job_config, has_job_writer, the
ServiceAccount exists, and the ConfigMap exists so tenant_a_rbac_ready only
unblocks once all RBAC-related resources are present.
response = requests.post(
    url=url,
    headers=build_headers(token=token, tenant=tenant),
    json=payload,
    verify=ca_bundle_file,
    timeout=30,
)
assert response.status_code in (400, 403), (
    f"Expected 400 or 403 for cross-tenant POST, got {response.status_code}: {response.text}"
)
Make POST negative-path helpers verify the auth contract.
A malformed collection/job payload can also return 400, so these helpers can pass while tenant authorization is broken. Parse the body and assert the auth-specific error for cross-tenant requests, and the tenant-header-specific error for the no-tenant path. This weakens the suite's security coverage (CWE-693).
Suggested fix
 assert response.status_code in (400, 403), (
     f"Expected 400 or 403 for cross-tenant POST, got {response.status_code}: {response.text}"
 )
+data = response.json()
+assert data.get("message_code") in ("unable_to_authorize_request", "forbidden"), (
+    f"Expected authorization denial, got message_code: {data.get('message_code')}"
+)

Mirror the same pattern in validate_evalhub_post_no_tenant() with a tenant-header-specific assertion instead of accepting any 400.
Also applies to: 266-273
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/evalhub/utils.py` around lines 232-240, the
helper currently only checks status codes, which lets malformed payloads mask
broken auth; update both validate_evalhub_post_cross_tenant() and
validate_evalhub_post_no_tenant() to parse response.json() and assert the body
contains the expected auth-specific error: for cross-tenant POST verify the JSON
error indicates an authorization failure (e.g., contains "unauthorized",
"forbidden", or an auth error code/field), and for the no-tenant path verify the
JSON error specifically complains about a missing/invalid tenant header (e.g.,
contains "tenant", "missing tenant header", or a tenant-header error
code/field); keep the status-code assertion but add these body-level assertions
to make the checks deterministic.
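The tenant-header assertion the review asks for in validate_evalhub_post_no_tenant() could be factored into a small body-level check. A minimal sketch, assuming a JSON error schema with `message_code` and `message` fields (both field names and the `missing_tenant_header` value are assumptions, not confirmed EvalHub API):

```python
def assert_missing_tenant_denial(status_code: int, body: dict) -> None:
    """Fail unless the response is a tenant-header-specific denial.

    A generic 400 (e.g. malformed payload) must not satisfy the check,
    so the error body has to mention the tenant header.
    """
    assert status_code in (400, 403), f"Expected 400 or 403, got {status_code}"
    code = body.get("message_code", "")
    message = body.get("message", "")
    assert code == "missing_tenant_header" or "tenant" in message.lower(), (
        f"Expected a tenant-header error, got {code!r}: {message!r}"
    )
```

The helper is pure (status code plus parsed body), so it can be unit-tested without a live EvalHub endpoint.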
response = requests.delete(
    url=url,
    headers=build_headers(token=token, tenant=tenant),
    verify=ca_bundle_file,
    timeout=10,
)
return response.status_code
Make DELETE negative-path helpers verify the auth contract.
validate_evalhub_delete_denied() and validate_evalhub_delete_no_tenant() accept any 400/403 today. A fast job can hit a non-cancellable lifecycle state and return a generic 400, which lets the auth tests pass for the wrong reason. Keep the response body from delete_evalhub_job() and assert an auth-specific error in both helpers. This weakens the suite's security coverage (CWE-693).
Suggested fix
 def delete_evalhub_job(
     host: str,
     token: str,
     ca_bundle_file: str,
     tenant: str,
     job_id: str,
-) -> int:
-    """Delete (cancel) an evaluation job. Returns the HTTP status code."""
+) -> requests.Response:
+    """Delete (cancel) an evaluation job."""
 @@
-    return response.status_code
+    return response

-    status = delete_evalhub_job(
+    response = delete_evalhub_job(
         host=host,
         token=token,
         ca_bundle_file=ca_bundle_file,
         tenant=tenant,
         job_id=job_id,
     )
-    assert status in (400, 403), f"Expected 400 or 403 for cross-tenant DELETE, got {status}"
+    assert response.status_code in (400, 403), (
+        f"Expected 400 or 403 for cross-tenant DELETE, got {response.status_code}: {response.text}"
+    )
+    data = response.json()
+    assert data.get("message_code") in ("unable_to_authorize_request", "forbidden"), (
+        f"Expected authorization denial, got message_code: {data.get('message_code')}"
+    )

Do the equivalent body assertion in validate_evalhub_delete_no_tenant().
Also applies to: 455-479
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/evalhub/utils.py` around lines 438-444, the
helpers accept any 400/403 because delete_evalhub_job() currently returns only
status_code; modify delete_evalhub_job() to return the full HTTP response (or
both status_code and body) and then update validate_evalhub_delete_denied() and
validate_evalhub_delete_no_tenant() to call that function, assert the expected
auth status (e.g. 403 or 400 as appropriate) AND inspect the response body for
an auth-specific error (e.g. response.json().get('error') or response.text
contains authorization/tenant-related messages) so the tests fail when a
non-auth-related 400 is returned.
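For the DELETE path, the same idea can be written as a pure helper that inspects both status and body, so a 400 from a non-cancellable job state cannot pass for an authorization denial. This is a sketch under assumed names: `DeleteResult` and the `message_code` values are hypothetical stand-ins for the real response and error schema.

```python
from dataclasses import dataclass, field


@dataclass
class DeleteResult:
    """Minimal stand-in for the HTTP response (status + parsed JSON body)."""
    status_code: int
    body: dict = field(default_factory=dict)


# Assumed auth-denial codes; the real EvalHub values may differ.
AUTH_DENIAL_CODES = {"unable_to_authorize_request", "forbidden"}


def assert_delete_denied(result: DeleteResult) -> None:
    """Fail unless the DELETE was rejected for authorization reasons."""
    assert result.status_code in (400, 403), (
        f"Expected 400 or 403, got {result.status_code}"
    )
    assert result.body.get("message_code") in AUTH_DENIAL_CODES, (
        f"Expected authorization denial, got {result.body.get('message_code')!r}"
    )
```

With this shape, a generic lifecycle error such as a non-cancellable job fails the second assertion instead of silently passing the test.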
Pull Request
Summary
Adds comprehensive EvalHub integration tests covering deployment topology, multi-tenancy authorization, job lifecycle, and operator RBAC provisioning.
These tests validate the full multi-tenancy contract: authorised access, cross-tenant denial, and missing tenant rejection across all API endpoints. They also verify the operator correctly provisions RBAC in tenant namespaces and that evaluation jobs complete end-to-end using the vLLM emulator.
(Note: EvalHub CRD has changed and needs to be regenerated for this PR)
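The three-scenario contract (authorised access, cross-tenant denial, missing tenant rejection) can be sketched as a single reusable matrix applied to any endpoint. Here `call_endpoint`, the tenant names, and the expected status sets are illustrative placeholders, not the suite's real helpers:

```python
SCENARIOS = [
    # (request tenant, resource tenant, acceptable status codes)
    ("tenant-a", "tenant-a", {200, 201}),  # authorised access
    ("tenant-b", "tenant-a", {400, 403}),  # cross-tenant denial
    (None, "tenant-a", {400, 403}),        # missing tenant rejection
]


def run_tenant_matrix(call_endpoint) -> None:
    """Apply the same three-scenario contract to any endpoint callable.

    call_endpoint(request_tenant, resource_tenant) should return the
    HTTP status code of the request; None means no X-Tenant header.
    """
    for request_tenant, resource_tenant, expected in SCENARIOS:
        status = call_endpoint(request_tenant, resource_tenant)
        assert status in expected, (
            f"tenant={request_tenant!r} on {resource_tenant!r}: got {status}"
        )
```

In the real suite each endpoint gets its own test function; the matrix just shows why the same trio of tests repeats across providers, collections, and jobs.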
Summary of tests added
Deployment topology (smoke)
- test_evalhub_single_pod_single_container: single `evalhub` container, correct labels (`app`, `instance`, `component`)

Health (smoke)
- test_evalhub_health_endpoint: `healthy` status (existing)
- test_evalhub_health_is_tenant_agnostic: does not require `X-Tenant`; ignores it when present

Multi-tenancy — providers (3 scenarios)
- test_providers_authorized_tenant
- test_providers_cross_tenant_forbidden
- test_providers_missing_tenant_rejected

Multi-tenancy — collections listing (3 scenarios)
- test_collections_authorized_tenant: returns an `items` list
- test_collections_cross_tenant_denied
- test_collections_missing_tenant_rejected

Multi-tenancy — collection creation (3 scenarios)
- test_create_collection_authorized_tenant
- test_create_collection_cross_tenant_denied
- test_create_collection_missing_tenant_rejected

Multi-tenancy — job submission and lifecycle (7 tests)
- test_job_submit_authorized_tenant
- test_job_completion_and_results
- test_job_pod_reaches_succeeded: job pod reaches `Succeeded`
- test_job_submit_cross_tenant_denied
- test_job_submit_missing_tenant_rejected
- test_job_list_authorized_tenant
- test_job_list_cross_tenant_denied
- test_job_list_missing_tenant_rejected

Multi-tenancy — job deletion (3 scenarios)
- test_job_delete_authorized_tenant
- test_job_delete_cross_tenant_denied
- test_job_delete_missing_tenant_rejected

Operator RBAC provisioning (4 tests)
- test_job_service_account_created
- test_jobs_writer_role_binding
- test_job_config_role_binding
- test_service_ca_configmap_created: has the `inject-cabundle` annotation

Please review and indicate how it has been tested
Additional Requirements
Summary by CodeRabbit
Tests
Chores