feat: Added multi-tenancy test cases for evalhub#1331

kpunwatk wants to merge 2 commits into opendatahub-io:main from
modified: tests/model_explainability/evalhub/conftest.py
modified: tests/model_explainability/evalhub/constants.py
new file: tests/model_explainability/evalhub/test_evalhub_multi_tenancy.py
modified: tests/model_explainability/evalhub/utils.py

Signed-off-by: Karishma Punwatkar <kpunwatk@redhat.com>
bb20f21 to aa961b2

/wip

for more information, see https://pre-commit.ci
📝 Walkthrough

Multi-tenancy test infrastructure for EvalHub with namespace/service account/RBAC fixtures, token generation via …

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Security concerns requiring attention:

🚥 Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (1 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 3
🧹 Nitpick comments (3)
tests/model_explainability/evalhub/utils.py (1)
95-123: Code duplication across API helpers.

All new functions (list_evalhub_providers, get_evalhub_provider, list_evalhub_benchmarks, list_evalhub_collections, list_evalhub_evaluation_jobs, get_evalhub_evaluation_job) share identical GET request boilerplate. Consider extracting a generic helper:

```python
def _get_json(
    host: str,
    path: str,
    token: str,
    ca_bundle_file: str,
    tenant: str,
    timeout: int = 10,
) -> dict:
    url = f"https://{host}{path}"
    response = requests.get(
        url=url,
        headers=_build_headers(token=token, tenant=tenant),
        verify=ca_bundle_file,
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()
```

Then each function becomes a thin wrapper. The same applies to the POST helpers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/model_explainability/evalhub/utils.py` around lines 95-123: multiple functions (list_evalhub_providers, get_evalhub_provider, list_evalhub_benchmarks, list_evalhub_collections, list_evalhub_evaluation_jobs, get_evalhub_evaluation_job) duplicate the same GET request boilerplate; extract a reusable helper (e.g., _get_json(host, path, token, ca_bundle_file, tenant, timeout=10)) that builds the URL, calls requests.get with headers from _build_headers, verify=ca_bundle_file, timeout, calls raise_for_status and returns response.json, then refactor each of the listed functions into thin wrappers that call _get_json with the appropriate EVALHUB_*_PATH constants/paths; apply the same pattern for any POST helpers by adding a _post_json helper for POST payloads.

tests/model_explainability/evalhub/test_evalhub_multi_tenancy.py (1)
45-69: Consider @pytest.mark.parametrize for tenant iteration.

Looping within a single test hides which tenant failed. Parametrization provides clearer reporting:

```python
@pytest.mark.parametrize("tenant_key", ["tenant_a", "tenant_b"])
def test_providers_list_with_tenant_scoping(
    self,
    evalhub_route: Route,
    evalhub_ca_bundle_file: str,
    multi_tenant_setup: dict,
    tenant_key: str,
):
    tenant_data = multi_tenant_setup[tenant_key]
    providers = list_evalhub_providers(...)
    # assertions
```

The same applies to test_create_collection_per_tenant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/model_explainability/evalhub/test_evalhub_multi_tenancy.py` around lines 45-69: convert the explicit tenant loop in test_providers_list_with_tenant_scoping into a parametrized pytest test so failures report which tenant failed; add @pytest.mark.parametrize("tenant_key", ["tenant_a", "tenant_b"]) above test_providers_list_with_tenant_scoping, add a tenant_key parameter to the test signature, replace uses of tenant_name/multi_tenant_setup[...] with tenant_data = multi_tenant_setup[tenant_key] and update assertions/messages to reference tenant_key, and do the same refactor for test_create_collection_per_tenant; ensure pytest is imported if not already.

tests/model_explainability/evalhub/conftest.py (1)
160-209: Duplicated RBAC rules across tenant role fixtures.

tenant_a_evaluator_role and tenant_b_evaluator_role have identical rules. Extract to a constant:

```python
EVALHUB_EVALUATOR_RULES: list[dict] = [
    {
        "apiGroups": ["trustyai.opendatahub.io"],
        "resources": ["evaluations", "collections", "providers"],
        "verbs": ["get", "list", "create", "update", "delete"],
    },
    {
        "apiGroups": ["mlflow.kubeflow.org"],
        "resources": ["experiments"],
        "verbs": ["create", "get"],
    },
]
```

Then reference rules=EVALHUB_EVALUATOR_RULES in both fixtures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/evalhub/conftest.py` around lines 160 - 209, Extract the duplicated RBAC rule list used in tenant_a_evaluator_role and tenant_b_evaluator_role into a shared constant (e.g. EVALHUB_EVALUATOR_RULES: list[dict]) containing the two rule dicts, then update both fixtures to pass rules=EVALHUB_EVALUATOR_RULES instead of inlining the dicts; ensure the constant is defined at module scope near other test constants so both Role(...) calls reference the same symbol.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: bbeacb50-8478-4ed2-8f9f-cd71f6a76374
📒 Files selected for processing (4)
- tests/model_explainability/evalhub/conftest.py
- tests/model_explainability/evalhub/constants.py
- tests/model_explainability/evalhub/test_evalhub_multi_tenancy.py
- tests/model_explainability/evalhub/utils.py
```python
@pytest.fixture(scope="class")
def tenant_unauthorised_service_account(
    admin_client: DynamicClient,
    model_namespace: Namespace,
) -> Generator[ServiceAccount, Any, Any]:
    """Create service account without any EvalHub permissions."""
    with ServiceAccount(
        client=admin_client,
        name=TENANT_UNAUTHORISED_SA_NAME,
        namespace=model_namespace.name,
    ) as sa:
        yield sa
```
Unauthorised SA in model_namespace may have ambient permissions.
Per the relevant code snippet (tests/conftest.py:114-134), model_namespace is the namespace where EvalHub is deployed. Creating the unauthorised SA here risks inheriting permissions from default ServiceAccounts or namespace-scoped bindings, making the 403 assertion in test_unauthorised_user_cannot_access_providers unreliable.
Create a dedicated isolated namespace for the unauthorised SA:
```diff
+@pytest.fixture(scope="class")
+def unauthorised_namespace(
+    admin_client: DynamicClient,
+) -> Generator[Namespace, Any, Any]:
+    """Create isolated namespace for unauthorised service account."""
+    with Namespace(
+        client=admin_client,
+        name="evalhub-no-access-ns",
+    ) as ns:
+        ns.wait_for_status(status=Namespace.Status.ACTIVE, timeout=Timeout.TIMEOUT_2MIN)
+        yield ns
+
+
 @pytest.fixture(scope="class")
 def tenant_unauthorised_service_account(
     admin_client: DynamicClient,
-    model_namespace: Namespace,
+    unauthorised_namespace: Namespace,
 ) -> Generator[ServiceAccount, Any, Any]:
     """Create service account without any EvalHub permissions."""
     with ServiceAccount(
         client=admin_client,
         name=TENANT_UNAUTHORISED_SA_NAME,
-        namespace=model_namespace.name,
+        namespace=unauthorised_namespace.name,
     ) as sa:
         yield sa
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@pytest.fixture(scope="class")
def unauthorised_namespace(
    admin_client: DynamicClient,
) -> Generator[Namespace, Any, Any]:
    """Create isolated namespace for unauthorised service account."""
    with Namespace(
        client=admin_client,
        name="evalhub-no-access-ns",
    ) as ns:
        ns.wait_for_status(status=Namespace.Status.ACTIVE, timeout=Timeout.TIMEOUT_2MIN)
        yield ns


@pytest.fixture(scope="class")
def tenant_unauthorised_service_account(
    admin_client: DynamicClient,
    unauthorised_namespace: Namespace,
) -> Generator[ServiceAccount, Any, Any]:
    """Create service account without any EvalHub permissions."""
    with ServiceAccount(
        client=admin_client,
        name=TENANT_UNAUTHORISED_SA_NAME,
        namespace=unauthorised_namespace.name,
    ) as sa:
        yield sa
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/evalhub/conftest.py` around lines 146 - 157, The
unauthorised ServiceAccount fixture tenant_unauthorised_service_account is
currently created in model_namespace and may inherit ambient permissions; change
it to create the SA in a dedicated isolated namespace (e.g., create or accept a
fixture like unauthorised_namespace or use Namespace(...) to create a temp
namespace) instead of model_namespace, then instantiate ServiceAccount with
namespace=unauthorised_namespace.name so the SA cannot inherit EvalHub bindings;
ensure the new namespace is created and torn down in the fixture to avoid
leaking resources and update any references to
tenant_unauthorised_service_account accordingly.
```python
@pytest.fixture(scope="class")
def tenant_a_token(
    tenant_a_service_account: ServiceAccount,
    tenant_a_evaluator_role_binding: RoleBinding,
    evalhub_providers_role_binding_tenant_a: RoleBinding,
) -> str:
    """Generate bearer token for tenant-a service account (30 min validity)."""
    return run_command(
        shlex.split(
            f"oc create token -n {tenant_a_service_account.namespace} {tenant_a_service_account.name} --duration=30m"
        )
    )[1].strip()
```
Missing error handling for oc create token command.
run_command returns (return_code, stdout, stderr). The fixture accesses [1] without verifying [0] == 0. If the command fails, you'll get an empty or error string as the token, causing cryptic failures later.
```diff
 def tenant_a_token(
     tenant_a_service_account: ServiceAccount,
     tenant_a_evaluator_role_binding: RoleBinding,
     evalhub_providers_role_binding_tenant_a: RoleBinding,
 ) -> str:
     """Generate bearer token for tenant-a service account (30 min validity)."""
-    return run_command(
+    rc, stdout, stderr = run_command(
         shlex.split(
             f"oc create token -n {tenant_a_service_account.namespace} {tenant_a_service_account.name} --duration=30m"
         )
-    )[1].strip()
+    )
+    if rc != 0:
+        raise RuntimeError(f"Failed to create token: {stderr}")
+    return stdout.strip()
```

Apply the same fix to tenant_b_token and tenant_unauthorised_token.
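Since the same unpack-and-check pattern would be repeated in all three token fixtures, one option is a small shared helper. This is a sketch under the assumption (from the review) that run_command returns (return_code, stdout, stderr); the runner is injected as a callable so the helper can be exercised without a live cluster.

```python
import shlex


def create_sa_token(namespace: str, sa_name: str, runner, duration: str = "30m") -> str:
    """Create a short-lived ServiceAccount token via `oc create token`.

    `runner` is any callable with run_command's (rc, stdout, stderr) contract;
    injecting it keeps the helper testable and reusable across all three fixtures.
    """
    rc, stdout, stderr = runner(
        shlex.split(f"oc create token -n {namespace} {sa_name} --duration={duration}")
    )
    if rc != 0:
        raise RuntimeError(f"Failed to create token for {sa_name} in {namespace}: {stderr}")
    return stdout.strip()
```

Each fixture then reduces to a one-line call such as create_sa_token(sa.namespace, sa.name, run_command), and a token-creation failure surfaces immediately with stderr attached.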
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/evalhub/conftest.py` around lines 290 - 301, The
fixtures tenant_a_token (and similarly tenant_b_token and
tenant_unauthorised_token) call run_command and assume success by using the
stdout at index [1]; update each fixture to unpack run_command into (rc, stdout,
stderr), check rc == 0, and on non‑zero raise a clear error (e.g., pytest.fail
or RuntimeError) including stderr so token creation failures fail fast with
diagnostics; if rc == 0 return stdout.strip() as before. Ensure you reference
run_command, tenant_a_token, tenant_b_token, and tenant_unauthorised_token when
applying the change.
```python
def test_create_collection_per_tenant(
    self,
    evalhub_route: Route,
    evalhub_ca_bundle_file: str,
    multi_tenant_setup: dict,
):
    """Verify each tenant can create their own collections."""
    for tenant_name in ["tenant_a", "tenant_b"]:
        tenant_data = multi_tenant_setup[tenant_name]

        collection = create_evalhub_collection(
            host=evalhub_route.host,
            token=tenant_data["token"],
            ca_bundle_file=evalhub_ca_bundle_file,
            tenant=tenant_data["namespace"].name,
            name=f"{tenant_name}-test-collection",
            description=f"Test collection for {tenant_name}",
        )

        assert "id" in collection, f"Collection response missing 'id' for {tenant_name}"
        assert "name" in collection, f"Collection response missing 'name' for {tenant_name}"
        assert collection["name"] == f"{tenant_name}-test-collection", f"Collection name mismatch for {tenant_name}"
```
Test-created resources persist and may cause flaky isolation assertions.
test_create_collection_per_tenant and test_tenant_collections_are_isolated both create collections. If test_tenant_collections_are_isolated runs first, test_create_collection_per_tenant may find existing collections, and vice versa. Similarly, test_evaluation_jobs_are_tenant_isolated depends on jobs created within the test, but if other job tests run first, the job list may contain unexpected entries.
Options:

- Use unique collection/job names per test (e.g., include uuid.uuid4()).
- Clean up created resources in each test.
- Assert on specific IDs rather than list contents when possible.
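The unique-name option can be sketched as follows; the name-building helper is hypothetical, and the commented usage mirrors the create_evalhub_collection call shown in the test above.

```python
import uuid


def unique_resource_name(base: str) -> str:
    """Append a short random suffix so re-runs and parallel tests never collide."""
    return f"{base}-{uuid.uuid4().hex[:8]}"

# In the test, each tenant then gets a collision-free collection name:
#   name = unique_resource_name(f"{tenant_name}-test-collection")
#   collection = create_evalhub_collection(..., name=name)
#   assert collection["name"] == name  # assert on the specific resource created
```

Asserting on the exact generated name (rather than on list contents) also removes the dependence on which test created collections first.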
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/evalhub/test_evalhub_multi_tenancy.py` around
lines 146 - 167, The tests create persistent collections/jobs causing cross-test
flakiness; update test_create_collection_per_tenant,
test_tenant_collections_are_isolated, and
test_evaluation_jobs_are_tenant_isolated to use unique resource names (e.g.,
append a uuid.uuid4() or a timestamp to collection/job names) and/or perform
explicit cleanup after creation (call the delete collection/job helper using the
returned "id") so each test operates on its own resources and does not rely on
global list contents; ensure assertions check the specific returned "id" or name
created by create_evalhub_collection/create_job rather than assuming an empty
list.