test: validate Tenant reconcile health in e2e tests #925
|
@coderabbitai review |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: EgorLu |
|
📝 Walkthrough
Adds a new end-to-end pytest module that checks Tenant reconciliation health and validates resources the Tenant should create. Introduces resilient …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Security and code quality findings
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✅ Actions performed: Review triggered.
|
|
CI failures are unrelated to this PR:
This PR only adds a Python test file and a one-line shell script change. No Go code was modified. |
|
/retest |
1f12a43 to 5e07b25
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/tests/test_tenant_health.py`:
- Around line 45-47: The subprocess.run call in _run_oc should include a hard
timeout to prevent a stuck oc process from hanging tests; update the call inside
_run_oc to pass timeout=TENANT_HEALTH_TIMEOUT (or a suitably-named timeout
constant) and add handling for subprocess.TimeoutExpired—catch the exception in
_run_oc, log or return a failed result indicating a timeout, and ensure the
child process is treated as failed so TENANT_HEALTH_TIMEOUT is honored.
- Around line 139-142: The _crd_exists function currently checks CRD presence
with a substring match against the full `oc api-resources` table; change it to
call _run_oc(["api-resources", "--api-group", api_group, "-o", "name"]) so the
output contains only resource names, split the stdout into lines and build a set
of names, then check for exact membership of `resource` (or resource.lower() if
normalizing) in that set; update the call site in _crd_exists and keep using
_run_oc for execution and rc handling.
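A minimal sketch of what the two findings above ask for, not the PR's actual code. The names `run_oc`, `crd_exists`, and `OcResult`, the 60-second timeout value, and the exit code 124 for a timeout are all assumptions made for illustration. It also assumes that `oc api-resources -o name` may print grouped resources as `<resource>.<group>`, so both forms are accepted in the exact-match check.

```python
import subprocess
from typing import Callable, NamedTuple, Sequence

# Hypothetical value; the PR's real TENANT_HEALTH_TIMEOUT is not shown here.
TENANT_HEALTH_TIMEOUT = 60.0


class OcResult(NamedTuple):
    returncode: int
    stdout: str
    stderr: str


def run_oc(args: Sequence[str], timeout: float = TENANT_HEALTH_TIMEOUT) -> OcResult:
    """Run `oc` with a hard timeout; a timeout is reported as a failed result."""
    try:
        proc = subprocess.run(
            ["oc", *args], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        # 124 is an arbitrary non-zero code chosen for this sketch.
        return OcResult(124, "", f"oc timed out after {timeout}s")
    return OcResult(proc.returncode, proc.stdout, proc.stderr)


def crd_exists(
    resource: str,
    api_group: str,
    run: Callable[[Sequence[str]], OcResult] = run_oc,
) -> bool:
    """Exact-match check against the name-only api-resources listing."""
    result = run(["api-resources", "--api-group", api_group, "-o", "name"])
    if result.returncode != 0:
        return False
    names = {
        line.strip().lower() for line in result.stdout.splitlines() if line.strip()
    }
    wanted = resource.lower()
    # Accept both the bare name and the "<name>.<group>" form (assumption).
    return wanted in names or f"{wanted}.{api_group.lower()}" in names
```

Passing the runner as a parameter keeps the matching logic testable without a cluster: a fake runner returning canned output is enough to show that `monitors` no longer matches `podmonitors`.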
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0bae1e72-2d60-41fd-8627-b6304a9fc825
📒 Files selected for processing (2)
test/e2e/scripts/prow_run_smoke_test.sh
test/e2e/tests/test_tenant_health.py
|
/retest |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/tests/test_tenant_health.py`:
- Around line 46-47: The test currently invokes subprocess.run(["oc", *args],
...) which resolves "oc" via PATH and can be hijacked; resolve and pin the
absolute path once (e.g., call shutil.which("oc") into a variable like oc_path
at module init or test setup, fail the test if not found) and replace
subprocess.run(["oc", *args], ...) with subprocess.run([oc_path, *args], ...) so
the binary is unambiguous and you explicitly error if the binary cannot be
located.
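The suggestion above could be sketched roughly as follows. The helper names `resolve_binary` and `run_pinned` and the default timeout are hypothetical, and the real test module may structure this differently (for example, resolving once at module import or in a fixture).

```python
import os
import shutil
import subprocess
from typing import Sequence


def resolve_binary(name: str) -> str:
    """Resolve a binary on PATH once and return its absolute path, or fail loudly."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name!r} not found on PATH")
    # shutil.which can return a relative path if PATH has relative entries.
    return os.path.abspath(path)


def run_pinned(
    binary_path: str, args: Sequence[str], timeout: float = 60.0
) -> subprocess.CompletedProcess:
    """Invoke the already-resolved binary, so later PATH changes cannot swap it."""
    return subprocess.run(
        [binary_path, *args], capture_output=True, text=True, timeout=timeout
    )
```

In a test module this might be used as `OC_PATH = resolve_binary("oc")` during setup, followed by `run_pinned(OC_PATH, ["get", "pods"])`, so a missing binary fails at resolution time with a clear error.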
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b70357dc-581f-452b-bc87-3c48a7949a68
📒 Files selected for processing (1)
test/e2e/tests/test_tenant_health.py
|
@ishitasequeira Good catch — there is overlap with #869. Here's the breakdown: Duplicated by #869 (can be removed from this PR):
Unique to this PR (not covered by #869):
I'll rebase on main (which now has #869) and strip the duplicated |
Add test_tenant_health.py with label-based resource audit that queries the cluster for all resources the Tenant reconciler applied via the maas.opendatahub.io/tenant-name tracking label. Catches RBAC gaps and manifest-apply failures (like the PodMonitor incident from PR opendatahub-io#813) at PR time instead of after merge.

Complements test_tenant.py (PR opendatahub-io#869) which covers Tenant lifecycle, phase, and status shape. This PR adds:
- TestTenantConditionValues: asserts specific condition values (DependenciesAvailable, MaaSPrerequisitesAvailable, DeploymentsAvailable) are True
- TestTenantManagedResources: audits core, networking, RBAC, monitoring, and optional observability resources exist

RHOAIENG-62458

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c3f54f1 to f591f24
|
Shortened the tests quite a lot :) |
|
@EgorLu: The following test has Failed:
OCI Artifact Browser URL
Inspecting Test Artifacts Manually
To inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:maas-group-test-knp7v |
Summary
- test/e2e/tests/test_tenant_health.py with two test classes:
  - Phase=Active with healthy conditions (Ready=True, DependenciesAvailable, MaaSPrerequisitesAvailable, DeploymentsAvailable, Degraded=False). Fails fast on Phase=Failed with the reconciler's error message (e.g., RBAC forbidden).
  - maas.opendatahub.io/tenant-name tracking label.
- test_tenant_health.py as the first test file in prow_run_smoke_test.sh so reconcile failures fail the suite early with clear diagnostics.

Motivation
PR #813 added a PodMonitor without a matching RBAC marker. The Tenant reconciler failed on every loop with Forbidden, but e2e tests passed because they only check maas-api deployment readiness and API endpoints. This was caught by downstream ODH operator tests after merge and required hotfix PR #859.

These tests close that gap by validating the Tenant CR's own health status and auditing resource existence.
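For illustration, the condition check described in the Summary might look something like the sketch below. It assumes the Tenant CR exposes `status.phase` and a conventional Kubernetes-style `status.conditions` list with `type`, `status`, and `message` fields; that shape, and the function name `tenant_health_problems`, are assumptions and not taken from the PR's code. The input is the parsed JSON of the Tenant object.

```python
from typing import Any, Dict, List

# Expected condition values as listed in the PR summary.
EXPECTED_CONDITIONS = {
    "Ready": "True",
    "DependenciesAvailable": "True",
    "MaaSPrerequisitesAvailable": "True",
    "DeploymentsAvailable": "True",
    "Degraded": "False",
}


def tenant_health_problems(tenant: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the Tenant looks healthy."""
    status = tenant.get("status", {})
    problems: List[str] = []

    phase = status.get("phase")
    if phase != "Active":
        problems.append(f"phase is {phase!r}, expected 'Active'")

    # Index conditions by type (assumed status shape, see lead-in).
    conditions = {c.get("type"): c for c in status.get("conditions", [])}
    for ctype, expected in EXPECTED_CONDITIONS.items():
        cond = conditions.get(ctype)
        if cond is None:
            problems.append(f"condition {ctype} is missing")
        elif cond.get("status") != expected:
            problems.append(
                f"condition {ctype}={cond.get('status')} (expected {expected}): "
                f"{cond.get('message', '')}"
            )
    return problems
```

Returning the full list of problems, including each condition's message, means a failure such as the Forbidden error from PR #813 would appear directly in the assertion output instead of a bare "not ready".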
Jira: RHOAIENG-62458
Risk analysis
Test plan
- py_compile)
- --collect-only)
- prow_run_smoke_test.sh which now includes test_tenant_health.py first

🤖 Generated with Claude Code
Summary by CodeRabbit