Skip to content

feat: move external cronjob to run timely cleanup to goroutine, part II#954

Open
zdtsw wants to merge 9 commits into
opendatahub-io:mainfrom
zdtsw-forking:fix_cleanup_cronjob2
Open

feat: move external cronjob to run timely cleanup to goroutine, part II#954
zdtsw wants to merge 9 commits into
opendatahub-io:mainfrom
zdtsw-forking:fix_cleanup_cronjob2

Conversation

@zdtsw

@zdtsw zdtsw commented Jun 3, 2026

Copy link
Copy Markdown
Member

Description

Completes the migration from the external maas-api-key-cleanup CronJob to the in-process cleanup loop in maas-api (follow-up to #934).

in detail :

  • Removes the dead CronJob
  • Cleanup: params, RBAC, deploy script, controller patch code
  • Adds general function PruneLegacyCleanupResources so the orphaned CronJob + maas-api-cleanup-restrict
    NetworkPolicy are deleted on Tenant reconcile
  • run CI test on changes not only maas-controller path

Co-authored-by @jland-redhat

How Has This Been Tested?

run

cd maas-controller; make test
cd maas-api; make test

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • New Features
    • Ephemeral API key cleanup now runs inside maas-api as a background process, configurable via CLEANUP_INTERVAL_MINUTES (default: 15).
  • Chores
    • Removed the standalone key-cleanup CronJob and its restrictive NetworkPolicy; legacy cleanup resources are pruned during upgrades.
    • Updated controller deployment env, RBAC for CronJob operations, and CI to run on all pull requests.
  • Documentation
    • Refreshed API key administration and internal endpoint docs, plus troubleshooting steps.
  • Tests
    • Updated E2E and unit tests to validate in-process cleanup and remove CronJob/NetworkPolicy assertions.

@zdtsw zdtsw requested a review from a team as a code owner June 3, 2026 08:11
@zdtsw zdtsw requested review from liangwen12year and mynhardtburger and removed request for a team June 3, 2026 08:11
@openshift-ci

openshift-ci Bot commented Jun 3, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zdtsw
Once this PR has been reviewed and has the lgtm label, please assign jrhyness for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR migrates ephemeral API key cleanup from an external Kubernetes CronJob to an in-process background goroutine within the maas-api service. The maas-api now includes a configurable CLEANUP_INTERVAL_MINUTES environment variable (default 15 minutes) that controls a ticker-driven cleanup loop. The maas-controller gains legacy resource pruning to remove old CronJob and NetworkPolicy artifacts during cluster upgrades. All deployment manifests, parameter files, and tests are updated to reflect this architectural change, with RBAC permissions tightened to only allow get/delete operations on CronJobs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Security & Logic Issues

CWE-413 (Insufficient Resource Validation)maas-api/internal/config/config.go lines 174–177: CleanupIntervalMinutes validation applies default on zero but accepts any positive value without bounds checking. Enforce a sensible lower bound (e.g., ≥ 5 minutes) to prevent resource exhaustion via high-frequency cleanup ticks.

CWE-367 (Time-of-Check to Time-of-Use in Goroutine Shutdown)maas-api/cmd/main.go lines 130–131: ticker may fire immediately before context cancellation is observed, causing a cleanup invocation to race with server shutdown. Stop the ticker explicitly before canceling context or use time.NewTicker with deferred Stop() to prevent a final stray tick.

CWE-248 (Silent Failure / Ignored Error Conditions)maas-controller/pkg/platform/tenantreconcile/prune.go line 63 (isManagedForPrune): annotation parsing treats any non-"false" value as managed-eligible, including malformed or unset annotations. Validate annotation format and surface parse failures as warn-level logs or explicit errors to catch configuration drift.

Operational Risk: Upgrade Orderingmaas-controller/pkg/platform/tenantreconcile/pipeline.go lines 100–104: pruning executes after ApplyRendered in the reconciliation pipeline. If any post-render step or observability component depends on legacy CronJob/NetworkPolicy existence (e.g., for metrics collection or backup), pruning will break those paths silently. Audit all post-render steps and observability agents before deployment.

CWE-269 (Improper Access Control - RBAC Over-restriction)deployment/base/maas-controller/rbac/clusterrole.yaml lines 108–109 and maas-controller/pkg/controller/maas/tenant_controller.go line 82: batch cronjobs verbs reduced to [delete, get]. If any controller reconciliation loop, observability, or audit path requires list, watch, or patch on CronJobs, this change will silently fail those operations. Confirm no controller subcomponents or external integrations depend on those verbs.

Supply Chain: Unrelated Changes.github/workflows/maas-controller-ci.yml lines 4–5 and .github/hack/cleanup-odh.sh line 103: CI workflow pull_request.paths filter removed, causing all PRs to trigger full CI even for unrelated changes; cleanup script switched from removing maas-api-cleanup-restrict to maas-authorino-allow without documented rationale. These changes broaden CI surface and alter cleanup semantics outside the stated migration scope—verify intent and document if intentional.

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: migrating external CronJob cleanup to an in-process goroutine in maas-api, building on prior work (part II).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Contribution Quality And Spam Detection ✅ Passed PR is a legitimate codebase-specific architectural refactor. References prior work (#934), modifies 20+ interdependent files, includes comprehensive tests, uses sound concurrent patterns, and shows...
No Hardcoded Secrets ✅ Passed No hardcoded secrets found. All sensitive references are either Kubernetes resource references (e.g., maas-db-config secret name, postgres-creds), placeholder URLs for documentation, or legitim...
No Weak Cryptography ✅ Passed PR introduces no weak cryptographic primitives, custom crypto, or non-constant-time secret comparisons. Changes are limited to cleanup goroutine logic, configuration parameters, and deployment mani...
No Injection Vectors ✅ Passed No injection vectors found. SQL queries use parameterized static strings (CWE-89). Shell commands use list args without shell=True (CWE-78). No eval/exec on untrusted data (CWE-94). No unsafe deser...
No Privileged Containers ✅ Passed No privileged container configurations found. Both maas-api and maas-controller deployments enforce runAsNonRoot: true, allowPrivilegeEscalation: false, and drop all capabilities.
No Sensitive Data In Logs ✅ Passed All logging statements in the PR log non-sensitive operational data: cleanup status/intervals (integers), deleted counts, and Kubernetes resource names. No passwords, tokens, API keys, PII, or cred...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
maas-controller/pkg/platform/tenantreconcile/prune_test.go (1)

34-46: ⚡ Quick win

Add managed=false coverage for NetworkPolicy skip path.

Lines 34-46 only assert skip behavior for CronJob. Add a matching managed=false case for GVKNetworkPolicy so regressions in that branch are caught.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maas-controller/pkg/platform/tenantreconcile/prune_test.go` around lines 34 -
46, Add a second subtest mirroring the CronJob case that ensures a NetworkPolicy
with annotation opendatahub.io/managed="false" is skipped: create a resource via
newLegacyResource using GVKNetworkPolicy and an appropriate name/namespace, seed
it into the fake client (like the cronJob test), call
PruneLegacyCleanupResources with the same context/logr.Discard() and namespace,
then assert that the object still exists by fetching it from the client (use
client.ObjectKey with the NetworkPolicy name and appNs) and requiring no error;
reference GVKNetworkPolicy and newLegacyResource to locate where to add this
test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/hack/cleanup-odh.sh:
- Line 103: The script currently only deletes the NetworkPolicy named
maas-authorino-allow; ensure it also removes maas-api-cleanup-restrict to avoid
orphaned policies. Update the kubectl delete invocation that uses "$ns" (the
namespace variable) to include both resource names (maas-authorino-allow and
maas-api-cleanup-restrict) or add a second kubectl delete line for
maas-api-cleanup-restrict with --ignore-not-found so both are cleaned up safely
during legacy/upgrade cleanup.

In `@docs/content/reference/maas-api-overview.md`:
- Line 52: Update the "Called By" column for the `/internal/v1/api-keys/cleanup`
row so it no longer attributes the call to the background goroutine; instead
mark it as an internal/manual trigger only (e.g., "Called By: Manual/internal
trigger only — not invoked by background goroutine"). Ensure the endpoint path
`/internal/v1/api-keys/cleanup` is the row being edited and keep the description
text intact.

In `@maas-api/cmd/main.go`:
- Around line 249-252: The legacy unauthenticated cleanup endpoint is still
registered via internalRoutes.POST("/api-keys/cleanup",
apiKeyHandler.CleanupExpiredEphemeralKeys) and must be removed or protected;
either delete the registration to eliminate the public state-changing handler or
wrap it with proper auth/authorization (e.g., require the same
Authorino/internal-only middleware used for other internal routes or enforce a
localhost/internal-network check) so only trusted in-cluster callers can invoke
apiKeyHandler.CleanupExpiredEphemeralKeys; update router.Group("/internal/v1")
wiring accordingly and ensure tests/clients that relied on this endpoint are
adjusted.

In `@maas-api/internal/config/config.go`:
- Around line 174-176: The config currently treats any negative
CleanupIntervalMinutes as “disabled”; change the validation around
c.CleanupIntervalMinutes so values < -1 are rejected instead of silently
accepted: add a guard that if c.CleanupIntervalMinutes < -1 the config
load/validation returns an error (with a clear message mentioning
CleanupIntervalMinutes), keep the existing special-case for -1 (disabled) and
the defaulting for 0 to constant.DefaultCleanupIntervalMinutes so only values
below -1 cause failure; update the function/method that initializes/validates
the config where c.CleanupIntervalMinutes is handled.

In `@maas-controller/pkg/platform/tenantreconcile/prune.go`:
- Around line 45-52: The code currently deletes legacy resources if they lack
opendatahub.io/managed=false; change the predicate to require positive
ownership/management before calling c.Delete — i.e., replace the simple negation
check using isManagedForPrune/AnnotationManaged with a stricter test that
returns true only if the object either has a MaaS owner reference (inspect
obj.GetOwnerReferences() for a controller owner matching the MaaS controller
UID/Kind) or carries an explicit positive management marker (e.g.,
opendatahub.io/managed=true or a dedicated controller label). Update the prune
sites that call isManagedForPrune (the block invoking c.Delete and the similar
block at 57-63) to use this new predicate (implement a helper like
hasMaaSOwnerOrManagedTrue and use it in place of the current check) so only
resources owned/marked by MaaS are removed.

In `@test/e2e/tests/test_api_keys.py`:
- Around line 931-934: The test currently asserts CLEANUP_INTERVAL_MINUTES
equals the hard-coded string "15"; change it to only assert the environment
variable exists and is parseable as an integer (or compare it against a
test-provided expected value). Specifically, update the assertions around
env_vars["CLEANUP_INTERVAL_MINUTES"] so you first assert
"CLEANUP_INTERVAL_MINUTES" in env_vars, then attempt to cast
env_vars["CLEANUP_INTERVAL_MINUTES"] to int (or retrieve an expected value from
the test config/fixture) and assert that the cast succeeds (or equals the
expected value) instead of requiring the literal "15"; refer to the
CLEANUP_INTERVAL_MINUTES lookup in this test to locate where to change the
checks.

---

Nitpick comments:
In `@maas-controller/pkg/platform/tenantreconcile/prune_test.go`:
- Around line 34-46: Add a second subtest mirroring the CronJob case that
ensures a NetworkPolicy with annotation opendatahub.io/managed="false" is
skipped: create a resource via newLegacyResource using GVKNetworkPolicy and an
appropriate name/namespace, seed it into the fake client (like the cronJob
test), call PruneLegacyCleanupResources with the same context/logr.Discard() and
namespace, then assert that the object still exists by fetching it from the
client (use client.ObjectKey with the NetworkPolicy name and appNs) and
requiring no error; reference GVKNetworkPolicy and newLegacyResource to locate
where to add this test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 72114349-c591-49f2-8246-90d63a5d83dc

📥 Commits

Reviewing files that changed from the base of the PR and between 06ffb2d and 0e548b2.

📒 Files selected for processing (32)
  • .github/hack/cleanup-odh.sh
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
  • deployment/base/maas-api/core/deployment.yaml
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/base/maas-controller/default/params.env
  • deployment/base/maas-controller/manager/manager.yaml
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • deployment/overlays/odh/params.env
  • docs/content/configuration-and-management/api-key-administration.md
  • docs/content/reference/maas-api-overview.md
  • docs/content/user-guide/api-key-management.md
  • maas-api/cmd/main.go
  • maas-api/internal/api_keys/handler.go
  • maas-api/internal/api_keys/service.go
  • maas-api/internal/config/config.go
  • maas-api/internal/config/config_test.go
  • maas-api/internal/constant/const.go
  • maas-controller/pkg/controller/maas/tenant_controller.go
  • maas-controller/pkg/platform/tenantreconcile/constants.go
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • maas-controller/pkg/platform/tenantreconcile/params_test.go
  • maas-controller/pkg/platform/tenantreconcile/pipeline.go
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • maas-controller/pkg/platform/tenantreconcile/prune_test.go
  • scripts/deploy.sh
  • test/e2e/scripts/local-deploy.sh
  • test/e2e/tests/test_api_keys.py
💤 Files with no reviewable changes (14)
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/overlays/odh/params.env
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-controller/manager/manager.yaml
  • test/e2e/scripts/local-deploy.sh
  • deployment/base/maas-controller/default/params.env
  • scripts/deploy.sh
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml

kubectl delete destinationrule maas-api-backend-tls -n "$ns" --ignore-not-found 2>/dev/null || true
kubectl delete networkpolicy maas-api-cleanup-restrict maas-authorino-allow -n "$ns" --ignore-not-found 2>/dev/null || true
kubectl delete cronjob maas-api-key-cleanup -n "$ns" --ignore-not-found 2>/dev/null || true
kubectl delete networkpolicy maas-authorino-allow -n "$ns" --ignore-not-found 2>/dev/null || true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep deletion for legacy cleanup NetworkPolicy to avoid stale leftovers.

Line 103 now deletes only maas-authorino-allow. In upgrade/partial-cleanup clusters, maas-api-cleanup-restrict can still exist and remain orphaned. Delete both names in this step.

Proposed patch
-    kubectl delete networkpolicy maas-authorino-allow -n "$ns" --ignore-not-found 2>/dev/null || true
+    kubectl delete networkpolicy maas-authorino-allow maas-api-cleanup-restrict -n "$ns" --ignore-not-found 2>/dev/null || true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
kubectl delete networkpolicy maas-authorino-allow -n "$ns" --ignore-not-found 2>/dev/null || true
kubectl delete networkpolicy maas-authorino-allow maas-api-cleanup-restrict -n "$ns" --ignore-not-found 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/hack/cleanup-odh.sh at line 103, The script currently only deletes
the NetworkPolicy named maas-authorino-allow; ensure it also removes
maas-api-cleanup-restrict to avoid orphaned policies. Update the kubectl delete
invocation that uses "$ns" (the namespace variable) to include both resource
names (maas-authorino-allow and maas-api-cleanup-restrict) or add a second
kubectl delete line for maas-api-cleanup-restrict with --ignore-not-found so
both are cleaned up safely during legacy/upgrade cleanup.

Comment thread docs/content/reference/maas-api-overview.md
Comment thread maas-api/cmd/main.go
Comment on lines +249 to +252
// Internal routes (no auth required - called by Authorino / in-process cleanup)
internalRoutes := router.Group("/internal/v1")
internalRoutes.POST("/api-keys/validate", apiKeyHandler.ValidateAPIKeyHandler)
internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys)
internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys) // TODO: consider remove endpoint if not public access

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove or protect the legacy cleanup endpoint.

This PR moves cleanup in-process, but Line 252 still registers an unauthenticated state-changing handler on the shared maas-api service. With the dedicated cleanup NetworkPolicy being removed elsewhere in this change, any in-cluster workload that can reach the Service can repeatedly trigger delete queries against expired keys (CWE-306, CWE-862).

Suggested fix
 	// Internal routes (no auth required - called by Authorino / in-process cleanup)
 	internalRoutes := router.Group("/internal/v1")
 	internalRoutes.POST("/api-keys/validate", apiKeyHandler.ValidateAPIKeyHandler)
-	internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys) // TODO: consider remove endpoint if not public access
 	internalRoutes.POST("/subscriptions/select", subscriptionHandler.SelectSubscription)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Internal routes (no auth required - called by Authorino / in-process cleanup)
internalRoutes := router.Group("/internal/v1")
internalRoutes.POST("/api-keys/validate", apiKeyHandler.ValidateAPIKeyHandler)
internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys)
internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys) // TODO: consider remove endpoint if not public access
// Internal routes (no auth required - called by Authorino / in-process cleanup)
internalRoutes := router.Group("/internal/v1")
internalRoutes.POST("/api-keys/validate", apiKeyHandler.ValidateAPIKeyHandler)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maas-api/cmd/main.go` around lines 249 - 252, The legacy unauthenticated
cleanup endpoint is still registered via
internalRoutes.POST("/api-keys/cleanup",
apiKeyHandler.CleanupExpiredEphemeralKeys) and must be removed or protected;
either delete the registration to eliminate the public state-changing handler or
wrap it with proper auth/authorization (e.g., require the same
Authorino/internal-only middleware used for other internal routes or enforce a
localhost/internal-network check) so only trusted in-cluster callers can invoke
apiKeyHandler.CleanupExpiredEphemeralKeys; update router.Group("/internal/v1")
wiring accordingly and ensure tests/clients that relied on this endpoint are
adjusted.

Comment on lines +174 to +176
if c.CleanupIntervalMinutes == 0 {
c.CleanupIntervalMinutes = constant.DefaultCleanupIntervalMinutes
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject cleanup intervals below -1.

These lines silently treat every negative value as “disabled”, even though the public contract says only -1 disables cleanup. A typo like -15 will turn pruning off instead of failing fast.

Suggested fix
 	if c.CleanupIntervalMinutes == 0 {
 		c.CleanupIntervalMinutes = constant.DefaultCleanupIntervalMinutes
 	}
+	if c.CleanupIntervalMinutes < -1 {
+		return errors.New("CLEANUP_INTERVAL_MINUTES must be -1, 0, or a positive integer")
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if c.CleanupIntervalMinutes == 0 {
c.CleanupIntervalMinutes = constant.DefaultCleanupIntervalMinutes
}
if c.CleanupIntervalMinutes == 0 {
c.CleanupIntervalMinutes = constant.DefaultCleanupIntervalMinutes
}
if c.CleanupIntervalMinutes < -1 {
return errors.New("CLEANUP_INTERVAL_MINUTES must be -1, 0, or a positive integer")
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maas-api/internal/config/config.go` around lines 174 - 176, The config
currently treats any negative CleanupIntervalMinutes as “disabled”; change the
validation around c.CleanupIntervalMinutes so values < -1 are rejected instead
of silently accepted: add a guard that if c.CleanupIntervalMinutes < -1 the
config load/validation returns an error (with a clear message mentioning
CleanupIntervalMinutes), keep the existing special-case for -1 (disabled) and
the defaulting for 0 to constant.DefaultCleanupIntervalMinutes so only values
below -1 cause failure; update the function/method that initializes/validates
the config where c.CleanupIntervalMinutes is handled.

Comment on lines +45 to +52
if !isManagedForPrune(obj) {
log.V(1).Info("Skipping legacy resource prune: annotation "+AnnotationManaged+":false",
"kind", kind, "name", name, "namespace", namespace)
return nil
}
log.Info("Deleting legacy platform resource", "kind", kind, "name", name, "namespace", namespace)
if err := c.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("delete legacy %s/%s in namespace %s: %w", kind, name, namespace, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Require a positive ownership/management check before deleting these resources.

Right now any CronJob/NetworkPolicy with the legacy name in appNs is deleted unless it explicitly sets opendatahub.io/managed=false. That makes name collision enough to trigger deletion during reconcile. In practice, a manually managed object or a third-party resource with the same name gets removed even though the controller never created it.

Only prune objects that carry a positive controller-management marker or owner reference that identifies them as MaaS-managed legacy operands; fixed names plus an opt-out annotation are not a safe delete predicate.

Also applies to: 57-63

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maas-controller/pkg/platform/tenantreconcile/prune.go` around lines 45 - 52,
The code currently deletes legacy resources if they lack
opendatahub.io/managed=false; change the predicate to require positive
ownership/management before calling c.Delete — i.e., replace the simple negation
check using isManagedForPrune/AnnotationManaged with a stricter test that
returns true only if the object either has a MaaS owner reference (inspect
obj.GetOwnerReferences() for a controller owner matching the MaaS controller
UID/Kind) or carries an explicit positive management marker (e.g.,
opendatahub.io/managed=true or a dedicated controller label). Update the prune
sites that call isManagedForPrune (the block invoking c.Delete and the similar
block at 57-63) to use this new predicate (implement a helper like
hasMaaSOwnerOrManagedTrue and use it in place of the current check) so only
resources owned/marked by MaaS are removed.

Comment on lines +931 to +934
assert "CLEANUP_INTERVAL_MINUTES" in env_vars, \
"maas-api should have CLEANUP_INTERVAL_MINUTES env var"
assert env_vars["CLEANUP_INTERVAL_MINUTES"] == "15", \
f"Expected CLEANUP_INTERVAL_MINUTES=15, got {env_vars['CLEANUP_INTERVAL_MINUTES']!r}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t hard-code the default cleanup interval in this E2E.

The PR makes CLEANUP_INTERVAL_MINUTES configurable, but this test fails any valid deployment that sets a non-default value. Assert that the env var is present and parseable, or compare it to an E2E-provided expected value, instead of requiring "15" unconditionally.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/tests/test_api_keys.py` around lines 931 - 934, The test currently
asserts CLEANUP_INTERVAL_MINUTES equals the hard-coded string "15"; change it to
only assert the environment variable exists and is parseable as an integer (or
compare it against a test-provided expected value). Specifically, update the
assertions around env_vars["CLEANUP_INTERVAL_MINUTES"] so you first assert
"CLEANUP_INTERVAL_MINUTES" in env_vars, then attempt to cast
env_vars["CLEANUP_INTERVAL_MINUTES"] to int (or retrieve an expected value from
the test config/fixture) and assert that the cast succeeds (or equals the
expected value) instead of requiring the literal "15"; refer to the
CLEANUP_INTERVAL_MINUTES lookup in this test to locate where to change the
checks.

@zdtsw zdtsw force-pushed the fix_cleanup_cronjob2 branch from 0e548b2 to 4c87657 Compare June 8, 2026 07:59

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/tests/test_api_keys.py`:
- Around line 926-930: Replace the current lookup that falls back to
containers[0] with a strict existence check: find the maas-api container using
the generator expression used for maas_api_container and if not found raise an
explicit error (or assert) instead of defaulting; then build env_vars from that
maas_api_container's "env" field as before (the code referencing
maas_api_container and env_vars should remain, only remove the fallback and add
a fail-fast raise when the maas-api container is missing).
- Around line 1318-1327: The patch currently always invokes subprocess.run with
patch_ops even when patch_ops is empty; update the logic in the block that
builds patch_ops (variables: patch_ops, webhook_config, webhook_name and the
loop over webhook_config.get("webhooks", [])) to skip calling subprocess.run
unless patch_ops contains at least one operation (e.g., check if patch_ops
before calling subprocess.run) so you only apply the JSON patch when there are
matching webhooks to modify.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 79695628-6705-4fe3-8073-406ad1050b8b

📥 Commits

Reviewing files that changed from the base of the PR and between 0e548b2 and 4c87657.

📒 Files selected for processing (32)
  • .github/hack/cleanup-odh.sh
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
  • deployment/base/maas-api/core/deployment.yaml
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/base/maas-controller/default/params.env
  • deployment/base/maas-controller/manager/manager.yaml
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • deployment/overlays/odh/params.env
  • docs/content/configuration-and-management/api-key-administration.md
  • docs/content/reference/maas-api-overview.md
  • docs/content/user-guide/api-key-management.md
  • maas-api/cmd/main.go
  • maas-api/internal/api_keys/handler.go
  • maas-api/internal/api_keys/service.go
  • maas-api/internal/config/config.go
  • maas-api/internal/config/config_test.go
  • maas-api/internal/constant/const.go
  • maas-controller/pkg/controller/maas/tenant_controller.go
  • maas-controller/pkg/platform/tenantreconcile/constants.go
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • maas-controller/pkg/platform/tenantreconcile/params_test.go
  • maas-controller/pkg/platform/tenantreconcile/pipeline.go
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • maas-controller/pkg/platform/tenantreconcile/prune_test.go
  • scripts/deploy.sh
  • test/e2e/scripts/local-deploy.sh
  • test/e2e/tests/test_api_keys.py
💤 Files with no reviewable changes (11)
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/base/maas-controller/default/params.env
  • test/e2e/scripts/local-deploy.sh
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • scripts/deploy.sh
  • deployment/overlays/odh/params.env
✅ Files skipped from review due to trivial changes (6)
  • maas-api/internal/api_keys/service.go
  • maas-api/internal/constant/const.go
  • docs/content/user-guide/api-key-management.md
  • docs/content/reference/maas-api-overview.md
  • maas-api/internal/api_keys/handler.go
  • docs/content/configuration-and-management/api-key-administration.md
🚧 Files skipped from review as they are similar to previous changes (13)
  • maas-controller/pkg/platform/tenantreconcile/pipeline.go
  • .github/hack/cleanup-odh.sh
  • maas-api/internal/config/config_test.go
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • maas-api/internal/config/config.go
  • maas-controller/pkg/platform/tenantreconcile/prune_test.go
  • maas-controller/pkg/controller/maas/tenant_controller.go
  • deployment/base/maas-controller/manager/manager.yaml
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • maas-controller/pkg/platform/tenantreconcile/params_test.go
  • maas-api/cmd/main.go
  • maas-controller/pkg/platform/tenantreconcile/constants.go
  • maas-controller/pkg/platform/tenantreconcile/params.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/tests/test_api_keys.py`:
- Around line 926-930: Replace the current lookup that falls back to
containers[0] with a strict existence check: find the maas-api container using
the generator expression used for maas_api_container and if not found raise an
explicit error (or assert) instead of defaulting; then build env_vars from that
maas_api_container's "env" field as before (the code referencing
maas_api_container and env_vars should remain, only remove the fallback and add
a fail-fast raise when the maas-api container is missing).
- Around line 1318-1327: The patch currently always invokes subprocess.run with
patch_ops even when patch_ops is empty; update the logic in the block that
builds patch_ops (variables: patch_ops, webhook_config, webhook_name and the
loop over webhook_config.get("webhooks", [])) to skip calling subprocess.run
unless patch_ops contains at least one operation (e.g., check if patch_ops
before calling subprocess.run) so you only apply the JSON patch when there are
matching webhooks to modify.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 79695628-6705-4fe3-8073-406ad1050b8b

📥 Commits

Reviewing files that changed from the base of the PR and between 0e548b2 and 4c87657.

📒 Files selected for processing (32)
  • .github/hack/cleanup-odh.sh
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
  • deployment/base/maas-api/core/deployment.yaml
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/base/maas-controller/default/params.env
  • deployment/base/maas-controller/manager/manager.yaml
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • deployment/overlays/odh/params.env
  • docs/content/configuration-and-management/api-key-administration.md
  • docs/content/reference/maas-api-overview.md
  • docs/content/user-guide/api-key-management.md
  • maas-api/cmd/main.go
  • maas-api/internal/api_keys/handler.go
  • maas-api/internal/api_keys/service.go
  • maas-api/internal/config/config.go
  • maas-api/internal/config/config_test.go
  • maas-api/internal/constant/const.go
  • maas-controller/pkg/controller/maas/tenant_controller.go
  • maas-controller/pkg/platform/tenantreconcile/constants.go
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • maas-controller/pkg/platform/tenantreconcile/params_test.go
  • maas-controller/pkg/platform/tenantreconcile/pipeline.go
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • maas-controller/pkg/platform/tenantreconcile/prune_test.go
  • scripts/deploy.sh
  • test/e2e/scripts/local-deploy.sh
  • test/e2e/tests/test_api_keys.py
💤 Files with no reviewable changes (11)
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/base/maas-controller/default/params.env
  • test/e2e/scripts/local-deploy.sh
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • scripts/deploy.sh
  • deployment/overlays/odh/params.env
✅ Files skipped from review due to trivial changes (6)
  • maas-api/internal/api_keys/service.go
  • maas-api/internal/constant/const.go
  • docs/content/user-guide/api-key-management.md
  • docs/content/reference/maas-api-overview.md
  • maas-api/internal/api_keys/handler.go
  • docs/content/configuration-and-management/api-key-administration.md
🚧 Files skipped from review as they are similar to previous changes (13)
  • maas-controller/pkg/platform/tenantreconcile/pipeline.go
  • .github/hack/cleanup-odh.sh
  • maas-api/internal/config/config_test.go
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • maas-api/internal/config/config.go
  • maas-controller/pkg/platform/tenantreconcile/prune_test.go
  • maas-controller/pkg/controller/maas/tenant_controller.go
  • deployment/base/maas-controller/manager/manager.yaml
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • maas-controller/pkg/platform/tenantreconcile/params_test.go
  • maas-api/cmd/main.go
  • maas-controller/pkg/platform/tenantreconcile/constants.go
  • maas-controller/pkg/platform/tenantreconcile/params.go
🛑 Comments failed to post (2)
test/e2e/tests/test_api_keys.py (2)

926-930: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Require explicit maas-api container match instead of falling back to containers[0].

This fallback can validate env vars on the wrong container and produce false pass/fail outcomes if container ordering changes. Fail fast when maas-api is absent.

Proposed fix
-        maas_api_container = next(
-            (c for c in containers if c.get("name") == "maas-api"),
-            containers[0] if containers else {},
-        )
+        maas_api_container = next(
+            (c for c in containers if c.get("name") == "maas-api"),
+            None,
+        )
+        assert maas_api_container is not None, "maas-api container not found in Deployment spec"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/tests/test_api_keys.py` around lines 926 - 930, Replace the current
lookup that falls back to containers[0] with a strict existence check: find the
maas-api container using the generator expression used for maas_api_container
and if not found raise an explicit error (or assert) instead of defaulting; then
build env_vars from that maas_api_container's "env" field as before (the code
referencing maas_api_container and env_vars should remain, only remove the
fallback and add a fail-fast raise when the maas-api container is missing).

1318-1327: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against empty webhook patch operations before applying Ignore.

If no matching webhook names are found, the patch step can become a no-op and the test proceeds with failurePolicy unchanged, causing brittle failures after controller scale-down.

Proposed fix
             patch_ops = []
             for idx, webhook in enumerate(webhook_config.get("webhooks", [])):
                 if webhook.get("name") in ["vmaassubscription.kb.io", "vmaasauthpolicy.kb.io"]:
                     patch_ops.append({"op": "replace", "path": f"/webhooks/{idx}/failurePolicy", "value": "Ignore"})
 
+            if not patch_ops:
+                pytest.skip(
+                    f"No matching webhooks found in {webhook_name}; cannot set failurePolicy=Ignore"
+                )
+
             subprocess.run(
                 ["oc", "patch", "validatingwebhookconfiguration", webhook_name,
                  "--type=json", "-p", json.dumps(patch_ops)],
                 capture_output=True, text=True, check=True
             )
🧰 Tools
🪛 Ruff (0.15.15)

[error] 1323-1323: subprocess call: check for execution of untrusted input

(S603)


[error] 1324-1325: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/tests/test_api_keys.py` around lines 1318 - 1327, The patch
currently always invokes subprocess.run with patch_ops even when patch_ops is
empty; update the logic in the block that builds patch_ops (variables:
patch_ops, webhook_config, webhook_name and the loop over
webhook_config.get("webhooks", [])) to skip calling subprocess.run unless
patch_ops contains at least one operation (e.g., check if patch_ops before
calling subprocess.run) so you only apply the JSON patch when there are matching
webhooks to modify.

@zdtsw zdtsw force-pushed the fix_cleanup_cronjob2 branch from 4c87657 to d868fc2 Compare June 8, 2026 08:09

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@maas-controller/pkg/platform/tenantreconcile/params.go`:
- Around line 140-148: patchPreProcessingDeployment is missing observability and
the common signature pattern: add a logr.Logger parameter to
patchPreProcessingDeployment(r *unstructured.Unstructured, params
PlatformParams, log logr.Logger), mirror other patch functions, and emit a V(4)
log when you call setContainerImage for PayloadPreProcessingName (include the
image value and context); update all call sites that invoke
patchPreProcessingDeployment to pass the logger and keep error wrapping as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b8a2beb6-f5f0-4e76-ba66-fe5b875344c5

📥 Commits

Reviewing files that changed from the base of the PR and between 4c87657 and d868fc2.

📒 Files selected for processing (7)
  • docs/content/configuration-and-management/api-key-administration.md
  • docs/content/reference/maas-api-overview.md
  • maas-api/internal/api_keys/handler.go
  • maas-api/internal/api_keys/service.go
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • test/e2e/tests/test_api_keys.py
✅ Files skipped from review due to trivial changes (4)
  • docs/content/reference/maas-api-overview.md
  • maas-api/internal/api_keys/service.go
  • docs/content/configuration-and-management/api-key-administration.md
  • maas-api/internal/api_keys/handler.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • test/e2e/tests/test_api_keys.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@maas-controller/pkg/platform/tenantreconcile/params.go`:
- Around line 140-148: patchPreProcessingDeployment is missing observability and
the common signature pattern: add a logr.Logger parameter to
patchPreProcessingDeployment(r *unstructured.Unstructured, params
PlatformParams, log logr.Logger), mirror other patch functions, and emit a V(4)
log when you call setContainerImage for PayloadPreProcessingName (include the
image value and context); update all call sites that invoke
patchPreProcessingDeployment to pass the logger and keep error wrapping as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b8a2beb6-f5f0-4e76-ba66-fe5b875344c5

📥 Commits

Reviewing files that changed from the base of the PR and between 4c87657 and d868fc2.

📒 Files selected for processing (7)
  • docs/content/configuration-and-management/api-key-administration.md
  • docs/content/reference/maas-api-overview.md
  • maas-api/internal/api_keys/handler.go
  • maas-api/internal/api_keys/service.go
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • test/e2e/tests/test_api_keys.py
✅ Files skipped from review due to trivial changes (4)
  • docs/content/reference/maas-api-overview.md
  • maas-api/internal/api_keys/service.go
  • docs/content/configuration-and-management/api-key-administration.md
  • maas-api/internal/api_keys/handler.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • test/e2e/tests/test_api_keys.py
🛑 Comments failed to post (1)
maas-controller/pkg/platform/tenantreconcile/params.go (1)

140-148: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing observability and inconsistent pattern.

All other deployment-patching functions (patchMaaSAPIDeployment, patchPayloadProcessingDeployment) include a log logr.Logger parameter and emit V(4)-level logs for image patching. This function omits both, creating an observability gap for debugging image propagation.

📝 Proposed fix to add logging
-func patchPreProcessingDeployment(r *unstructured.Unstructured, params PlatformParams) error {
+func patchPreProcessingDeployment(log logr.Logger, r *unstructured.Unstructured, params PlatformParams) error {
 	r.SetNamespace(params.GatewayNamespace)
 	if params.PayloadProcessingImage != "" {
+		log.V(4).Info("Patching payload-pre-processing image", "image", params.PayloadProcessingImage)
 		if err := setContainerImage(r, PayloadPreProcessingName, params.PayloadProcessingImage); err != nil {
 			return fmt.Errorf("patch payload-pre-processing image: %w", err)
 		}
 	}
 	return nil
 }

Update the call site at line 96:

 		case gvk == GVKDeployment && name == PayloadPreProcessingName:
-			if err := patchPreProcessingDeployment(r, params); err != nil {
+			if err := patchPreProcessingDeployment(log, r, params); err != nil {
 				return err
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maas-controller/pkg/platform/tenantreconcile/params.go` around lines 140 -
148, patchPreProcessingDeployment is missing observability and the common
signature pattern: add a logr.Logger parameter to patchPreProcessingDeployment(r
*unstructured.Unstructured, params PlatformParams, log logr.Logger), mirror
other patch functions, and emit a V(4) log when you call setContainerImage for
PayloadPreProcessingName (include the image value and context); update all call
sites that invoke patchPreProcessingDeployment to pass the logger and keep error
wrapping as-is.

zdtsw and others added 9 commits June 15, 2026 07:33
- save resource, no spin up job every 15mins
- less image to pull down for disconnect
- should cleanup endpoint later if needed
- less security concern as one less networkpolicy and patch wors

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- for race condition case, use sync.WaitGroup to let goroutine finish

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
This reverts commit 08e67f1.
- either pass as flag --cleanup-internval to main or env variable CLEANUP_INTERNVAL_MINUTES in
  deployment for customized value
- if not set any or set to '0', use default 15mins
- if set to a negative value, skip cleanup goroutine totally

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…onJob

Remove stale CronJob patching, deploy params, and RBAC from opendatahub-io#934 fallout.
Add PruneLegacyCleanupResources on tenant reconcile, run maas-controller
CI on every PR, and update docs/E2E for CLEANUP_INTERVAL_MINUTES.
- fix rebase as well

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
@zdtsw zdtsw force-pushed the fix_cleanup_cronjob2 branch from d868fc2 to e8a1e51 Compare June 15, 2026 06:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
maas-controller/pkg/controller/maas/tenant_controller.go (1)

82-93: Remove duplicate CronJob RBAC annotation for code clarity.

Line 92 duplicates the CronJob annotation from Line 82 with broader permissions (get;list;watch;create;patch;delete vs. get;delete). The rendered ClusterRole correctly reflects only the narrower get;delete permissions, but the duplicate annotation introduces code maintenance risk and potential for future mistakes. Remove Line 92 to maintain a single, minimal RBAC declaration.

Remediation
 // +kubebuilder:rbac:groups=telemetry.istio.io,resources=telemetries,verbs=get;list;watch;create;patch;delete
-// +kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;patch;delete
 // +kubebuilder:rbac:groups=monitoring.coreos.com,resources=podmonitors;servicemonitors,verbs=get;list;watch;create;patch;delete
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maas-controller/pkg/controller/maas/tenant_controller.go` around lines 82 -
93, Remove the duplicate kubebuilder RBAC annotation for batch/cronjobs
resources that grants broader permissions (get;list;watch;create;patch;delete).
Keep only the first, minimal declaration that specifies the narrower get;delete
verbs for cronjobs. The duplicate annotation creates maintenance risk and
unnecessarily duplicates the resource rule, so delete the second cronjobs RBAC
annotation that contains the expanded permission set.

Source: Coding guidelines

test/e2e/tests/test_api_keys.py (1)

918-922: ⚡ Quick win

Add a timeout to the oc get deploy subprocess call.

Line 918 can block indefinitely on cluster/API stalls, which makes CI jobs hang unnecessarily.

Proposed change
         result = sp.run(
             ["oc", "get", "deploy", "maas-api",
              "-n", deployment_namespace, "-o", "json"],
             capture_output=True, text=True,
+            timeout=30,
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/tests/test_api_keys.py` around lines 918 - 922, The sp.run() call
executing the "oc get deploy maas-api" command lacks a timeout parameter, which
allows it to block indefinitely if the cluster or API experiences stalls,
causing CI jobs to hang. Add a timeout parameter to the sp.run() call to prevent
indefinite blocking and ensure the subprocess terminates after a reasonable time
period if the command does not complete.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deployment/overlays/odh/params.env`:
- Line 3: The payload-processing-image reference in params.env uses a tag
reference format (with `:`) which is mutable and vulnerable to tag-retargeting
attacks. Replace the tag-based image reference with a digest-based reference
using the `@sha256:` format instead of the `:` tag notation. You will need to
obtain the actual SHA256 digest of the image from the container registry and
update the reference to use the immutable digest format, for example:
quay.io/opendatahub/odh-ai-gateway-payload-processing@sha256:[actual-digest-here].
This ensures the deployment always pulls the exact image by its content hash
rather than relying on a mutable tag.

In `@test/e2e/tests/test_api_keys.py`:
- Around line 932-935: The container lookup logic in the next() call is falling
back to containers[0] when the maas-api container is not found, which silently
masks configuration regressions. Remove the fallback logic and instead raise an
explicit error (or fail the test) if the maas-api container is not found in the
containers list. This ensures that missing or misconfigured containers will be
caught rather than silently using a different container for validation.

---

Nitpick comments:
In `@maas-controller/pkg/controller/maas/tenant_controller.go`:
- Around line 82-93: Remove the duplicate kubebuilder RBAC annotation for
batch/cronjobs resources that grants broader permissions
(get;list;watch;create;patch;delete). Keep only the first, minimal declaration
that specifies the narrower get;delete verbs for cronjobs. The duplicate
annotation creates maintenance risk and unnecessarily duplicates the resource
rule, so delete the second cronjobs RBAC annotation that contains the expanded
permission set.

In `@test/e2e/tests/test_api_keys.py`:
- Around line 918-922: The sp.run() call executing the "oc get deploy maas-api"
command lacks a timeout parameter, which allows it to block indefinitely if the
cluster or API experiences stalls, causing CI jobs to hang. Add a timeout
parameter to the sp.run() call to prevent indefinite blocking and ensure the
subprocess terminates after a reasonable time period if the command does not
complete.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 01651a18-c662-4e20-9753-91ea2cd5aa4a

📥 Commits

Reviewing files that changed from the base of the PR and between d868fc2 and e8a1e51.

📒 Files selected for processing (33)
  • .github/hack/cleanup-odh.sh
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
  • deployment/base/maas-api/core/deployment.yaml
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/base/maas-controller/default/params.env
  • deployment/base/maas-controller/manager/manager.yaml
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • deployment/overlays/odh/params.env
  • docs/content/configuration-and-management/api-key-administration.md
  • docs/content/reference/maas-api-overview.md
  • docs/content/user-guide/api-key-management.md
  • maas-api/cmd/main.go
  • maas-api/internal/api_keys/handler.go
  • maas-api/internal/api_keys/service.go
  • maas-api/internal/config/config.go
  • maas-api/internal/config/config_test.go
  • maas-api/internal/constant/const.go
  • maas-controller/pkg/controller/maas/tenant_controller.go
  • maas-controller/pkg/platform/tenantreconcile/constants.go
  • maas-controller/pkg/platform/tenantreconcile/naming_test.go
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • maas-controller/pkg/platform/tenantreconcile/params_test.go
  • maas-controller/pkg/platform/tenantreconcile/pipeline.go
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • maas-controller/pkg/platform/tenantreconcile/prune_test.go
  • scripts/deploy.sh
  • test/e2e/scripts/local-deploy.sh
  • test/e2e/tests/test_api_keys.py
💤 Files with no reviewable changes (6)
  • .github/workflows/maas-controller-ci.yml
  • deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
  • deployment/base/maas-api/core/kustomization.yaml
  • deployment/base/maas-api/core/networkpolicy-cleanup.yaml
  • deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  • deployment/base/maas-api/core/cronjob-cleanup.yaml
✅ Files skipped from review due to trivial changes (5)
  • maas-api/internal/api_keys/service.go
  • docs/content/user-guide/api-key-management.md
  • docs/content/reference/maas-api-overview.md
  • maas-api/internal/api_keys/handler.go
  • docs/content/configuration-and-management/api-key-administration.md
🚧 Files skipped from review as they are similar to previous changes (17)
  • deployment/base/maas-api/core/deployment.yaml
  • deployment/base/maas-controller/rbac/clusterrole.yaml
  • scripts/deploy.sh
  • deployment/base/maas-controller/manager/manager.yaml
  • maas-controller/pkg/platform/tenantreconcile/params.go
  • maas-api/internal/config/config_test.go
  • maas-api/internal/constant/const.go
  • maas-controller/pkg/platform/tenantreconcile/constants.go
  • test/e2e/scripts/local-deploy.sh
  • maas-controller/pkg/platform/tenantreconcile/params_test.go
  • maas-controller/pkg/platform/tenantreconcile/pipeline.go
  • deployment/base/maas-api/overlays/tls/kustomization.yaml
  • .github/hack/cleanup-odh.sh
  • maas-controller/pkg/platform/tenantreconcile/prune.go
  • maas-api/cmd/main.go
  • maas-api/internal/config/config.go
  • deployment/base/maas-controller/default/params.env

@@ -1,7 +1,6 @@
maas-api-image=quay.io/opendatahub/maas-api:odh-stable
maas-controller-image=quay.io/opendatahub/maas-controller:odh-stable
payload-processing-image=quay.io/opendatahub/odh-ai-gateway-payload-processing:36614760abfa1b3fb2b521a89097bdaf6e0693b5

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin container image by digest to prevent tag-retarget supply-chain attacks (CWE-494).

Line 3 uses a tag reference, which is mutable even when it looks like a commit SHA. Exploit scenario: a compromised registry workflow can retarget the tag to a malicious image without changing this repo.

Remediation diff
-payload-processing-image=quay.io/opendatahub/odh-ai-gateway-payload-processing:36614760abfa1b3fb2b521a89097bdaf6e0693b5
+payload-processing-image=quay.io/opendatahub/odh-ai-gateway-payload-processing@sha256:<immutable_digest>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployment/overlays/odh/params.env` at line 3, The payload-processing-image
reference in params.env uses a tag reference format (with `:`) which is mutable
and vulnerable to tag-retargeting attacks. Replace the tag-based image reference
with a digest-based reference using the `@sha256:` format instead of the `:` tag
notation. You will need to obtain the actual SHA256 digest of the image from the
container registry and update the reference to use the immutable digest format,
for example:
quay.io/opendatahub/odh-ai-gateway-payload-processing@sha256:[actual-digest-here].
This ensures the deployment always pulls the exact image by its content hash
rather than relying on a mutable tag.

Comment on lines +932 to 935
maas_api_container = next(
(c for c in containers if c.get("name") == "maas-api"),
containers[0] if containers else {},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail when maas-api container is missing instead of falling back to containers[0].

Line 934 silently switches to the first container, which can validate the wrong env block and mask a container-name/config regression.

Proposed change
-        maas_api_container = next(
-            (c for c in containers if c.get("name") == "maas-api"),
-            containers[0] if containers else {},
-        )
+        maas_api_container = next(
+            (c for c in containers if c.get("name") == "maas-api"),
+            None,
+        )
+        assert maas_api_container is not None, \
+            "Deployment spec must contain a container named 'maas-api'"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/tests/test_api_keys.py` around lines 932 - 935, The container lookup
logic in the next() call is falling back to containers[0] when the maas-api
container is not found, which silently masks configuration regressions. Remove
the fallback logic and instead raise an explicit error (or fail the test) if the
maas-api container is not found in the containers list. This ensures that
missing or misconfigured containers will be caught rather than silently using a
different container for validation.

@rhods-ci-bot

Copy link
Copy Markdown

@zdtsw: The following test has Failed:

OCI Artifact Browser URL

View in Artifact Browser

Inspecting Test Artifacts Manually

To inspect your test artifacts manually, follow these steps:

  1. Install ORAS (see the ORAS installation guide).
  2. Download artifacts with the following commands:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:maas-group-test-dt7qg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants