feat: move external cronjob to run timely cleanup to goroutine, part II #954
zdtsw wants to merge 9 commits into
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: zdtsw
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
📝 Walkthrough
This PR migrates ephemeral API key cleanup from an external Kubernetes CronJob to an in-process background goroutine within the
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Security & Logic Issues
- CWE-413 (Insufficient Resource Validation)
- CWE-367 (Time-of-Check to Time-of-Use in Goroutine Shutdown)
- CWE-248 (Silent Failure / Ignored Error Conditions)
- Operational Risk: Upgrade Ordering
- CWE-269 (Improper Access Control - RBAC Over-restriction)
- Supply Chain: Unrelated Changes
🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
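As a rough illustration of the migration the walkthrough describes, the sketch below runs a periodic cleanup in a goroutine driven by `time.Ticker` and stops it through context cancellation. The names `runCleanupLoop`, `cleanupFunc`, and `countRuns` are hypothetical and not taken from this PR; the actual implementation may be structured differently.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// cleanupFunc is a hypothetical stand-in for the expired-key deletion call.
type cleanupFunc func(ctx context.Context) error

// runCleanupLoop invokes cleanup once per interval until ctx is cancelled.
// The returned channel is closed after the goroutine has exited, so callers
// can wait for a clean shutdown.
func runCleanupLoop(ctx context.Context, interval time.Duration, cleanup cleanupFunc) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				// Log and keep going: one failed run should not stop the loop.
				if err := cleanup(ctx); err != nil {
					fmt.Println("cleanup failed:", err)
				}
			}
		}
	}()
	return done
}

// countRuns is a hypothetical demo helper: it runs the loop for totalMs
// milliseconds with a tick every intervalMs and returns the number of runs.
func countRuns(intervalMs, totalMs int) int64 {
	var runs atomic.Int64
	ctx, cancel := context.WithCancel(context.Background())
	interval := time.Duration(intervalMs) * time.Millisecond
	done := runCleanupLoop(ctx, interval, func(context.Context) error {
		runs.Add(1)
		return nil
	})
	time.Sleep(time.Duration(totalMs) * time.Millisecond)
	cancel()
	<-done // wait for the goroutine to exit before reading the counter
	return runs.Load()
}

func main() {
	fmt.Println("cleanup runs:", countRuns(10, 100))
}
```

Note that `time.NewTicker` panics on a non-positive interval, so a "disabled" setting would have to skip starting the loop rather than pass the raw value through.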
Actionable comments posted: 6
🧹 Nitpick comments (1)
maas-controller/pkg/platform/tenantreconcile/prune_test.go (1)
34-46: ⚡ Quick win
Add `managed=false` coverage for the NetworkPolicy skip path.
Lines 34-46 only assert skip behavior for CronJob. Add a matching `managed=false` case for `GVKNetworkPolicy` so regressions in that branch are caught.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maas-controller/pkg/platform/tenantreconcile/prune_test.go` around lines 34 - 46, Add a second subtest mirroring the CronJob case that ensures a NetworkPolicy with annotation opendatahub.io/managed="false" is skipped: create a resource via newLegacyResource using GVKNetworkPolicy and an appropriate name/namespace, seed it into the fake client (like the cronJob test), call PruneLegacyCleanupResources with the same context/logr.Discard() and namespace, then assert that the object still exists by fetching it from the client (use client.ObjectKey with the NetworkPolicy name and appNs) and requiring no error; reference GVKNetworkPolicy and newLegacyResource to locate where to add this test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/hack/cleanup-odh.sh:
- Line 103: The script currently only deletes the NetworkPolicy named
maas-authorino-allow; ensure it also removes maas-api-cleanup-restrict to avoid
orphaned policies. Update the kubectl delete invocation that uses "$ns" (the
namespace variable) to include both resource names (maas-authorino-allow and
maas-api-cleanup-restrict) or add a second kubectl delete line for
maas-api-cleanup-restrict with --ignore-not-found so both are cleaned up safely
during legacy/upgrade cleanup.
In `@docs/content/reference/maas-api-overview.md`:
- Line 52: Update the "Called By" column for the `/internal/v1/api-keys/cleanup`
row so it no longer attributes the call to the background goroutine; instead
mark it as an internal/manual trigger only (e.g., "Called By: Manual/internal
trigger only — not invoked by background goroutine"). Ensure the endpoint path
`/internal/v1/api-keys/cleanup` is the row being edited and keep the description
text intact.
In `@maas-api/cmd/main.go`:
- Around line 249-252: The legacy unauthenticated cleanup endpoint is still
registered via internalRoutes.POST("/api-keys/cleanup",
apiKeyHandler.CleanupExpiredEphemeralKeys) and must be removed or protected;
either delete the registration to eliminate the public state-changing handler or
wrap it with proper auth/authorization (e.g., require the same
Authorino/internal-only middleware used for other internal routes or enforce a
localhost/internal-network check) so only trusted in-cluster callers can invoke
apiKeyHandler.CleanupExpiredEphemeralKeys; update router.Group("/internal/v1")
wiring accordingly and ensure tests/clients that relied on this endpoint are
adjusted.
In `@maas-api/internal/config/config.go`:
- Around line 174-176: The config currently treats any negative
CleanupIntervalMinutes as “disabled”; change the validation around
c.CleanupIntervalMinutes so values < -1 are rejected instead of silently
accepted: add a guard that if c.CleanupIntervalMinutes < -1 the config
load/validation returns an error (with a clear message mentioning
CleanupIntervalMinutes), keep the existing special-case for -1 (disabled) and
the defaulting for 0 to constant.DefaultCleanupIntervalMinutes so only values
below -1 cause failure; update the function/method that initializes/validates
the config where c.CleanupIntervalMinutes is handled.
In `@maas-controller/pkg/platform/tenantreconcile/prune.go`:
- Around line 45-52: The code currently deletes legacy resources if they lack
opendatahub.io/managed=false; change the predicate to require positive
ownership/management before calling c.Delete — i.e., replace the simple negation
check using isManagedForPrune/AnnotationManaged with a stricter test that
returns true only if the object either has a MaaS owner reference (inspect
obj.GetOwnerReferences() for a controller owner matching the MaaS controller
UID/Kind) or carries an explicit positive management marker (e.g.,
opendatahub.io/managed=true or a dedicated controller label). Update the prune
sites that call isManagedForPrune (the block invoking c.Delete and the similar
block at 57-63) to use this new predicate (implement a helper like
hasMaaSOwnerOrManagedTrue and use it in place of the current check) so only
resources owned/marked by MaaS are removed.
In `@test/e2e/tests/test_api_keys.py`:
- Around line 931-934: The test currently asserts CLEANUP_INTERVAL_MINUTES
equals the hard-coded string "15"; change it to only assert the environment
variable exists and is parseable as an integer (or compare it against a
test-provided expected value). Specifically, update the assertions around
env_vars["CLEANUP_INTERVAL_MINUTES"] so you first assert
"CLEANUP_INTERVAL_MINUTES" in env_vars, then attempt to cast
env_vars["CLEANUP_INTERVAL_MINUTES"] to int (or retrieve an expected value from
the test config/fixture) and assert that the cast succeeds (or equals the
expected value) instead of requiring the literal "15"; refer to the
CLEANUP_INTERVAL_MINUTES lookup in this test to locate where to change the
checks.
---
Nitpick comments:
In `@maas-controller/pkg/platform/tenantreconcile/prune_test.go`:
- Around line 34-46: Add a second subtest mirroring the CronJob case that
ensures a NetworkPolicy with annotation opendatahub.io/managed="false" is
skipped: create a resource via newLegacyResource using GVKNetworkPolicy and an
appropriate name/namespace, seed it into the fake client (like the cronJob
test), call PruneLegacyCleanupResources with the same context/logr.Discard() and
namespace, then assert that the object still exists by fetching it from the
client (use client.ObjectKey with the NetworkPolicy name and appNs) and
requiring no error; reference GVKNetworkPolicy and newLegacyResource to locate
where to add this test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 72114349-c591-49f2-8246-90d63a5d83dc
📒 Files selected for processing (32)
- .github/hack/cleanup-odh.sh
- .github/workflows/maas-controller-ci.yml
- deployment/base/maas-api/core/cronjob-cleanup.yaml
- deployment/base/maas-api/core/deployment.yaml
- deployment/base/maas-api/core/kustomization.yaml
- deployment/base/maas-api/core/networkpolicy-cleanup.yaml
- deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
- deployment/base/maas-api/overlays/tls/kustomization.yaml
- deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
- deployment/base/maas-controller/default/params.env
- deployment/base/maas-controller/manager/manager.yaml
- deployment/base/maas-controller/rbac/clusterrole.yaml
- deployment/overlays/odh/params.env
- docs/content/configuration-and-management/api-key-administration.md
- docs/content/reference/maas-api-overview.md
- docs/content/user-guide/api-key-management.md
- maas-api/cmd/main.go
- maas-api/internal/api_keys/handler.go
- maas-api/internal/api_keys/service.go
- maas-api/internal/config/config.go
- maas-api/internal/config/config_test.go
- maas-api/internal/constant/const.go
- maas-controller/pkg/controller/maas/tenant_controller.go
- maas-controller/pkg/platform/tenantreconcile/constants.go
- maas-controller/pkg/platform/tenantreconcile/params.go
- maas-controller/pkg/platform/tenantreconcile/params_test.go
- maas-controller/pkg/platform/tenantreconcile/pipeline.go
- maas-controller/pkg/platform/tenantreconcile/prune.go
- maas-controller/pkg/platform/tenantreconcile/prune_test.go
- scripts/deploy.sh
- test/e2e/scripts/local-deploy.sh
- test/e2e/tests/test_api_keys.py
💤 Files with no reviewable changes (14)
- deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
- deployment/overlays/odh/params.env
- deployment/base/maas-api/overlays/tls/kustomization.yaml
- deployment/base/maas-api/core/networkpolicy-cleanup.yaml
- .github/workflows/maas-controller-ci.yml
- deployment/base/maas-controller/rbac/clusterrole.yaml
- deployment/base/maas-api/core/cronjob-cleanup.yaml
- maas-controller/pkg/platform/tenantreconcile/params.go
- deployment/base/maas-api/core/kustomization.yaml
- deployment/base/maas-controller/manager/manager.yaml
- test/e2e/scripts/local-deploy.sh
- deployment/base/maas-controller/default/params.env
- scripts/deploy.sh
- deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
  kubectl delete destinationrule maas-api-backend-tls -n "$ns" --ignore-not-found 2>/dev/null || true
- kubectl delete networkpolicy maas-api-cleanup-restrict maas-authorino-allow -n "$ns" --ignore-not-found 2>/dev/null || true
- kubectl delete cronjob maas-api-key-cleanup -n "$ns" --ignore-not-found 2>/dev/null || true
+ kubectl delete networkpolicy maas-authorino-allow -n "$ns" --ignore-not-found 2>/dev/null || true
Keep deletion for legacy cleanup NetworkPolicy to avoid stale leftovers.
Line 103 now deletes only maas-authorino-allow. In upgrade/partial-cleanup clusters, maas-api-cleanup-restrict can still exist and remain orphaned. Delete both names in this step.
Proposed patch
- kubectl delete networkpolicy maas-authorino-allow -n "$ns" --ignore-not-found 2>/dev/null || true
+ kubectl delete networkpolicy maas-authorino-allow maas-api-cleanup-restrict -n "$ns" --ignore-not-found 2>/dev/null || true
  // Internal routes (no auth required - called by Authorino / in-process cleanup)
  internalRoutes := router.Group("/internal/v1")
  internalRoutes.POST("/api-keys/validate", apiKeyHandler.ValidateAPIKeyHandler)
- internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys)
+ internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys) // TODO: consider remove endpoint if not public access
Remove or protect the legacy cleanup endpoint.
This PR moves cleanup in-process, but Line 252 still registers an unauthenticated state-changing handler on the shared maas-api service. With the dedicated cleanup NetworkPolicy being removed elsewhere in this change, any in-cluster workload that can reach the Service can repeatedly trigger delete queries against expired keys (CWE-306, CWE-862).
Suggested fix
// Internal routes (no auth required - called by Authorino / in-process cleanup)
internalRoutes := router.Group("/internal/v1")
internalRoutes.POST("/api-keys/validate", apiKeyHandler.ValidateAPIKeyHandler)
- internalRoutes.POST("/api-keys/cleanup", apiKeyHandler.CleanupExpiredEphemeralKeys) // TODO: consider remove endpoint if not public access
internalRoutes.POST("/subscriptions/select", subscriptionHandler.SelectSubscription)
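One of the options named in this comment is a localhost-only check. The sketch below shows that idea with the standard library's `net/http`; the PR itself appears to use a router with `Group`/`POST` helpers, so this is an assumption-laden illustration rather than a drop-in patch. `loopbackOnly` and `statusFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// loopbackOnly rejects any request whose peer address is not a loopback IP.
// It inspects r.RemoteAddr only and does not trust forwarding headers.
func loopbackOnly(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		ip := net.ParseIP(host)
		if ip == nil || !ip.IsLoopback() {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor is a hypothetical demo helper: it sends one POST through the
// middleware from the given peer address and returns the response status.
func statusFor(remoteAddr string) int {
	cleanup := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/internal/v1/api-keys/cleanup", nil)
	req.RemoteAddr = remoteAddr
	rec := httptest.NewRecorder()
	loopbackOnly(cleanup).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("loopback peer:", statusFor("127.0.0.1:40000"))
	fmt.Println("pod-network peer:", statusFor("10.0.0.7:40000"))
}
```

A peer-address check like this is weaker than real authentication: behind a sidecar proxy the peer may always look like loopback, so removing the route outright remains the simpler fix if nothing calls it.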
if c.CleanupIntervalMinutes == 0 {
	c.CleanupIntervalMinutes = constant.DefaultCleanupIntervalMinutes
}
Reject cleanup intervals below -1.
These lines silently treat every negative value as “disabled”, even though the public contract says only -1 disables cleanup. A typo like -15 will turn pruning off instead of failing fast.
Suggested fix
if c.CleanupIntervalMinutes == 0 {
c.CleanupIntervalMinutes = constant.DefaultCleanupIntervalMinutes
}
+ if c.CleanupIntervalMinutes < -1 {
+ return errors.New("CLEANUP_INTERVAL_MINUTES must be -1, 0, or a positive integer")
+ }
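The three-way contract described in this comment (0 means default, -1 means disabled, anything below -1 is an error) can be exercised in isolation. The following is a minimal sketch, not the project's code: `normalizeCleanupInterval` is a hypothetical function, and the default of 15 is an assumption based on the value the E2E test in this PR expects.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultCleanupIntervalMinutes is an assumed default; the E2E test in this
// PR expects CLEANUP_INTERVAL_MINUTES to be "15".
const defaultCleanupIntervalMinutes = 15

// normalizeCleanupInterval maps 0 to the default, keeps -1 (disabled) and
// positive values unchanged, and rejects anything below -1.
func normalizeCleanupInterval(minutes int) (int, error) {
	switch {
	case minutes < -1:
		return 0, errors.New("CLEANUP_INTERVAL_MINUTES must be -1, 0, or a positive integer")
	case minutes == 0:
		return defaultCleanupIntervalMinutes, nil
	default:
		return minutes, nil
	}
}

func main() {
	for _, in := range []int{-15, -1, 0, 30} {
		out, err := normalizeCleanupInterval(in)
		fmt.Printf("in=%d out=%d err=%v\n", in, out, err)
	}
}
```

With this shape, a typo such as `-15` fails at startup instead of silently turning pruning off.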
if !isManagedForPrune(obj) {
	log.V(1).Info("Skipping legacy resource prune: annotation "+AnnotationManaged+":false",
		"kind", kind, "name", name, "namespace", namespace)
	return nil
}
log.Info("Deleting legacy platform resource", "kind", kind, "name", name, "namespace", namespace)
if err := c.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
	return fmt.Errorf("delete legacy %s/%s in namespace %s: %w", kind, name, namespace, err)
Require a positive ownership/management check before deleting these resources.
Right now any CronJob/NetworkPolicy with the legacy name in appNs is deleted unless it explicitly sets opendatahub.io/managed=false. That makes name collision enough to trigger deletion during reconcile. In practice, a manually managed object or a third-party resource with the same name gets removed even though the controller never created it.
Only prune objects that carry a positive controller-management marker or owner reference that identifies them as MaaS-managed legacy operands; fixed names plus an opt-out annotation are not a safe delete predicate.
Also applies to: 57-63
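To make the proposed delete predicate concrete, here is a minimal sketch using simplified stand-in types instead of the real Kubernetes API objects. `resource` and `ownerRef` are hypothetical, the owner kind and UID are passed in because the source does not say what the MaaS owner looks like, and letting an explicit `false` override an owner reference is an assumption.

```go
package main

import "fmt"

// annotationManaged is the annotation key named in the review comment.
const annotationManaged = "opendatahub.io/managed"

// ownerRef and resource are simplified, hypothetical stand-ins for a
// Kubernetes owner reference and object; only the fields the predicate
// needs are modeled.
type ownerRef struct {
	Kind       string
	UID        string
	Controller bool
}

type resource struct {
	Annotations map[string]string
	Owners      []ownerRef
}

// hasMaaSOwnerOrManagedTrue reports whether r is positively identified as
// managed: either annotated managed=true or controlled by the given owner.
// An explicit managed=false always wins (assumption).
func hasMaaSOwnerOrManagedTrue(r resource, ownerKind, ownerUID string) bool {
	switch r.Annotations[annotationManaged] {
	case "false":
		return false
	case "true":
		return true
	}
	for _, o := range r.Owners {
		if o.Controller && o.Kind == ownerKind && o.UID == ownerUID {
			return true
		}
	}
	return false
}

func main() {
	const kind, uid = "Tenant", "uid-123" // hypothetical owner identity
	owned := resource{Owners: []ownerRef{{Kind: kind, UID: uid, Controller: true}}}
	fmt.Println("unmarked:", hasMaaSOwnerOrManagedTrue(resource{}, kind, uid))
	fmt.Println("owned:", hasMaaSOwnerOrManagedTrue(owned, kind, uid))
}
```

The key difference from an opt-out check is the final `return false`: an object that merely shares the legacy name, with no marker and no owner, is left alone.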
assert "CLEANUP_INTERVAL_MINUTES" in env_vars, \
    "maas-api should have CLEANUP_INTERVAL_MINUTES env var"
assert env_vars["CLEANUP_INTERVAL_MINUTES"] == "15", \
    f"Expected CLEANUP_INTERVAL_MINUTES=15, got {env_vars['CLEANUP_INTERVAL_MINUTES']!r}"
Don’t hard-code the default cleanup interval in this E2E.
The PR makes CLEANUP_INTERVAL_MINUTES configurable, but this test fails any valid deployment that sets a non-default value. Assert that the env var is present and parseable, or compare it to an E2E-provided expected value, instead of requiring "15" unconditionally.
0e548b2 to 4c87657
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/tests/test_api_keys.py`:
- Around line 926-930: Replace the current lookup that falls back to
containers[0] with a strict existence check: find the maas-api container using
the generator expression used for maas_api_container and if not found raise an
explicit error (or assert) instead of defaulting; then build env_vars from that
maas_api_container's "env" field as before (the code referencing
maas_api_container and env_vars should remain, only remove the fallback and add
a fail-fast raise when the maas-api container is missing).
- Around line 1318-1327: The patch currently always invokes subprocess.run with
patch_ops even when patch_ops is empty; update the logic in the block that
builds patch_ops (variables: patch_ops, webhook_config, webhook_name and the
loop over webhook_config.get("webhooks", [])) to skip calling subprocess.run
unless patch_ops contains at least one operation (e.g., check if patch_ops
before calling subprocess.run) so you only apply the JSON patch when there are
matching webhooks to modify.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 79695628-6705-4fe3-8073-406ad1050b8b
💤 Files with no reviewable changes (11)
- .github/workflows/maas-controller-ci.yml
- deployment/base/maas-api/core/kustomization.yaml
- deployment/base/maas-api/core/networkpolicy-cleanup.yaml
- deployment/base/maas-api/core/cronjob-cleanup.yaml
- deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
- deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
- deployment/base/maas-controller/default/params.env
- test/e2e/scripts/local-deploy.sh
- deployment/base/maas-api/overlays/tls/kustomization.yaml
- scripts/deploy.sh
- deployment/overlays/odh/params.env
✅ Files skipped from review due to trivial changes (6)
- maas-api/internal/api_keys/service.go
- maas-api/internal/constant/const.go
- docs/content/user-guide/api-key-management.md
- docs/content/reference/maas-api-overview.md
- maas-api/internal/api_keys/handler.go
- docs/content/configuration-and-management/api-key-administration.md
🚧 Files skipped from review as they are similar to previous changes (13)
- maas-controller/pkg/platform/tenantreconcile/pipeline.go
- .github/hack/cleanup-odh.sh
- maas-api/internal/config/config_test.go
- deployment/base/maas-controller/rbac/clusterrole.yaml
- maas-api/internal/config/config.go
- maas-controller/pkg/platform/tenantreconcile/prune_test.go
- maas-controller/pkg/controller/maas/tenant_controller.go
- deployment/base/maas-controller/manager/manager.yaml
- maas-controller/pkg/platform/tenantreconcile/prune.go
- maas-controller/pkg/platform/tenantreconcile/params_test.go
- maas-api/cmd/main.go
- maas-controller/pkg/platform/tenantreconcile/constants.go
- maas-controller/pkg/platform/tenantreconcile/params.go
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
🛑 Comments failed to post (2)
test/e2e/tests/test_api_keys.py (2)
926-930: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Require explicit `maas-api` container match instead of falling back to `containers[0]`.
This fallback can validate env vars on the wrong container and produce false pass/fail outcomes if container ordering changes. Fail fast when `maas-api` is absent.
Proposed fix
- maas_api_container = next(
-     (c for c in containers if c.get("name") == "maas-api"),
-     containers[0] if containers else {},
- )
+ maas_api_container = next(
+     (c for c in containers if c.get("name") == "maas-api"),
+     None,
+ )
+ assert maas_api_container is not None, "maas-api container not found in Deployment spec"
1318-1327:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Guard against empty webhook patch operations before applying `Ignore`.
If no matching webhook names are found, the patch step can become a no-op and the test proceeds with `failurePolicy` unchanged, causing brittle failures after controller scale-down.
Proposed fix
  patch_ops = []
  for idx, webhook in enumerate(webhook_config.get("webhooks", [])):
      if webhook.get("name") in ["vmaassubscription.kb.io", "vmaasauthpolicy.kb.io"]:
          patch_ops.append({"op": "replace", "path": f"/webhooks/{idx}/failurePolicy", "value": "Ignore"})
+ if not patch_ops:
+     pytest.skip(
+         f"No matching webhooks found in {webhook_name}; cannot set failurePolicy=Ignore"
+     )
+
  subprocess.run(
      ["oc", "patch", "validatingwebhookconfiguration", webhook_name,
       "--type=json", "-p", json.dumps(patch_ops)],
      capture_output=True, text=True, check=True
  )
🧰 Tools
🪛 Ruff (0.15.15)
[error] 1323-1323: `subprocess` call: check for execution of untrusted input (S603)
[error] 1324-1325: Starting a process with a partial executable path (S607)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/tests/test_api_keys.py` around lines 1318 - 1327, The patch currently always invokes subprocess.run with patch_ops even when patch_ops is empty; update the logic in the block that builds patch_ops (variables: patch_ops, webhook_config, webhook_name and the loop over webhook_config.get("webhooks", [])) to skip calling subprocess.run unless patch_ops contains at least one operation (e.g., check if patch_ops before calling subprocess.run) so you only apply the JSON patch when there are matching webhooks to modify.
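The guard can be sketched by separating "build the JSON Patch operations" from "decide whether to run `oc patch`". The function names below are hypothetical and the webhook names are taken from the comment above; whether the caller skips the test or fails it when nothing matches is left to the test author.

```python
import json

# Webhook names quoted in the review comment above.
TARGET_WEBHOOKS = ("vmaassubscription.kb.io", "vmaasauthpolicy.kb.io")


def build_failure_policy_patch(webhook_config: dict, policy: str = "Ignore") -> list:
    """Build JSON Patch ops that set failurePolicy on the targeted webhooks."""
    ops = []
    for idx, webhook in enumerate(webhook_config.get("webhooks", [])):
        if webhook.get("name") in TARGET_WEBHOOKS:
            ops.append(
                {"op": "replace", "path": f"/webhooks/{idx}/failurePolicy", "value": policy}
            )
    return ops


def patch_command(webhook_name: str, ops: list):
    """Return the oc argv for the patch, or None when there is nothing to patch."""
    if not ops:
        return None  # caller decides: skip the test or fail loudly
    return [
        "oc", "patch", "validatingwebhookconfiguration", webhook_name,
        "--type=json", "-p", json.dumps(ops),
    ]
```

Returning `None` for an empty operation list makes the no-match case explicit, so the test cannot silently continue with `failurePolicy` unchanged.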
4c87657 to d868fc2
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@maas-controller/pkg/platform/tenantreconcile/params.go`:
- Around line 140-148: patchPreProcessingDeployment is missing observability and
the common signature pattern: add a logr.Logger parameter to
patchPreProcessingDeployment(r *unstructured.Unstructured, params
PlatformParams, log logr.Logger), mirror other patch functions, and emit a V(4)
log when you call setContainerImage for PayloadPreProcessingName (include the
image value and context); update all call sites that invoke
patchPreProcessingDeployment to pass the logger and keep error wrapping as-is.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b8a2beb6-f5f0-4e76-ba66-fe5b875344c5
📒 Files selected for processing (7)
- docs/content/configuration-and-management/api-key-administration.md
- docs/content/reference/maas-api-overview.md
- maas-api/internal/api_keys/handler.go
- maas-api/internal/api_keys/service.go
- maas-controller/pkg/platform/tenantreconcile/params.go
- maas-controller/pkg/platform/tenantreconcile/prune.go
- test/e2e/tests/test_api_keys.py
✅ Files skipped from review due to trivial changes (4)
- docs/content/reference/maas-api-overview.md
- maas-api/internal/api_keys/service.go
- docs/content/configuration-and-management/api-key-administration.md
- maas-api/internal/api_keys/handler.go
🚧 Files skipped from review as they are similar to previous changes (2)
- maas-controller/pkg/platform/tenantreconcile/prune.go
- test/e2e/tests/test_api_keys.py
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
🛑 Comments failed to post (1)
maas-controller/pkg/platform/tenantreconcile/params.go (1)
140-148:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Missing observability and inconsistent pattern.
All other deployment-patching functions (`patchMaaSAPIDeployment`, `patchPayloadProcessingDeployment`) include a `log logr.Logger` parameter and emit V(4)-level logs for image patching. This function omits both, creating an observability gap for debugging image propagation.
📝 Proposed fix to add logging
-func patchPreProcessingDeployment(r *unstructured.Unstructured, params PlatformParams) error {
+func patchPreProcessingDeployment(log logr.Logger, r *unstructured.Unstructured, params PlatformParams) error {
     r.SetNamespace(params.GatewayNamespace)
     if params.PayloadProcessingImage != "" {
+        log.V(4).Info("Patching payload-pre-processing image", "image", params.PayloadProcessingImage)
         if err := setContainerImage(r, PayloadPreProcessingName, params.PayloadProcessingImage); err != nil {
             return fmt.Errorf("patch payload-pre-processing image: %w", err)
         }
     }
     return nil
 }
Update the call site at line 96:
 case gvk == GVKDeployment && name == PayloadPreProcessingName:
-    if err := patchPreProcessingDeployment(r, params); err != nil {
+    if err := patchPreProcessingDeployment(log, r, params); err != nil {
         return err
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maas-controller/pkg/platform/tenantreconcile/params.go` around lines 140 - 148, patchPreProcessingDeployment is missing observability and the common signature pattern: add a logr.Logger parameter to patchPreProcessingDeployment(r *unstructured.Unstructured, params PlatformParams, log logr.Logger), mirror other patch functions, and emit a V(4) log when you call setContainerImage for PayloadPreProcessingName (include the image value and context); update all call sites that invoke patchPreProcessingDeployment to pass the logger and keep error wrapping as-is.
- save resources: no job spun up every 15 minutes
- fewer images to pull down for disconnected environments
- should clean up the endpoint later if needed
- less security concern as one less networkpolicy and patch wors
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

- for the race condition case, use sync.WaitGroup to let the goroutine finish
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

This reverts commit 08e67f1.

- pass either the flag --cleanup-interval to main or the env variable CLEANUP_INTERVAL_MINUTES in the deployment for a customized value
- if not set, or set to '0', use the default of 15 minutes
- if set to a negative value, skip the cleanup goroutine entirely
Signed-off-by: Wen Zhou <wenzhou@redhat.com>

…onJob
Remove stale CronJob patching, deploy params, and RBAC from opendatahub-io#934 fallout. Add PruneLegacyCleanupResources on tenant reconcile, run maas-controller CI on every PR, and update docs/E2E for CLEANUP_INTERVAL_MINUTES.

- fix rebase as well
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
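The interval rules stated in the commit message above (unset or `0` means the 15-minute default, a negative value disables the goroutine) can be written down as a small decision function. This is only an illustration of the rule in Python, not the Go code in `maas-api`; `resolve_cleanup_interval` is a hypothetical name, and raising on non-numeric input is an assumption the commit message does not cover.

```python
from typing import Optional

DEFAULT_INTERVAL_MINUTES = 15  # default stated in the commit message


def resolve_cleanup_interval(raw: Optional[str]) -> Optional[int]:
    """Return the cleanup interval in minutes, or None when cleanup is disabled.

    raw is the value of CLEANUP_INTERVAL_MINUTES (or the flag), if any.
    """
    if raw is None or raw.strip() == "":
        return DEFAULT_INTERVAL_MINUTES  # not set: use the default
    minutes = int(raw)  # assumption: non-numeric input raises ValueError
    if minutes == 0:
        return DEFAULT_INTERVAL_MINUTES  # '0' also means "use the default"
    if minutes < 0:
        return None  # negative: skip the cleanup goroutine
    return minutes
```

Treating `0` as "default" rather than "disabled" is the behaviour the commit describes; a reader expecting `0` to turn cleanup off would need a negative value instead.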
d868fc2 to e8a1e51
Actionable comments posted: 2
🧹 Nitpick comments (2)
maas-controller/pkg/controller/maas/tenant_controller.go (1)
82-93: Remove duplicate CronJob RBAC annotation for code clarity.
Line 92 duplicates the CronJob annotation from Line 82 with broader permissions (`get;list;watch;create;patch;delete` vs. `get;delete`). The rendered ClusterRole correctly reflects only the narrower `get;delete` permissions, but the duplicate annotation introduces code maintenance risk and potential for future mistakes. Remove Line 92 to maintain a single, minimal RBAC declaration.
Remediation
 // +kubebuilder:rbac:groups=telemetry.istio.io,resources=telemetries,verbs=get;list;watch;create;patch;delete
-// +kubebuilder:rbac:groups=batch,resources=cronjobs,verbs=get;list;watch;create;patch;delete
 // +kubebuilder:rbac:groups=monitoring.coreos.com,resources=podmonitors;servicemonitors,verbs=get;list;watch;create;patch;delete
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maas-controller/pkg/controller/maas/tenant_controller.go` around lines 82 - 93, Remove the duplicate kubebuilder RBAC annotation for batch/cronjobs resources that grants broader permissions (get;list;watch;create;patch;delete). Keep only the first, minimal declaration that specifies the narrower get;delete verbs for cronjobs. The duplicate annotation creates maintenance risk and unnecessarily duplicates the resource rule, so delete the second cronjobs RBAC annotation that contains the expanded permission set.
Source: Coding guidelines
test/e2e/tests/test_api_keys.py (1)
918-922: ⚡ Quick win
Add a timeout to the `oc get deploy` subprocess call.
Line 918 can block indefinitely on cluster/API stalls, which makes CI jobs hang unnecessarily.
Proposed change
  result = sp.run(
      ["oc", "get", "deploy", "maas-api", "-n", deployment_namespace, "-o", "json"],
      capture_output=True,
      text=True,
+     timeout=30,
  )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/tests/test_api_keys.py` around lines 918 - 922, The sp.run() call executing the "oc get deploy maas-api" command lacks a timeout parameter, which allows it to block indefinitely if the cluster or API experiences stalls, causing CI jobs to hang. Add a timeout parameter to the sp.run() call to prevent indefinite blocking and ensure the subprocess terminates after a reasonable time period if the command does not complete.
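A hedged sketch of the bounded call: `subprocess.run` accepts a `timeout` in seconds and raises `subprocess.TimeoutExpired` after killing the child. `run_bounded` is a hypothetical wrapper, and the 30-second default mirrors the proposed change rather than any measured value.

```python
import subprocess
import sys


def run_bounded(cmd: list, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run cmd with captured text output, failing clearly if it exceeds timeout_s."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # Re-raise with the command line so a CI log shows what stalled.
        raise RuntimeError(
            f"command timed out after {timeout_s}s: {' '.join(cmd)}"
        ) from exc


if __name__ == "__main__":
    # Stand-in for the `oc get deploy ...` call so the sketch runs without a cluster.
    result = run_bounded([sys.executable, "-c", "print('ok')"])
    print(result.stdout.strip())
```

In the e2e test the same wrapper would take the `oc get deploy maas-api -n <namespace> -o json` argv, turning a hung API call into a test failure instead of a stalled CI job.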
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deployment/overlays/odh/params.env`:
- Line 3: The payload-processing-image reference in params.env uses a tag
reference format (with `:`) which is mutable and vulnerable to tag-retargeting
attacks. Replace the tag-based image reference with a digest-based reference
using the `@sha256:` format instead of the `:` tag notation. You will need to
obtain the actual SHA256 digest of the image from the container registry and
update the reference to use the immutable digest format, for example:
quay.io/opendatahub/odh-ai-gateway-payload-processing@sha256:[actual-digest-here].
This ensures the deployment always pulls the exact image by its content hash
rather than relying on a mutable tag.
In `@test/e2e/tests/test_api_keys.py`:
- Around line 932-935: The container lookup logic in the next() call is falling
back to containers[0] when the maas-api container is not found, which silently
masks configuration regressions. Remove the fallback logic and instead raise an
explicit error (or fail the test) if the maas-api container is not found in the
containers list. This ensures that missing or misconfigured containers will be
caught rather than silently using a different container for validation.
---
Nitpick comments:
In `@maas-controller/pkg/controller/maas/tenant_controller.go`:
- Around line 82-93: Remove the duplicate kubebuilder RBAC annotation for
batch/cronjobs resources that grants broader permissions
(get;list;watch;create;patch;delete). Keep only the first, minimal declaration
that specifies the narrower get;delete verbs for cronjobs. The duplicate
annotation creates maintenance risk and unnecessarily duplicates the resource
rule, so delete the second cronjobs RBAC annotation that contains the expanded
permission set.
In `@test/e2e/tests/test_api_keys.py`:
- Around line 918-922: The sp.run() call executing the "oc get deploy maas-api"
command lacks a timeout parameter, which allows it to block indefinitely if the
cluster or API experiences stalls, causing CI jobs to hang. Add a timeout
parameter to the sp.run() call to prevent indefinite blocking and ensure the
subprocess terminates after a reasonable time period if the command does not
complete.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 01651a18-c662-4e20-9753-91ea2cd5aa4a
📒 Files selected for processing (33)
- .github/hack/cleanup-odh.sh
- .github/workflows/maas-controller-ci.yml
- deployment/base/maas-api/core/cronjob-cleanup.yaml
- deployment/base/maas-api/core/deployment.yaml
- deployment/base/maas-api/core/kustomization.yaml
- deployment/base/maas-api/core/networkpolicy-cleanup.yaml
- deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
- deployment/base/maas-api/overlays/tls/kustomization.yaml
- deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
- deployment/base/maas-controller/default/params.env
- deployment/base/maas-controller/manager/manager.yaml
- deployment/base/maas-controller/rbac/clusterrole.yaml
- deployment/overlays/odh/params.env
- docs/content/configuration-and-management/api-key-administration.md
- docs/content/reference/maas-api-overview.md
- docs/content/user-guide/api-key-management.md
- maas-api/cmd/main.go
- maas-api/internal/api_keys/handler.go
- maas-api/internal/api_keys/service.go
- maas-api/internal/config/config.go
- maas-api/internal/config/config_test.go
- maas-api/internal/constant/const.go
- maas-controller/pkg/controller/maas/tenant_controller.go
- maas-controller/pkg/platform/tenantreconcile/constants.go
- maas-controller/pkg/platform/tenantreconcile/naming_test.go
- maas-controller/pkg/platform/tenantreconcile/params.go
- maas-controller/pkg/platform/tenantreconcile/params_test.go
- maas-controller/pkg/platform/tenantreconcile/pipeline.go
- maas-controller/pkg/platform/tenantreconcile/prune.go
- maas-controller/pkg/platform/tenantreconcile/prune_test.go
- scripts/deploy.sh
- test/e2e/scripts/local-deploy.sh
- test/e2e/tests/test_api_keys.py
💤 Files with no reviewable changes (6)
- .github/workflows/maas-controller-ci.yml
- deployment/base/maas-api/overlays/tls/networkpolicy-cleanup-patch.yaml
- deployment/base/maas-api/core/kustomization.yaml
- deployment/base/maas-api/core/networkpolicy-cleanup.yaml
- deployment/base/maas-api/overlays/tls/cronjob-cleanup-patch.yaml
- deployment/base/maas-api/core/cronjob-cleanup.yaml
✅ Files skipped from review due to trivial changes (5)
- maas-api/internal/api_keys/service.go
- docs/content/user-guide/api-key-management.md
- docs/content/reference/maas-api-overview.md
- maas-api/internal/api_keys/handler.go
- docs/content/configuration-and-management/api-key-administration.md
🚧 Files skipped from review as they are similar to previous changes (17)
- deployment/base/maas-api/core/deployment.yaml
- deployment/base/maas-controller/rbac/clusterrole.yaml
- scripts/deploy.sh
- deployment/base/maas-controller/manager/manager.yaml
- maas-controller/pkg/platform/tenantreconcile/params.go
- maas-api/internal/config/config_test.go
- maas-api/internal/constant/const.go
- maas-controller/pkg/platform/tenantreconcile/constants.go
- test/e2e/scripts/local-deploy.sh
- maas-controller/pkg/platform/tenantreconcile/params_test.go
- maas-controller/pkg/platform/tenantreconcile/pipeline.go
- deployment/base/maas-api/overlays/tls/kustomization.yaml
- .github/hack/cleanup-odh.sh
- maas-controller/pkg/platform/tenantreconcile/prune.go
- maas-api/cmd/main.go
- maas-api/internal/config/config.go
- deployment/base/maas-controller/default/params.env
@@ -1,7 +1,6 @@
maas-api-image=quay.io/opendatahub/maas-api:odh-stable
maas-controller-image=quay.io/opendatahub/maas-controller:odh-stable
payload-processing-image=quay.io/opendatahub/odh-ai-gateway-payload-processing:36614760abfa1b3fb2b521a89097bdaf6e0693b5
Pin container image by digest to prevent tag-retarget supply-chain attacks (CWE-494).
Line 3 uses a tag reference, which is mutable even when it looks like a commit SHA. Exploit scenario: a compromised registry workflow can retarget the tag to a malicious image without changing this repo.
Remediation diff
-payload-processing-image=quay.io/opendatahub/odh-ai-gateway-payload-processing:36614760abfa1b3fb2b521a89097bdaf6e0693b5
+payload-processing-image=quay.io/opendatahub/odh-ai-gateway-payload-processing@sha256:<immutable_digest>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deployment/overlays/odh/params.env` at line 3, The payload-processing-image
reference in params.env uses a tag reference format (with `:`) which is mutable
and vulnerable to tag-retargeting attacks. Replace the tag-based image reference
with a digest-based reference using the `@sha256:` format instead of the `:` tag
notation. You will need to obtain the actual SHA256 digest of the image from the
container registry and update the reference to use the immutable digest format,
for example:
quay.io/opendatahub/odh-ai-gateway-payload-processing@sha256:[actual-digest-here].
This ensures the deployment always pulls the exact image by its content hash
rather than relying on a mutable tag.
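One way to keep this from regressing is a small check over `params.env` that flags image references not pinned by digest. The sketch below is an assumption about how such a check could look, not part of this PR; `is_digest_pinned` and `unpinned_images` are hypothetical names, and only `sha256` digests and keys ending in `-image` are recognised.

```python
import re

# An image reference is treated as pinned when it ends in @sha256:<64 hex chars>.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """True when the reference ends with an immutable sha256 digest."""
    return _DIGEST_RE.search(image_ref.strip()) is not None


def unpinned_images(params_env_text: str) -> list:
    """Return the keys of `*-image=` entries that use a tag instead of a digest."""
    unpinned = []
    for line in params_env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.endswith("-image") and not is_digest_pinned(value):
            unpinned.append(key)
    return unpinned
```

Note that a tag which looks like a commit SHA, such as the 40-character one on line 3, is still reported as unpinned, which is exactly the distinction the comment draws.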
maas_api_container = next(
    (c for c in containers if c.get("name") == "maas-api"),
    containers[0] if containers else {},
)
Fail when maas-api container is missing instead of falling back to containers[0].
Line 934 silently switches to the first container, which can validate the wrong env block and mask a container-name/config regression.
Proposed change
- maas_api_container = next(
- (c for c in containers if c.get("name") == "maas-api"),
- containers[0] if containers else {},
- )
+ maas_api_container = next(
+ (c for c in containers if c.get("name") == "maas-api"),
+ None,
+ )
+ assert maas_api_container is not None, \
+ "Deployment spec must contain a container named 'maas-api'"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/tests/test_api_keys.py` around lines 932 - 935, The container lookup
logic in the next() call is falling back to containers[0] when the maas-api
container is not found, which silently masks configuration regressions. Remove
the fallback logic and instead raise an explicit error (or fail the test) if the
maas-api container is not found in the containers list. This ensures that
missing or misconfigured containers will be caught rather than silently using a
different container for validation.
@zdtsw: The following test has Failed:
OCI Artifact Browser URL
Inspecting Test Artifacts Manually
To inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:maas-group-test-dt7qg
Description
Completes the migration from the external `maas-api-key-cleanup` CronJob to the in-process cleanup loop in maas-api (follow-up to #934).
In detail:
- Add `PruneLegacyCleanupResources` so the orphaned CronJob and the `maas-api-cleanup-restrict` NetworkPolicy are deleted on Tenant reconcile
Co-authored-by @jland-redhat
How Has This Been Tested?
run
Merge criteria:
Summary by CodeRabbit
`CLEANUP_INTERVAL_MINUTES` (default: 15).