
feat: add MaaS (Models as a Service) Helm chart and integration#61

Closed
tsisodia10 wants to merge 3 commits into opendatahub-io:main from tsisodia10:maas-integration-clean

Conversation

tsisodia10 (Contributor) commented Apr 6, 2026

Summary

Add MaaS (Models as a Service) Helm chart and integrate it into the rhaii-on-xks deployment stack alongside RHCL for authenticated, rate-limited model serving on managed Kubernetes (xKS).

What's included

  • charts/maas/ — Full Helm chart for the MaaS stack:
    • maas-controller — Kubernetes operator that watches MaaSSubscription and LLMInferenceService CRs
    • maas-api — REST API server for API key management, subscriptions, and model discovery
    • PostgreSQL — Persistent storage for API keys and subscriptions
    • Gateway + HTTPRoute — Istio Gateway API routing for model inference endpoints
    • AuthPolicy + RateLimitPolicy — Kuadrant policies for API key auth and rate limiting
    • 4 CRDs: MaaSSubscription, ExternalModel, MaaSModelRef, MaaSAuthPolicy
  • Makefile — deploy-maas / undeploy-maas targets with prerequisite validation (KServe, RHCL, Gateway API), orphaned cert-manager detection, and MaaS status output
  • helmfile.yaml.gotmpl — MaaS helmfile inclusion with automatic gatewayNamespaces merging between RHCL and MaaS
  • values.yaml — maas config block (enabled, namespace, gatewayNamespace) + gatewayNamespaces for RHCL
  • scripts/cleanup.sh — RHCL/Kuadrant and MaaS resource cleanup (CRs, CRDs, RBAC, namespaces)
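
For reference, the maas block described above might look roughly like this in values.yaml. This is an illustrative sketch only: the defaults shown, and any keys beyond enabled, namespace, and gatewayNamespace, are assumptions rather than the chart's actual values.

```yaml
# Sketch of the values.yaml additions; defaults shown are assumptions.
maas:
  enabled: true
  namespace: maas                  # namespace for maas-api, maas-controller, PostgreSQL
  gatewayNamespace: istio-system   # hypothetical default

# Shared with RHCL; merged automatically by helmfile.yaml.gotmpl
gatewayNamespaces:
  - istio-system
```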

Deployment

# Deploy full stack including MaaS
make deploy-all RHCL=true MAAS=true

# Deploy MaaS standalone (requires KServe + RHCL already deployed)
make deploy-maas

# Remove MaaS
make undeploy-maas

Validated on

  • AKS (Azure Kubernetes Service) with GPU nodes
  • E2E tested: API key authentication, rate limiting, model inference through Gateway

Test plan

  • make deploy-all RHCL=true MAAS=true succeeds on a clean cluster
  • make deploy-maas validates prerequisites (fails gracefully if KServe/RHCL missing)
  • MaaS API health endpoint responds (/maas-api/health)
  • API key authentication enforced at Gateway (401 without key, 200 with valid key)
  • Rate limiting enforced (429 after exceeding limits)
  • make undeploy-maas cleans up all MaaS resources
  • make status shows MaaS component status
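
The auth and rate-limit checks in the test plan reduce to comparing HTTP status codes. The sketch below shows that shape with a stubbed mock_curl standing in for a real request against the gateway (the endpoint URL and header in the comment are illustrative assumptions, not the actual MaaS API surface):

```shell
#!/bin/bash
# Sketch of the Gateway auth checks from the test plan. 'mock_curl' simulates
# a real call such as:
#   curl -s -o /dev/null -w '%{http_code}' -H "Authorization: APIKEY $KEY" "$URL"
# The URL and header name above are hypothetical.
mock_curl() {
  # Simulated gateway behavior: 401 without a key, 200 with one
  if [ -z "${2:-}" ]; then echo 401; else echo 200; fi
}

check() {
  local expected=$1 actual=$2 label=$3
  if [ "$actual" = "$expected" ]; then
    echo "PASS: $label ($actual)"
  else
    echo "FAIL: $label (expected $expected, got $actual)"
  fi
}

check 401 "$(mock_curl "https://gw.example/v1/models" "")" "no API key rejected"
check 200 "$(mock_curl "https://gw.example/v1/models" "valid-key")" "valid API key accepted"
```

In a real run, the 429 rate-limit case would follow the same pattern after the configured limit is exceeded.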

Summary by CodeRabbit

  • New Features

    • Optional Models-as-a-Service (MaaS) deployment with gateway routing, auth & rate‑limit policies, subscription CRDs, API/controller, and optional embedded PostgreSQL; Azure AD and per-user rate limiting supported; make targets to deploy/undeploy MaaS and status now reports MaaS health.
  • Bug Fixes / Cleanup

    • Improved undeploy order and best-effort cleanup for cert-manager, MaaS and Kuadrant resources.
  • Chores

    • Added lint target (helm/yaml/shell) and updated Make help; new values to enable/configure MaaS and gateway namespaces.

Add MaaS Helm chart and integrate it into the rhaii-on-xks deployment
stack alongside RHCL for authenticated, rate-limited model serving on
managed Kubernetes (xKS).

- charts/maas/: Helm chart with maas-controller, maas-api, PostgreSQL,
  Gateway, HTTPRoute, AuthPolicy and RateLimitPolicy templates
- CRDs: MaaSSubscription, ExternalModel, MaaSModelRef, MaaSAuthPolicy
- Makefile: deploy-maas/undeploy-maas targets, prerequisite checks,
  orphaned cert-manager detection, MaaS status output
- helmfile.yaml.gotmpl: MaaS helmfile inclusion, auto-merge of
  gatewayNamespaces between RHCL and MaaS
- values.yaml: maas config block (enabled, namespace, gatewayNamespace)
- scripts/cleanup.sh: RHCL/Kuadrant and MaaS resource cleanup

Depends on: KServe, RHCL (Kuadrant), Istio, cert-manager

Made-with: Cursor

openshift-ci bot commented Apr 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tsisodia10
Once this PR has been reviewed and has the lgtm label, please assign anishasthana for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

Adds a new Models-as-a-Service (MaaS) Helm chart with CRDs (ExternalModel, MaaSAuthPolicy, MaaSModelRef, MaaSSubscription), controller and API deployments, Gateway and HTTPRoute templates, Kuadrant auth/rate-limit policies, optional PostgreSQL, Azure AD config, RBAC, NetworkPolicy, and supporting templates/helpers. Integrates MaaS into top-level helmfile and Makefile (MAAS flag, deploy-maas/undeploy-maas, status checks, lint target), updates cert-manager cleanup logic, extends scripts/cleanup.sh to remove Kuadrant/RHCL and MaaS artifacts, and adds values.yaml entries for gateway namespace aggregation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Security & Configuration Issues

  • Hard-coded DB credentials in chart secrets (POSTGRESQL_USER/POSTGRESQL_PASSWORD = maas / maas123). Action: remove hardcoded credentials; require operator-provided secret or use External/Sealed Secrets. (CWE-798)
  • Cluster-scoped secret access: maas-api ClusterRole grants broad create/get/list/patch/update/watch on core Secrets cluster-wide. Action: restrict secret verbs to the chart namespace or use least-privilege Roles. (CWE-264)
  • Over-broad controller ClusterRole grants Kuadrant/MAAS CRD permissions cluster-wide (including finalizers/status). Action: scope to specific namespaces or split into namespace-scoped Roles + RoleBindings. (CWE-264)
  • CRD deletion in cleanup script and undeploy-maas deletes CRDs without ensuring instances/finalizers are cleared; CRD deletion can hang or orphan resources. Action: enumerate and delete instances, clear finalizers, verify zero instances before CRD removal; fail loudly if not cleanable. (CWE-770)
  • Pull-secret file fallback reads $HOME/.config/containers/auth.json and injects into service account via hooked patching with retries; lacks ownership/permission checks and may silently proceed without a valid secret. Action: validate file presence/permissions or require explicit Secret reference; make injection failures explicit. (CWE-732)
  • Helmfile presync/postsync hooks perform cluster mutations (CRD apply, SA patch, pod deletions) via shell hooks that may continue on partial failures. Action: ensure hooks exit non-zero on critical failures and surface errors to the operator; convert complex scripts to robust small utilities or kustomize/helm hooks. (CWE-377)
  • NetworkPolicy permits ingress from kuadrant namespaces to maas-api broadly by namespace and label; source selector could be narrower. Action: tighten podSelector/namespaceSelector and port restrictions to limit exposure. (CWE-942)
  • ServiceAccount imagePullSecret patching targets gateway service account in gateway namespace and retries; if the ServiceAccount is in use, deleting gateway pods may cause transient outages. Action: document downtime, or use safer rollout strategies (rolling restart) and make pull-secret injection an explicit, auditable step.
  • cert-manager cleanup heuristics delete deployments and service accounts when release missing; this could remove user-managed resources. Action: require operator confirmation or one-shot force flag, and log deletions with provenance. (CWE-693)
  • Some CRD OpenAPI schemas and CRD-level validations rely on operator/controller behavior rather than schema (e.g., cross-resource refs, required owner fields). Action: add CEL/minItems/minProperties validations where possible to fail fast on malformed resources.
  • Makefile undeploy/undeploy-maas use best-effort (|| true) in places, potentially hiding errors. Action: provide a strict mode that fails on errors and surface remediation steps when best-effort paths are used.

Only actionable issues listed above.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'feat: add MaaS (Models as a Service) Helm chart and integration' directly describes the primary change: introducing a new MaaS Helm chart and integrating it into the deployment stack. It is specific, clear, and accurately reflects the core contribution.





coderabbitai bot left a comment


Actionable comments posted: 17

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
scripts/cleanup.sh (2)

63-68: ⚠️ Potential issue | 🟡 Minor

Unquoted variables in kubectl patch commands - potential word splitting.

Variables $rev and $ist should be quoted to prevent word splitting if resource names contain spaces or special characters.

Proposed fix
 kubectl get istiorevision -A -o name 2>/dev/null | while read rev; do
-    kubectl patch $rev -p '{"metadata":{"finalizers":[]}}' --type=merge 2>/dev/null || true
+    kubectl patch "$rev" -p '{"metadata":{"finalizers":[]}}' --type=merge 2>/dev/null || true
 done
 kubectl get istio -A -o name 2>/dev/null | while read ist; do
-    kubectl patch $ist -p '{"metadata":{"finalizers":[]}}' --type=merge 2>/dev/null || true
+    kubectl patch "$ist" -p '{"metadata":{"finalizers":[]}}' --type=merge 2>/dev/null || true
 done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/cleanup.sh` around lines 63 - 68, The kubectl patch loops use
unquoted shell variables ($rev and $ist) which can cause word splitting for
resource names with spaces/special chars; update the two kubectl patch
invocations that reference $rev and $ist to quote the variables (use "$rev" and
"$ist") so each resource name is passed as a single argument to kubectl, keeping
the surrounding command structure (the while read loops and the 2>/dev/null ||
true behavior) intact.

1-4: ⚠️ Potential issue | 🟠 Major

Missing set -euo pipefail - CWE-78 shell script safety violation.

Per coding guidelines, shell scripts must use set -euo pipefail at script start for proper error handling and to prevent undefined variable usage.

Proposed fix
 #!/bin/bash
+set -euo pipefail
 # Cleanup script for rhaii-on-xks
 # Runs helmfile destroy and cleans up presync/template resources
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/cleanup.sh` around lines 1 - 4, Add strict error handling to the
script by inserting "set -euo pipefail" immediately after the existing shebang
(#!/bin/bash) in cleanup.sh; this ensures the script exits on errors, treats
unset variables as errors, and fails on pipeline errors—after adding it, scan
the script for any unquoted variable expansions or intentional unset-variable
usages and fix them (declare default values or quote expansions) so the new
strict mode doesn't break legitimate behavior.
🟡 Minor comments (6)
charts/maas/templates/namespace.yaml-6-6 (1)

6-6: ⚠️ Potential issue | 🟡 Minor

Validate maasApi.subscriptionNamespace at template render time.

An empty/missing value here yields invalid namespace manifests and brittle installs. Enforce it with required for deterministic failures.

Remediation
-  name: {{ .Values.maasApi.subscriptionNamespace }}
+  name: {{ required "maasApi.subscriptionNamespace is required" .Values.maasApi.subscriptionNamespace }}

As per coding guidelines, "REVIEW PRIORITIES: ... 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/namespace.yaml` at line 6, The template uses {{
.Values.maasApi.subscriptionNamespace }} directly which allows empty/missing
values and produces invalid Namespace manifests; update the Helm template in
namespace.yaml to validate and fail fast by replacing the bare reference with
the required function (e.g. required "error message"
.Values.maasApi.subscriptionNamespace) so template rendering errors clearly when
maasApi.subscriptionNamespace is not provided or empty; ensure the error message
identifies maasApi.subscriptionNamespace so operators know which value to set.
charts/maas/templates/pull-secret.yaml-15-16 (1)

15-16: ⚠️ Potential issue | 🟡 Minor

Potential double base64 encoding of dockerConfigJson.

The template applies b64enc to .Values.pullSecret.dockerConfigJson. If the value is already base64-encoded (common when sourced from existing pull secrets or ~/.docker/config.json), this results in double encoding and a non-functional secret.

Kubernetes expects .dockerconfigjson to be base64-encoded JSON. If your input is raw JSON, this is correct. If your input is already base64 (from kubectl get secret ... -o jsonpath), this will break.

Proposed fix: Add conditional or document expected format

Option 1: Document clearly in values.yaml that raw JSON is expected:

pullSecret:
  # dockerConfigJson must be RAW JSON (not base64), e.g.:
  # {"auths":{"registry.redhat.io":{"auth":"..."}}}
  dockerConfigJson: ""

Option 2: Use a helper that detects and handles both formats (more complex).
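
As a sketch of what Option 2's detection logic might look like, the shell function below distinguishes raw JSON from already-base64-encoded input; the equivalent chart change would port this idea into a Helm helper (e.g. with b64dec/regexMatch). It assumes a coreutils-style `base64 -d`:

```shell
#!/bin/bash
# Detect whether a dockerconfigjson value is raw JSON or already base64,
# so the template can avoid double encoding. Sketch only.
detect_format() {
  local val=$1
  case $val in
    '{'*) echo raw; return ;;   # raw JSON starts with '{'
  esac
  # If it decodes cleanly to something starting with '{', treat as base64
  if printf '%s' "$val" | base64 -d 2>/dev/null | head -c1 | grep -q '{'; then
    echo base64
  else
    echo unknown
  fi
}

detect_format '{"auths":{"registry.example.io":{"auth":"x"}}}'
detect_format "$(printf '%s' '{"auths":{}}' | base64)"
```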

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/pull-secret.yaml` around lines 15 - 16, The template
currently always applies b64enc to .Values.pullSecret.dockerConfigJson which
will double-encode inputs that are already base64; update the chart so the
template and values agree: either (A) document in values.yaml that
pullSecret.dockerConfigJson must be RAW JSON (and keep the b64enc in the
template), or (B) implement a conditional in the pull-secret.yaml template that
detects if .Values.pullSecret.dockerConfigJson is already base64 (e.g. via a
simple regex helper) and only calls b64enc when it is not; reference
.dockerconfigjson in the template and .Values.pullSecret.dockerConfigJson and
the b64enc invocation when making the change.
charts/maas/templates/postgresql/secret.yaml-25-26 (1)

25-26: ⚠️ Potential issue | 🟡 Minor

Connection URL duplicates hardcoded credentials.

The DB_CONNECTION_URL embeds the same hardcoded credentials. If the credentials are parameterized above, this template must also reference the same values to stay in sync.

Proposed fix
 stringData:
-  DB_CONNECTION_URL: "postgresql://maas:maas-dev-password@postgres.{{ include "maas.namespace" . }}.svc.cluster.local:5432/{{ .Values.postgresql.database }}?sslmode=disable"
+  DB_CONNECTION_URL: "postgresql://{{ .Values.postgresql.auth.username | default "maas" }}:{{ .Values.postgresql.auth.password | default (randAlphaNum 24) }}@postgres.{{ include "maas.namespace" . }}.svc.cluster.local:5432/{{ .Values.postgresql.database }}?sslmode=disable"

Note: Using randAlphaNum twice would generate different values. Consider using a shared helper or lookup to retrieve the generated password from the first secret.
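
One way to keep the two secrets in sync is a shared named template that reuses an already-generated password via lookup. A rough sketch follows; the helper name, secret name, and data key here are hypothetical and would need to match the chart's actual resources:

```yaml
{{/* Sketch: reuse the password from an existing Secret on upgrade,
     generate once on first install. Names below are assumptions. */}}
{{- define "maas.postgresPassword" -}}
{{- $secret := lookup "v1" "Secret" (include "maas.namespace" .) "maas-postgresql" -}}
{{- if and $secret $secret.data (index $secret.data "POSTGRESQL_PASSWORD") -}}
{{- index $secret.data "POSTGRESQL_PASSWORD" | b64dec -}}
{{- else -}}
{{- .Values.postgresql.auth.password | default (randAlphaNum 24) -}}
{{- end -}}
{{- end -}}
```

Note that lookup returns an empty map under helm template and --dry-run, so the fallback branch still generates a fresh value in those modes.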

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/postgresql/secret.yaml` around lines 25 - 26, The
DB_CONNECTION_URL in secret.yaml currently hardcodes credentials instead of
using the parameterized values, causing duplication and drift; update
DB_CONNECTION_URL to reference the same username/password source used elsewhere
(e.g., the postgresql secret keys or .Values.postgresql.username/.password)
rather than embedding "maas" and "maas-dev-password", or retrieve the generated
password via a Helm lookup to the first secret; alternatively extract the
password generation into a shared helper so both the secret data and
DB_CONNECTION_URL use the same generated value (avoid calling randAlphaNum
twice).
charts/maas/templates/postgresql/deployment.yaml-55-68 (1)

55-68: ⚠️ Potential issue | 🟡 Minor

Hardcoded username in health probes.

Probes use -U maas but POSTGRESQL_USER is sourced from secret. If the secret's user value changes, probes will fail.

         readinessProbe:
           exec:
             command:
             - /bin/sh
             - -c
-            - pg_isready -U maas -d {{ .Values.postgresql.database }}
+            - pg_isready -U $POSTGRESQL_USER -d $POSTGRESQL_DATABASE

Same for livenessProbe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/postgresql/deployment.yaml` around lines 55 - 68,
Health probes hardcode "-U maas" which will break if POSTGRESQL_USER from the
secret differs; update the readinessProbe and livenessProbe exec command lines
(readinessProbe, livenessProbe) to use the runtime environment variable instead
of a literal user by substituting the container env var (e.g. use
${POSTGRESQL_USER} or sh -c "pg_isready -U ${POSTGRESQL_USER} -d
${POSTGRESQL_DATABASE}") so the probes read the secret-provided
POSTGRESQL_USER/POSTGRESQL_DATABASE values at runtime.
Makefile-198-206 (1)

198-206: ⚠️ Potential issue | 🟡 Minor

Hardcoded gateway name maas-default-gateway doesn't respect customization.

Line 204 hardcodes the gateway name, but charts/maas/values.yaml:49 shows gateway.name is configurable. If users customize this value, the status check will fail to find the gateway.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 198 - 206, The Makefile health/status target hardcodes
the gateway name "maas-default-gateway" which breaks when users set
charts/maas/values.yaml gateway.name to a custom value; update the kubectl check
in the target that calls kubectl get gateway maas-default-gateway -n
istio-system to instead read the configured gateway name (e.g., via a variable
like MAAS_GATEWAY or by parsing charts/maas/values.yaml or Helm release values)
and use that variable in the kubectl command so the check honors the configured
gateway.name.
charts/maas/helmfile.yaml.gotmpl-93-101 (1)

93-101: ⚠️ Potential issue | 🟡 Minor

SA patch retry loop doesn't fail when SA never appears.

If the service account isn't created within 25 seconds (5 retries × 5s), the loop exits silently without patching. The gateway pod restart on line 103 will proceed, but the new pod may fail to pull images from registry.redhat.io. This creates a delayed runtime failure with no clear indication of the root cause.

Proposed fix - fail after retries exhausted
             for i in 1 2 3 4 5; do
               if kubectl get sa "$GATEWAY_SA" -n "$GATEWAY_NS" >/dev/null 2>&1; then
                 kubectl patch sa "$GATEWAY_SA" -n "$GATEWAY_NS" \
                   -p '{"imagePullSecrets": [{"name": "redhat-pull-secret"}]}' 2>/dev/null && \
                   echo "  Gateway SA patched" && break
               fi
               echo "  Waiting for gateway SA... ($i/5)"
               sleep 5
+              if [ "$i" -eq 5 ]; then
+                echo "WARNING: Gateway SA '$GATEWAY_SA' not found after 25s. Pull secret not configured."
+              fi
             done
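
The same fail-fast idea can be factored into a small generic helper that returns non-zero once attempts are exhausted, instead of falling through silently. A sketch, with patch_sa standing in for the kubectl get/patch pair in the hook:

```shell
#!/bin/bash
# Generic retry with fail-fast semantics: succeed as soon as the command
# does, otherwise exit non-zero after N attempts.
retry() {
  local attempts=$1; shift
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      return 0
    fi
    echo "  attempt $i/$attempts failed; retrying..." >&2
    sleep 0  # would be 'sleep 5' in the real hook
  done
  echo "ERROR: '$*' did not succeed after $attempts attempts" >&2
  return 1
}

# Demo: a stand-in for the SA patch that succeeds on its third invocation
n=0
patch_sa() { n=$((n + 1)); [ "$n" -ge 3 ]; }

retry 5 patch_sa && echo "patched after $n attempts"
```

Calling `retry 5 patch_sa || exit 1` in the hook would abort the sync before the gateway pod restart, surfacing the root cause immediately.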
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/helmfile.yaml.gotmpl` around lines 93 - 101, The retry loop that
checks for the service account (variables GATEWAY_SA and GATEWAY_NS) and runs
kubectl patch may exit silently after 5 retries without applying the
imagePullSecrets, so change the logic in that loop to detect failure: after the
for-loop, check whether the patch succeeded (e.g., test kubectl get sa
"$GATEWAY_SA" -n "$GATEWAY_NS" and verify imagePullSecrets) and if not, emit a
clear error via echo/process exit with non-zero status so the deployment aborts
before the subsequent gateway pod restart; update the block around the kubectl
patch invocation to set a success flag or use the patch command's exit code and
fail fast when exhausted.
🧹 Nitpick comments (9)
charts/maas/templates/policies/azure-ad-admin-only.yaml (1)

1-1: Missing validation for required Azure AD values.

When azureAD.enabled is true, the template should fail early if tenantId or clientId are empty, rather than producing an invalid issuerUrl or audience. Consider using Helm's required function.

Proposed fix
-{{- if and .Values.azureAD.enabled .Values.azureAD.adminGroup }}
+{{- if and .Values.azureAD.enabled .Values.azureAD.adminGroup }}
+{{- $tenantId := required "azureAD.tenantId is required when azureAD.enabled is true" .Values.azureAD.tenantId }}
+{{- $clientId := required "azureAD.clientId is required when azureAD.enabled is true" .Values.azureAD.clientId }}
 apiVersion: kuadrant.io/v1
 kind: AuthPolicy

Then use $tenantId and $clientId in the template.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/policies/azure-ad-admin-only.yaml` at line 1, Add
validation so the template fails early when .Values.azureAD.enabled is true and
required fields are missing: use Helm's required function to assert
.Values.azureAD.tenantId and .Values.azureAD.clientId (e.g., assign to local
variables like $tenantId and $clientId via required) before constructing
issuerUrl or audience, and then use those $tenantId/$clientId variables in the
issuerUrl and audience logic to prevent emitting invalid values when tenantId or
clientId are empty.
charts/maas/templates/policies/default-deny.yaml (1)

17-22: Path prefix matching may be overly permissive.

The startsWith("/v1/models") predicate will match any path beginning with /v1/models, including unintended paths like /v1/models-malicious or /v1/modelsABC. If the intent is to protect only the /v1/models endpoint and its sub-paths, consider whether stricter matching is needed.

Additionally, using limit: 0 as a deny mechanism works but relies on Kuadrant's rate limiting enforcement. Ensure the corresponding AuthPolicy in default-auth.yaml provides authentication for allowed paths.
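
The difference between the loose prefix and an anchored match can be demonstrated outside the policy engine; here with grep -E standing in for the predicate evaluation:

```shell
#!/bin/bash
# Anchored pattern: '/v1/models' and its subpaths match,
# '/v1/models-malicious' and '/v1/modelsABC' do not.
matches() {
  printf '%s' "$1" | grep -Eq '^/v1/models($|/)'
}

for path in /v1/models /v1/models/foo /v1/models-malicious /v1/modelsABC; do
  if matches "$path"; then echo "$path: match"; else echo "$path: no match"; fi
done
```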

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/policies/default-deny.yaml` around lines 17 - 22, The
deny-all-by-default predicate uses startsWith("/v1/models") which
unintentionally matches paths like /v1/models-malicious; update the predicate in
the deny-all-by-default policy to more strictly match only the /v1/models
endpoint and its subpaths (e.g., use startsWith("/v1/models/") for subpaths and
a separate equality check for exactly "/v1/models", or use a regex anchor like
^/v1/models($|/) depending on policy syntax) and verify the corresponding
AuthPolicy in default-auth.yaml still grants access for the intended allowed
paths.
charts/maas/crds/maas.opendatahub.io_maassubscriptions.yaml (1)

109-131: Minor schema inconsistency: owner.users vs owner.groups structure differs.

users is a simple string array while groups is an array of objects with a name field. This asymmetry may complicate client code. Consider aligning both as object arrays for consistency and future extensibility (e.g., adding clusterScoped to users).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/crds/maas.opendatahub.io_maassubscriptions.yaml` around lines 109
- 131, Align the owner.users schema to match owner.groups by changing users from
an array of strings to an array of objects (e.g., UserReference) with a name
property, mirroring the GroupReference structure; update the CRD definition for
properties.owner.users to use items.type: object with a required - name and a
name property of type string, and add any future fields such as clusterScoped to
the new UserReference shape so users and groups share a consistent structure
(references: owner.users, owner.groups, GroupReference, name, clusterScoped).
charts/maas/templates/policies/azure-ad-per-user-ratelimit.yaml (1)

19-20: Quote the window duration value.

The window value from values.yaml is 1h (unquoted YAML string). If Kuadrant expects a string duration, this should work, but defensive quoting ensures consistent behavior regardless of how the value is overridden:

         rates:
           - limit: {{ .Values.azureAD.perUserRateLimit.limit }}
-            window: {{ .Values.azureAD.perUserRateLimit.window }}
+            window: {{ .Values.azureAD.perUserRateLimit.window | quote }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/policies/azure-ad-per-user-ratelimit.yaml` around lines
19 - 20, The template is inserting the unquoted duration for
.Values.azureAD.perUserRateLimit.window which can be misinterpreted if
overridden; update the azure-ad-per-user-ratelimit.yaml template to render the
window as a quoted string (use the template value
.Values.azureAD.perUserRateLimit.window wrapped in quotes) so the generated YAML
always provides a string duration to Kuadrant while leaving limit (
.Values.azureAD.perUserRateLimit.limit ) unchanged.
charts/maas/crds/maas.opendatahub.io_maasauthpolicies.yaml (1)

131-138: Consider adding enum constraint for accepted and enforced status fields.

These fields describe condition states but lack validation. Without constraints, any string value is accepted, potentially causing confusion or parsing issues for consumers expecting True, False, or Unknown.

accepted:
  description: Accepted reports whether the AuthPolicy has been accepted
  enum:
    - "True"
    - "False"
    - Unknown
  type: string
enforced:
  description: Enforced reports whether the AuthPolicy is enforced
  enum:
    - "True"
    - "False"
    - Unknown
  type: string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/crds/maas.opendatahub.io_maasauthpolicies.yaml` around lines 131
- 138, The CRD's status fields accepted and enforced currently allow any string;
update the maas.opendatahub.io_maasauthpolicies.yaml definition for the accepted
and enforced properties to add an enum constraint with the allowed values
"True", "False", and "Unknown" while keeping type: string and preserving the
existing descriptions so consumers can validate and parse these condition states
reliably.
Makefile (3)

153-158: Duplicate CRD deletion between undeploy-maas and cleanup.sh.

Line 115 deletes MaaS CRDs explicitly, then Line 158 invokes cleanup.sh which also deletes CRDs matching maas.opendatahub.io (see scripts/cleanup.sh:144-147). While --ignore-not-found prevents errors, this redundancy creates maintenance confusion about ownership of teardown responsibilities.

Consider either:

  1. Removing CRD deletion from undeploy-maas and letting cleanup.sh handle it, or
  2. Adding a flag to cleanup.sh to skip MaaS CRDs when called from undeploy
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 153 - 158, The Makefile currently duplicates CRD
deletion: the `undeploy-maas` target removes MaaS CRDs while the `undeploy`
target calls ./scripts/cleanup.sh which also deletes `maas.opendatahub.io` CRDs;
pick one approach and remove redundancy by either removing the explicit CRD
deletion from the `undeploy-maas` target (leave full teardown to `cleanup.sh`)
or modify `scripts/cleanup.sh` to accept a flag (e.g., --skip-maas) and update
the `undeploy` target to call `./scripts/cleanup.sh --skip-maas -y`; update
references to `undeploy-maas` and `cleanup.sh` accordingly and ensure
`--ignore-not-found` behavior is preserved.

100-110: Prerequisite CRD checks duplicated in helmfile presync hook.

Lines 103-108 check for KServe, Kuadrant, and Gateway API CRDs. The same checks exist in charts/maas/helmfile.yaml.gotmpl lines 37-52. This duplication means changes must be synchronized in two places.

Not blocking, but consider consolidating prerequisite validation in one location.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 100 - 110, The prerequisite CRD checks are duplicated
between the Makefile target deploy-maas and the helmfile presync hook in
charts/maas/helmfile.yaml.gotmpl; refactor by moving the CRD validation into a
single reusable place (either a new make target like validate-prereqs or a small
shell script) and have both deploy-maas and the helmfile presync hook invoke
that single validator; update references in deploy-maas (target name
deploy-maas) and remove the duplicate checks from
charts/maas/helmfile.yaml.gotmpl so maintenance is centralized.

64-76: Unquoted shell variables risk word-splitting/globbing.

Several shell variables lack proper quoting. While current usage may be safe, this violates defensive shell scripting practices and can break with unexpected input.

Proposed fix
 deploy-cert-manager: check-kubeconfig clear-cache
 	@# Detect orphaned cert-manager pods not managed by Helm
-	@if kubectl get pods -n cert-manager --no-headers 2>/dev/null | grep -q .; then \
-		if ! helm list -n cert-manager-operator 2>/dev/null | grep -q cert-manager; then \
+	@if kubectl get pods -n cert-manager --no-headers 2>/dev/null | grep -q '.'; then \
+		if ! helm list -n cert-manager-operator 2>/dev/null | grep -q 'cert-manager'; then \
 			echo "WARNING: cert-manager pods exist but no Helm release found."; \
 			echo "  This is likely an orphaned install. Cleaning up..."; \
 			kubectl delete deployment --all -n cert-manager --ignore-not-found 2>/dev/null || true; \

As per coding guidelines: "Quote shell variables in targets".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 64 - 76, In the deploy-cert-manager Makefile target,
quote all shell variable and command substitutions to prevent
word-splitting/globbing: locate the kubectl/helm/grep/xargs pipeline (the lines
calling helm list -n cert-manager-operator, kubectl get pods -n cert-manager,
kubectl get clusterrole,clusterrolebinding -o name | grep -i cert-manager |
xargs -r kubectl delete, and kubectl delete sa --all -n cert-manager) and change
any unquoted expansions or command substitutions to quoted forms (e.g. "$VAR" or
"$(cmd)"), quote literal patterns passed to grep if they come from variables,
and ensure xargs invocation is robust (use -- where appropriate) so all variable
content is safely handled.
charts/maas/helmfile.yaml.gotmpl (1)

61-64: No validation that ./crds/ directory exists before applying.

Line 63 applies CRDs from ./crds/ but doesn't verify the directory exists. If the chart is extracted incorrectly or the directory is missing, the error message won't clearly indicate the problem.

             # Apply MaaS CRDs via server-side apply (avoids size limits)
             echo "Applying MaaS CRDs..."
+            if [ ! -d "./crds/" ] || [ -z "$(ls -A ./crds/)" ]; then
+              echo "ERROR: ./crds/ directory missing or empty"
+              exit 1
+            fi
             kubectl apply -f ./crds/ --server-side
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/helmfile.yaml.gotmpl` around lines 61 - 64, Add a check that the
./crds/ directory exists before running the kubectl apply; specifically, before
the kubectl apply -f ./crds/ --server-side line in the helmfile template, test
for the directory (e.g., using a shell test like [ -d ./crds ] or [[ -d ./crds
]]) and if it does not exist print a clear error referencing "crds" and exit
non‑zero so the failure message shows the missing directory rather than a
generic kubectl error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@charts/maas/crds/maas.opendatahub.io_externalmodels.yaml`:
- Around line 58-70: The ClusterRole used by the maas-controller lacks
permissions to read Secrets referenced by ExternalModel.spec.credentialRef.name,
causing reconciliation Forbidden errors; update the maas-controller ClusterRole
(the ClusterRole resource in the maas-controller template) to include
core/secrets with verbs ["get","list","watch"] (and optionally "create","update"
only if controller writes secrets) so the controller can read the CredentialRef
Secret during reconciliation for ExternalModel. Ensure the new rule is added
alongside existing rules in the ClusterRole manifest so RBAC covers
credentialRef access.
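The suggested addition could look like this hedged sketch, to be merged into the existing rules of the maas-controller ClusterRole (rule placement is an assumption):

```yaml
# Hypothetical sketch: read-only access to Secrets so the controller can
# resolve ExternalModel.spec.credentialRef during reconciliation.
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch"]
```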

In `@charts/maas/crds/maas.opendatahub.io_maasauthpolicies.yaml`:
- Around line 114-116: The CEL validation rule under x-kubernetes-validations
(the rule string currently using size(self.groups) > 0 || size(self.users) > 0)
will runtime-fail when optional fields are omitted; update that rule to guard
each optional field with has() before calling size()/length (e.g., check
has(self.groups) before accessing self.groups.size() and likewise for
self.users) so the expression never calls size() on null and short-circuits
correctly.
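A hedged sketch of the guarded rule (field names are taken from the comment; the message text is an assumption):

```yaml
# Hypothetical sketch: has() guards ensure size() is never called on an
# absent optional field, and && short-circuits before the access.
x-kubernetes-validations:
  - rule: "(has(self.groups) && self.groups.size() > 0) || (has(self.users) && self.users.size() > 0)"
    message: "at least one of groups or users must be non-empty"
```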

In `@charts/maas/helmfile.yaml.gotmpl`:
- Around line 76-84: The post-sync script currently masks failures by appending
"|| echo '... not ready'" to the kubectl rollout status calls for deployments
postgres, maas-controller, and maas-api (while set -e is enabled), so rollout
failures do not cause the script to exit; update the kubectl rollout status
invocations in the postsync block to fail fast for critical components—either
remove the "|| echo ..." fallbacks so the command fails and set -e will
terminate the script, or explicitly capture each command's exit code and call
exit 1 (or aggregate errors and exit non‑zero) when rollout for any of
deployment/postgres, deployment/maas-controller, or deployment/maas-api fails.
- Around line 85-103: Replace hardcoded gateway and service-account names with
templated values: define a GATEWAY_NAME using the chart value
(.Values.gateway.name) with the same default currently assumed
("maas-default-gateway") and use that variable wherever "maas-default-gateway"
is referenced (kubectl get gateway and label selector), and derive GATEWAY_SA
from the gateway name (e.g. "${GATEWAY_NAME}-istio" or use the helper that
builds the SA name) instead of the hardcoded "maas-default-gateway-istio"; keep
existing GATEWAY_NS usage and the patch/delete logic but reference GATEWAY_NAME
and GATEWAY_SA variables so the postsync hook works with customized
.Values.gateway.name.
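The rollout fail-fast behavior described above can be sketched as a small shell helper (the function name, timeout, and usage line are assumptions, not the chart's actual postsync code):

```shell
# Hypothetical sketch: run each rollout check, count failures, and return
# non-zero so the postsync hook aborts when any deployment is stuck.
wait_for_rollouts() {
  errors=0
  for d in "$@"; do
    if ! kubectl rollout status "deployment/$d" -n "$NAMESPACE" --timeout=120s; then
      echo "ERROR: deployment/$d did not become ready"
      errors=$((errors + 1))
    fi
  done
  return "$errors"
}
# Usage in the postsync hook:
# wait_for_rollouts postgres maas-controller maas-api || exit 1
```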

In `@charts/maas/templates/httproute.yaml`:
- Around line 34-35: The chart hardcodes the X-MaaS-Group header which bypasses
authorization; remove the static header entry (X-MaaS-Group: '["maas-admins"]')
and implement conditional logic in the httproute template so that when
azureAD.enabled=true the header is not set to a fixed value but instead is
sourced from the incoming JWT/group claims (or omitted so maas-api reads the
JWT) and when Azure AD is disabled the header can be set from the configured
adminGroup value in values.yaml; update the template around the X-MaaS-Group
header and any related blocks so maas-api cannot be granted admin rights by a
hardcoded header.
- Around line 27-33: The template currently emits a static header
X-MaaS-Username ("azure-ad-user" or "anonymous") which hides the real user; when
.Values.azureAD.enabled is true, replace the static value with the authenticated
principal claim (e.g., preferred_username or sub) by configuring the HTTPRoute
to set X-MaaS-Username from the auth identity (use Kuadrant/ingress-gateway
identity features or a request header filter that maps
request.auth.principal.claims["preferred_username"] ||
request.auth.principal.claims["sub"] into the header); keep the fallback to
"anonymous" when no identity is present and apply the same change for the later
block that sets X-MaaS-Username (the other occurrence).

In `@charts/maas/templates/maas-api/clusterrole.yaml`:
- Around line 28-30: The ClusterRole currently grants cluster-wide secrets CRUD
which is overly broad; remove the secrets rule from the ClusterRole in
charts/maas/templates/maas-api/clusterrole.yaml and instead create a namespaced
Role named maas-api-secrets and a RoleBinding that binds that Role to the
maas-api ServiceAccount in the chart namespace (use the include "maas.namespace"
template for namespace resolution); ensure the new Role contains the secrets
verbs (create, delete, get, list, patch, update, watch) and the RoleBinding
references kind: Role, name: maas-api-secrets and the subject ServiceAccount
maas-api.
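A hedged sketch of the namespaced replacement (resource names follow the comment; the `maas.namespace` helper is assumed to exist in the chart):

```yaml
# Hypothetical sketch: scope secrets CRUD to the chart namespace instead of
# granting it cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: maas-api-secrets
  namespace: {{ include "maas.namespace" . }}
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: maas-api-secrets
  namespace: {{ include "maas.namespace" . }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: maas-api-secrets
subjects:
  - kind: ServiceAccount
    name: maas-api
    namespace: {{ include "maas.namespace" . }}
```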

In `@charts/maas/templates/maas-api/service.yaml`:
- Around line 11-13: The Service selector in service.yaml uses only
app.kubernetes.io/name which is too broad and allows label spoofing; update the
selector on the Service resource to include additional deterministic labels (for
example add app.kubernetes.io/instance or app.kubernetes.io/managed-by and
app.kubernetes.io/component/release labels) so it exactly matches the Pods
created by the maas-api Deployment/StatefulSet (match the labels used in the
maas-api Pod template), e.g. include the same app.kubernetes.io/instance/
component keys used by your chart (replace with {{ .Release.Name }} or the
component value used in the Deployment) so no other pods in the namespace can be
selected.
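A minimal sketch of the tightened selector (the instance label is an assumption and must mirror whatever labels the maas-api Pod template actually carries):

```yaml
# Hypothetical sketch: pin the selector to this release so only Pods from
# this chart's maas-api Deployment can match.
selector:
  app.kubernetes.io/name: maas-api
  app.kubernetes.io/instance: {{ .Release.Name }}
```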

In `@charts/maas/templates/networkpolicy.yaml`:
- Around line 9-31: The NetworkPolicy for the maas-api pod (podSelector
matchLabels app.kubernetes.io/name: maas-api) currently allows ingress only from
Authorino pods in kuadrant-system and kuadrant-operators, which blocks traffic
from the Istio gateway; update the ingress stanza in the NetworkPolicy to also
allow traffic from the istio-system namespace by adding a namespaceSelector
matchLabels entry for kubernetes.io/metadata.name: istio-system (in the same
ingress.from list which contains the existing kuadrant entries) so that Envoy in
istio-system can reach maas-api:8080.
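The extra ingress entry could look like this hedged sketch (the existing kuadrant entries are abbreviated placeholders, not the chart's literal content):

```yaml
# Hypothetical sketch: allow the Istio gateway's Envoy pods to reach maas-api
# alongside the existing Authorino sources.
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: istio-system
      # ...existing kuadrant-system / kuadrant-operators entries...
    ports:
      - protocol: TCP
        port: 8080
```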

In `@charts/maas/templates/policies/azure-ad-admin-only.yaml`:
- Around line 22-28: The authorization rule using admin-group-check with
patternMatching that selects auth.identity.groups and value {{
.Values.azureAD.adminGroup | quote }} assumes the JWT contains a groups claim
which Azure AD does not emit by default and will omit when a user has >200
groups; update the chart/template to document this prerequisite and recommended
fixes: add README/values documentation that operators must set
groupMembershipClaims in the app registration (or use app roles), and update
policy guidance to either (a) prefer app roles (roles claim) or (b) implement
Graph API fallback when hasgroups/_claim_sources overage is present; reference
admin-group-check, selector auth.identity.groups, and .Values.azureAD.adminGroup
in the documentation so maintainers know exactly which policy and value are
affected.

In `@charts/maas/templates/policies/default-auth.yaml`:
- Around line 18-27: When azureAD.enabled is true the template must validate
.Values.azureAD.tenantId and .Values.azureAD.clientId before emitting the
issuerUrl and audience to avoid malformed URLs/empty audiences; update the
default-auth.yaml helm template to use Helm's required function (or equivalent
conditional checks) inside the azure-ad block so that required("message",
.Values.azureAD.tenantId) and required("message", .Values.azureAD.clientId) are
called (or the block is skipped/fails fast) and only then construct issuerUrl
"https://login.microsoftonline.com/{{ .Values.azureAD.tenantId }}/v2.0" and the
audiences list with {{ .Values.azureAD.clientId }}.
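A hedged template sketch of the fail-fast validation (error messages are assumptions):

```yaml
# Hypothetical sketch: required aborts template rendering with a clear
# message before a malformed issuerUrl or empty audience can be emitted.
{{- if .Values.azureAD.enabled }}
{{- $tenantId := required "azureAD.tenantId is required when azureAD.enabled=true" .Values.azureAD.tenantId }}
{{- $clientId := required "azureAD.clientId is required when azureAD.enabled=true" .Values.azureAD.clientId }}
issuerUrl: "https://login.microsoftonline.com/{{ $tenantId }}/v2.0"
audiences:
  - {{ $clientId | quote }}
{{- end }}
```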

In `@charts/maas/templates/policies/maas-api-auth.yaml`:
- Around line 1-20: The rendered AuthPolicy (metadata.name: maas-api-allow)
currently enables anonymous access via spec.rules.authentication.anonymous and
an empty authorization block; change it to fail-closed when
.Values.azureAD.enabled is false by either not emitting an allow-anonymous
policy or by emitting a denying policy for maas-api-route (e.g., remove
spec.rules.authentication.anonymous and set authorization to explicitly deny or
require authentication), and add an opt-in values flag (e.g.,
.Values.api.allowAnonymous) if you need to permit anonymous behavior; update the
template around the AuthPolicy logic and the spec.rules sections (metadata.name
maas-api-allow, spec.targetRef/name maas-api-route) accordingly so anonymous
access is not granted by default.

In `@charts/maas/templates/postgresql/deployment.yaml`:
- Around line 27-70: The container lacks a securityContext which allows root,
writable rootfs and all capabilities; add a container-level securityContext
under the postgresql container (name: postgresql) that sets runAsUser: 26 and
runAsNonRoot: true, disallows privilege escalation (allowPrivilegeEscalation:
false), drops all capabilities (capabilities.drop: ["ALL"]), keeps
readOnlyRootFilesystem false so /var/lib/pgsql/data remains writable, and add a
seccompProfile (type: RuntimeDefault) and an appropriate runAsGroup if needed;
apply these settings next to the existing
ports/env/volumeMounts/readinessProbe/livenessProbe entries to harden the
container without breaking PostgreSQL writes to the pgdata mount.
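A hedged sketch of the container-level securityContext (UID/GID 26 follows the comment's assumption about the postgres user in RHEL-based images):

```yaml
# Hypothetical sketch: non-root, no privilege escalation, no capabilities,
# but root filesystem kept writable for /var/lib/pgsql/data.
securityContext:
  runAsUser: 26
  runAsGroup: 26
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false   # pgdata mount must remain writable
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```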

In `@charts/maas/templates/postgresql/secret.yaml`:
- Around line 11-14: The secret currently hardcodes POSTGRESQL_USER and
POSTGRESQL_PASSWORD; update the template to read values from chart values (e.g.,
postgresql.auth.username, postgresql.auth.password,
postgresql.auth.existingSecret) and fall back to generating a random password
when postgresql.auth.password is empty, and support mounting values from an
existing secret when postgresql.auth.existingSecret is set; ensure
POSTGRESQL_DATABASE continues to use .Values.postgresql.database and that the
template uses those variables (POSTGRESQL_USER, POSTGRESQL_PASSWORD,
POSTGRESQL_DATABASE) instead of hardcoded strings so production can reference an
external secret and dev can get a generated password.
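A hedged sketch of the values-driven secret (value paths follow the comment; the secret name and defaults are assumptions):

```yaml
# Hypothetical sketch: credentials come from values, with a generated
# fallback password for dev and an existingSecret escape hatch for prod.
{{- if not .Values.postgresql.auth.existingSecret }}
apiVersion: v1
kind: Secret
metadata:
  name: postgresql-credentials
type: Opaque
stringData:
  POSTGRESQL_USER: {{ .Values.postgresql.auth.username | default "maas" | quote }}
  POSTGRESQL_PASSWORD: {{ .Values.postgresql.auth.password | default (randAlphaNum 20) | quote }}
  POSTGRESQL_DATABASE: {{ .Values.postgresql.database | quote }}
{{- end }}
```

Note that `randAlphaNum` regenerates on every upgrade unless combined with a `lookup` of the existing Secret, which the comment also alludes to.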

In `@charts/maas/values.yaml`:
- Around line 14-17: Replace all occurrences of unpinned image tags by setting
explicit version tags or digests for the three image values instead of `latest`:
update maasApi.image.tag, maasController.image.tag, and postgresql.image.tag
(and any other image.tag fields around lines referenced) to a fixed version
string or immutable digest (e.g., vX.Y.Z or sha256:...) so deployments are
reproducible and rollbacks reliable; ensure pullPolicy is adjusted if using
digests (pullPolicy: IfNotPresent or Always as appropriate).
- Around line 60-74: The current postgresql block uses ephemeral storage
(postgresql.storage: emptyDir) which causes data loss; update the values schema
for the postgresql section to support a storage.type option (allowed values like
emptyDir or persistentVolumeClaim) and add a nested pvc object (fields:
storageClassName and size) so consumers can choose persistent storage; modify
the postgresql handling to read storage.type and, when persistentVolumeClaim is
selected, populate the chart's PVC template with pvc.storageClassName and
pvc.size, leaving emptyDir as the default dev option and document the new
postgresql.storage.* fields.
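The proposed values schema could look like this hedged sketch (key names follow the comment; defaults are assumptions):

```yaml
# Hypothetical values sketch: emptyDir stays the dev default, while
# persistentVolumeClaim enables durable storage.
postgresql:
  storage:
    type: emptyDir            # or: persistentVolumeClaim
    pvc:
      storageClassName: ""    # empty selects the cluster default class
      size: 10Gi
```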

In `@scripts/cleanup.sh`:
- Around line 107-112: The cleanup script hardcodes the gateway namespace
(istio-system) when deleting maas resources; modify scripts/cleanup.sh to accept
a --gateway-namespace argument (e.g., GATEWAY_NAMESPACE) or auto-discover the
namespace by querying for the maas gateway (use kubectl get gateway
--all-namespaces | grep maas-default-gateway to extract the namespace) and then
use that variable in the kubectl delete commands that reference the gateway and
authpolicy/tokenratelimitpolicy (the lines invoking kubectl delete gateway
maas-default-gateway -n istio-system and kubectl delete
authpolicy,tokenratelimitpolicy -n istio-system). Ensure the new logic falls
back to istio-system if no namespace is provided or discovered.
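The auto-discovery described above can be sketched as a small helper (the function name and usage lines are assumptions, not cleanup.sh's actual code):

```shell
# Hypothetical sketch: find the namespace holding the MaaS gateway,
# falling back to istio-system when nothing is discovered.
discover_gateway_ns() {
  ns=$(kubectl get gateway --all-namespaces 2>/dev/null \
    | awk '/maas-default-gateway/ { print $1; exit }')
  echo "${ns:-istio-system}"
}
# GATEWAY_NAMESPACE="${GATEWAY_NAMESPACE:-$(discover_gateway_ns)}"
# kubectl delete gateway maas-default-gateway -n "$GATEWAY_NAMESPACE" --ignore-not-found
```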

---

Outside diff comments:
In `@scripts/cleanup.sh`:
- Around line 63-68: The kubectl patch loops use unquoted shell variables ($rev
and $ist) which can cause word splitting for resource names with spaces/special
chars; update the two kubectl patch invocations that reference $rev and $ist to
quote the variables (use "$rev" and "$ist") so each resource name is passed as a
single argument to kubectl, keeping the surrounding command structure (the while
read loops and the 2>/dev/null || true behavior) intact.
- Around line 1-4: Add strict error handling to the script by inserting "set
-euo pipefail" immediately after the existing shebang (#!/bin/bash) in
cleanup.sh; this ensures the script exits on errors, treats unset variables as
errors, and fails on pipeline errors—after adding it, scan the script for any
unquoted variable expansions or intentional unset-variable usages and fix them
(declare default values or quote expansions) so the new strict mode doesn't
break legitimate behavior.

---

Minor comments:
In `@charts/maas/helmfile.yaml.gotmpl`:
- Around line 93-101: The retry loop that checks for the service account
(variables GATEWAY_SA and GATEWAY_NS) and runs kubectl patch may exit silently
after 5 retries without applying the imagePullSecrets, so change the logic in
that loop to detect failure: after the for-loop, check whether the patch
succeeded (e.g., test kubectl get sa "$GATEWAY_SA" -n "$GATEWAY_NS" and verify
imagePullSecrets) and if not, emit a clear error via echo/process exit with
non-zero status so the deployment aborts before the subsequent gateway pod
restart; update the block around the kubectl patch invocation to set a success
flag or use the patch command's exit code and fail fast when exhausted.

In `@charts/maas/templates/namespace.yaml`:
- Line 6: The template uses {{ .Values.maasApi.subscriptionNamespace }} directly
which allows empty/missing values and produces invalid Namespace manifests;
update the Helm template in namespace.yaml to validate and fail fast by
replacing the bare reference with the required function (e.g. required "error
message" .Values.maasApi.subscriptionNamespace) so template rendering errors
clearly when maasApi.subscriptionNamespace is not provided or empty; ensure the
error message identifies maasApi.subscriptionNamespace so operators know which
value to set.

In `@charts/maas/templates/postgresql/deployment.yaml`:
- Around line 55-68: Health probes hardcode "-U maas" which will break if
POSTGRESQL_USER from the secret differs; update the readinessProbe and
livenessProbe exec command lines (readinessProbe, livenessProbe) to use the
runtime environment variable instead of a literal user by substituting the
container env var (e.g. use ${POSTGRESQL_USER} or sh -c "pg_isready -U
${POSTGRESQL_USER} -d ${POSTGRESQL_DATABASE}") so the probes read the
secret-provided POSTGRESQL_USER/POSTGRESQL_DATABASE values at runtime.
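A hedged sketch of the env-driven probe (shown for readinessProbe; livenessProbe would mirror it):

```yaml
# Hypothetical sketch: resolve the user and database from the container
# environment at probe time instead of hardcoding "-U maas".
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - pg_isready -U "$POSTGRESQL_USER" -d "$POSTGRESQL_DATABASE"
```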

In `@charts/maas/templates/postgresql/secret.yaml`:
- Around line 25-26: The DB_CONNECTION_URL in secret.yaml currently hardcodes
credentials instead of using the parameterized values, causing duplication and
drift; update DB_CONNECTION_URL to reference the same username/password source
used elsewhere (e.g., the postgresql secret keys or
.Values.postgresql.username/.password) rather than embedding "maas" and
"maas-dev-password", or retrieve the generated password via a Helm lookup to the
first secret; alternatively extract the password generation into a shared helper
so both the secret data and DB_CONNECTION_URL use the same generated value
(avoid calling randAlphaNum twice).

In `@charts/maas/templates/pull-secret.yaml`:
- Around line 15-16: The template currently always applies b64enc to
.Values.pullSecret.dockerConfigJson which will double-encode inputs that are
already base64; update the chart so the template and values agree: either (A)
document in values.yaml that pullSecret.dockerConfigJson must be RAW JSON (and
keep the b64enc in the template), or (B) implement a conditional in the
pull-secret.yaml template that detects if .Values.pullSecret.dockerConfigJson is
already base64 (e.g. via a simple regex helper) and only calls b64enc when it is
not; reference .dockerconfigjson in the template and
.Values.pullSecret.dockerConfigJson and the b64enc invocation when making the
change.

In `@Makefile`:
- Around line 198-206: The Makefile health/status target hardcodes the gateway
name "maas-default-gateway" which breaks when users set charts/maas/values.yaml
gateway.name to a custom value; update the kubectl check in the target that
calls kubectl get gateway maas-default-gateway -n istio-system to instead read
the configured gateway name (e.g., via a variable like MAAS_GATEWAY or by
parsing charts/maas/values.yaml or Helm release values) and use that variable in
the kubectl command so the check honors the configured gateway.name.

---

Nitpick comments:
In `@charts/maas/crds/maas.opendatahub.io_maasauthpolicies.yaml`:
- Around line 131-138: The CRD's status fields accepted and enforced currently
allow any string; update the maas.opendatahub.io_maasauthpolicies.yaml
definition for the accepted and enforced properties to add an enum constraint
with the allowed values "True", "False", and "Unknown" while keeping type:
string and preserving the existing descriptions so consumers can validate and
parse these condition states reliably.

In `@charts/maas/crds/maas.opendatahub.io_maassubscriptions.yaml`:
- Around line 109-131: Align the owner.users schema to match owner.groups by
changing users from an array of strings to an array of objects (e.g.,
UserReference) with a name property, mirroring the GroupReference structure;
update the CRD definition for properties.owner.users to use items.type: object
with a required - name and a name property of type string, and add any future
fields such as clusterScoped to the new UserReference shape so users and groups
share a consistent structure (references: owner.users, owner.groups,
GroupReference, name, clusterScoped).

In `@charts/maas/helmfile.yaml.gotmpl`:
- Around line 61-64: Add a check that the ./crds/ directory exists before
running the kubectl apply; specifically, before the kubectl apply -f ./crds/
--server-side line in the helmfile template, test for the directory (e.g., using
a shell test like [ -d ./crds ] or [[ -d ./crds ]]) and if it does not exist
print a clear error referencing "crds" and exit non‑zero so the failure message
shows the missing directory rather than a generic kubectl error.

In `@charts/maas/templates/policies/azure-ad-admin-only.yaml`:
- Line 1: Add validation so the template fails early when
.Values.azureAD.enabled is true and required fields are missing: use Helm's
required function to assert .Values.azureAD.tenantId and
.Values.azureAD.clientId (e.g., assign to local variables like $tenantId and
$clientId via required) before constructing issuerUrl or audience, and then use
those $tenantId/$clientId variables in the issuerUrl and audience logic to
prevent emitting invalid values when tenantId or clientId are empty.

In `@charts/maas/templates/policies/azure-ad-per-user-ratelimit.yaml`:
- Around line 19-20: The template is inserting the unquoted duration for
.Values.azureAD.perUserRateLimit.window which can be misinterpreted if
overridden; update the azure-ad-per-user-ratelimit.yaml template to render the
window as a quoted string (use the template value
.Values.azureAD.perUserRateLimit.window wrapped in quotes) so the generated YAML
always provides a string duration to Kuadrant while leaving limit (
.Values.azureAD.perUserRateLimit.limit ) unchanged.

In `@charts/maas/templates/policies/default-deny.yaml`:
- Around line 17-22: The deny-all-by-default predicate uses
startsWith("/v1/models") which unintentionally matches paths like
/v1/models-malicious; update the predicate in the deny-all-by-default policy to
more strictly match only the /v1/models endpoint and its subpaths (e.g., use
startsWith("/v1/models/") for subpaths and a separate equality check for exactly
"/v1/models", or use a regex anchor like ^/v1/models($|/) depending on policy
syntax) and verify the corresponding AuthPolicy in default-auth.yaml still
grants access for the intended allowed paths.
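A hedged sketch of the stricter predicate (CEL-style, per the comment; the exact policy syntax depends on the Kuadrant version in use):

```yaml
# Hypothetical sketch: exact match plus anchored subpath match, so
# /v1/models-malicious no longer slips through.
predicate: request.path == "/v1/models" || request.path.startsWith("/v1/models/")
```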

In `@Makefile`:
- Around line 153-158: The Makefile currently duplicates CRD deletion: the
`undeploy-maas` target removes MaaS CRDs while the `undeploy` target calls
./scripts/cleanup.sh which also deletes `maas.opendatahub.io` CRDs; pick one
approach and remove redundancy by either removing the explicit CRD deletion from
the `undeploy-maas` target (leave full teardown to `cleanup.sh`) or modify
`scripts/cleanup.sh` to accept a flag (e.g., --skip-maas) and update the
`undeploy` target to call `./scripts/cleanup.sh --skip-maas -y`; update
references to `undeploy-maas` and `cleanup.sh` accordingly and ensure
`--ignore-not-found` behavior is preserved.
- Around line 100-110: The prerequisite CRD checks are duplicated between the
Makefile target deploy-maas and the helmfile presync hook in
charts/maas/helmfile.yaml.gotmpl; refactor by moving the CRD validation into a
single reusable place (either a new make target like validate-prereqs or a small
shell script) and have both deploy-maas and the helmfile presync hook invoke
that single validator; update references in deploy-maas (target name
deploy-maas) and remove the duplicate checks from
charts/maas/helmfile.yaml.gotmpl so maintenance is centralized.
- Around line 64-76: In the deploy-cert-manager Makefile target, quote all shell
variable and command substitutions to prevent word-splitting/globbing: locate
the kubectl/helm/grep/xargs pipeline (the lines calling helm list -n
cert-manager-operator, kubectl get pods -n cert-manager, kubectl get
clusterrole,clusterrolebinding -o name | grep -i cert-manager | xargs -r kubectl
delete, and kubectl delete sa --all -n cert-manager) and change any unquoted
expansions or command substitutions to quoted forms (e.g. "$VAR" or "$(cmd)"),
quote literal patterns passed to grep if they come from variables, and ensure
xargs invocation is robust (use -- where appropriate) so all variable content is
safely handled.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 6bc93fa8-ea1e-4485-96e4-64485e264fa0

📥 Commits

Reviewing files that changed from the base of the PR and between 22ddc2c and 58b9868.

📒 Files selected for processing (35)
  • Makefile
  • charts/maas/Chart.yaml
  • charts/maas/crds/maas.opendatahub.io_externalmodels.yaml
  • charts/maas/crds/maas.opendatahub.io_maasauthpolicies.yaml
  • charts/maas/crds/maas.opendatahub.io_maasmodelrefs.yaml
  • charts/maas/crds/maas.opendatahub.io_maassubscriptions.yaml
  • charts/maas/helmfile.yaml.gotmpl
  • charts/maas/templates/_helpers.tpl
  • charts/maas/templates/gateway.yaml
  • charts/maas/templates/httproute.yaml
  • charts/maas/templates/maas-api/clusterrole.yaml
  • charts/maas/templates/maas-api/deployment.yaml
  • charts/maas/templates/maas-api/service.yaml
  • charts/maas/templates/maas-api/serviceaccount.yaml
  • charts/maas/templates/maas-controller/clusterrole.yaml
  • charts/maas/templates/maas-controller/clusterrolebinding.yaml
  • charts/maas/templates/maas-controller/deployment.yaml
  • charts/maas/templates/maas-controller/leader-election-role.yaml
  • charts/maas/templates/maas-controller/serviceaccount.yaml
  • charts/maas/templates/namespace.yaml
  • charts/maas/templates/networkpolicy.yaml
  • charts/maas/templates/policies/azure-ad-admin-only.yaml
  • charts/maas/templates/policies/azure-ad-per-user-ratelimit.yaml
  • charts/maas/templates/policies/default-auth.yaml
  • charts/maas/templates/policies/default-deny.yaml
  • charts/maas/templates/policies/maas-api-auth.yaml
  • charts/maas/templates/postgresql/deployment.yaml
  • charts/maas/templates/postgresql/secret.yaml
  • charts/maas/templates/pull-secret.yaml
  • charts/maas/templates/tier-mapping.yaml
  • charts/maas/values.yaml
  • charts/rhcl/helmfile.yaml.gotmpl
  • helmfile.yaml.gotmpl
  • scripts/cleanup.sh
  • values.yaml

- Add secrets RBAC to maas-controller ClusterRole (credentialRef access)
- Use templated gateway name instead of hardcoded values in postsync
- Track and report deployment errors in postsync hook
- Quote shell variables in cleanup.sh finalizer loops
- Add make lint target (helm lint, yamllint, shellcheck)

Made-with: Cursor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (2)
charts/maas/helmfile.yaml.gotmpl (2)

87-88: ⚠️ Potential issue | 🟠 Major

The integrated stack still can't honor a custom gateway name.

GATEWAY_NAME reads .Values.gateway.name, but the parent helmfile.yaml.gotmpl:76-87 only forwards namespace and gatewayNamespace. In stack deployments this still falls back to maas-default-gateway, so the service-account patch and pod restart target the wrong gateway when a custom name is configured. Pass the gateway name into this sub-helmfile explicitly.

As per coding guidelines, `**`: "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/helmfile.yaml.gotmpl` around lines 87 - 88, The sub-helmfile uses
GATEWAY_NAME="{{ .Values.gateway.name | default "maas-default-gateway" }}" but
the parent helmfile does not forward the gateway name, causing the default to be
used; update the parent helmfile template that currently forwards namespace and
gatewayNamespace to also pass gateway.name (e.g., add a gatewayName or similar
key) into the sub-helmfile values so .Values.gateway.name is populated and
GATEWAY_NAME reflects a custom gateway; ensure the symbol names referenced are
.Values.gateway.name in the sub-helmfile and the parent template entry that
emits gatewayNamespace so you add the corresponding gateway name field there.

105-107: ⚠️ Potential issue | 🟠 Major

Return non-zero when rollout validation finds failures.

ERRORS is accumulated, but the hook still exits 0 here. helmfile apply will report success even when postgres, maas-controller, or maas-api never become Ready, which breaks automation and follow-up targets.

Minimal fix
             if [ "$ERRORS" -gt 0 ]; then
-              echo "WARNING: $ERRORS component(s) not ready. Check pod status."
+              echo "ERROR: $ERRORS component(s) not ready. Check pod status."
+              exit 1
             fi

As per coding guidelines, `**`: "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/helmfile.yaml.gotmpl` around lines 105 - 107, The rollout-check
hook currently prints a warning when the ERRORS counter is >0 but still exits
with status 0; modify the if block that checks ERRORS (the rollout validation
near the end of the helmfile hook) to return a non-zero exit code when failures
are detected (e.g., call exit 1 or exit $ERRORS) so that helmfile apply fails on
readiness errors and downstream automation sees the failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@charts/maas/helmfile.yaml.gotmpl`:
- Around line 54-59: The current kubectl check using "kubectl get kuadrant -n
kuadrant-system" is a no-op because it exits 0 when the CRD exists but no
instances are present; change the check so it verifies that at least one
Kuadrant resource exists and fail closed if none are found. Replace the existing
branch that prints "Kuadrant instance found in kuadrant-system" / "WARNING: No
Kuadrant instance found..." with a test that inspects the command output (for
example using "kubectl get kuadrant -n kuadrant-system -o name" or a jsonpath
query) and treat an empty output as absence — only print the success message
when a non-empty resource list is returned, otherwise print the warning and exit
non-zero so deployment cannot proceed without an actual Kuadrant instance.
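The fail-closed check can be sketched as follows (the helper name and commented usage are assumptions, not the helmfile's actual hook):

```shell
# Hypothetical sketch: `-o name` prints one line per instance, so empty
# output means the CRD exists but no Kuadrant instance does.
has_kuadrant_instance() {
  [ -n "$(kubectl get kuadrant -n kuadrant-system -o name 2>/dev/null)" ]
}
# if ! has_kuadrant_instance; then
#   echo "WARNING: No Kuadrant instance found in kuadrant-system"
#   exit 1
# fi
```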

In `@charts/maas/templates/maas-controller/clusterrole.yaml`:
- Around line 9-11: The ClusterRole in
charts/maas/templates/maas-controller/clusterrole.yaml currently allows verbs
["get","list","watch"] on resource "secrets", which enables cluster-wide secret
enumeration; change this to only "get" for "secrets" (remove "list" and "watch")
if the controller only needs credentialRef lookups, or replace the ClusterRole
with a namespaced Role and bind it only in MaaS-managed namespaces if
watches/listening across namespaces are truly required; update the
ClusterRole/Role definition and any RoleBinding/ClusterRoleBinding references
accordingly.

In `@Makefile`:
- Around line 199-201: The Makefile invokes kubectl with an unquoted
KSERVE_NAMESPACE which allows shell injection; update the two kubectl commands
that use -n $(KSERVE_NAMESPACE) so the namespace variable is passed as a single
quoted argument (e.g. wrap the expansion in quotes like "$(KSERVE_NAMESPACE)" or
'${KSERVE_NAMESPACE}') in the kubectl get pods -n ... calls to prevent the shell
from interpreting injected characters.
- Around line 59-63: The Makefile currently allows deploy-all to include
deploy-maas even when RHCL is false (RHCL_TARGET/MAAS_TARGET logic), so add a
fast-fail guard that checks the invalid combination (MAAS=true and RHCL!=true)
before any deployment steps; implement this by creating a small phony target
(e.g., fail-if-maas-without-rhcl) that inspects the RHCL/MAAS variables and
exits non‑zero with a clear message when MAAS is true but RHCL is not, and then
invoke that guard as the first prerequisite of deploy-all (before
deploy-cert-manager, deploy-istio, deploy-lws, etc.) so deploy-all bails out
immediately instead of queuing deploy-maas.
- Around line 67-76: The Makefile currently deletes resources in the
cert-manager namespace when it finds pods but no Helm release in
cert-manager-operator; change this to first require an explicit opt-in variable
(e.g. CLEAN_ORPHANED_CERT_MANAGER="true") before executing the destructive
block, and replace name/grep-based deletes with label/annotation checks — use
kubectl with a selector (e.g.
--selector=app.kubernetes.io/managed-by=<operator-label> or check for a specific
ownership annotation) when calling kubectl delete deployment, kubectl delete sa,
and when selecting clusterrole/clusterrolebinding (avoid grep -i cert-manager),
only proceed with cluster-scoped deletes if the resources carry the expected
ownership label/annotation; keep the existing fallback no-op behavior when the
opt-in flag is not set.

---

Duplicate comments:
In `@charts/maas/helmfile.yaml.gotmpl`:
- Around line 87-88: The sub-helmfile uses GATEWAY_NAME="{{ .Values.gateway.name
| default "maas-default-gateway" }}" but the parent helmfile does not forward
the gateway name, causing the default to be used; update the parent helmfile
template that currently forwards namespace and gatewayNamespace to also pass
gateway.name (e.g., add a gatewayName or similar key) into the sub-helmfile
values so .Values.gateway.name is populated and GATEWAY_NAME reflects a custom
gateway; ensure the symbol names referenced are .Values.gateway.name in the
sub-helmfile and the parent template entry that emits gatewayNamespace so you
add the corresponding gateway name field there.
- Around line 105-107: The rollout-check hook currently prints a warning when
the ERRORS counter is >0 but still exits with status 0; modify the if block that
checks ERRORS (the rollout validation near the end of the helmfile hook) to
return a non-zero exit code when failures are detected (e.g., call exit 1 or
exit $ERRORS) so that helmfile apply fails on readiness errors and downstream
automation sees the failure.
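The parent-to-child value forwarding described above might look like this (a hedged sketch; the `gatewayName` key and `maas.*` value paths are assumptions, not the repo's actual schema):

```yaml
# Parent helmfile.yaml.gotmpl -- values passed into the maas sub-helmfile
helmfiles:
  - path: charts/maas/helmfile.yaml.gotmpl
    values:
      - namespace: {{ .Values.maas.namespace }}
        gatewayNamespace: {{ .Values.maas.gatewayNamespace }}
        # New: forward the gateway name so the sub-helmfile's
        # .Values.gateway.name default is only a fallback.
        gatewayName: {{ .Values.maas.gatewayName | default "maas-default-gateway" }}
```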
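The failure-propagating pattern the rollout-check comment asks for, sketched in plain shell (the `check` helper and the `true`/`false` stand-ins are illustrative; the real hook runs `kubectl rollout status ...` calls):

```shell
ERRORS=0
# Count failures instead of aborting on the first one.
check() { "$@" || ERRORS=$((ERRORS+1)); }

# In the real postsync hook each line is a `kubectl rollout status ...` call:
check true    # stands in for a component that became ready
check false   # stands in for a component that failed rollout

if [ "$ERRORS" -gt 0 ]; then
  echo "ERROR: $ERRORS component(s) failed rollout"
  # The real hook would `exit 1` here so `helmfile apply` fails
  # and downstream automation sees the failure.
fi
```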

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: f03653cd-9b3a-4ff5-9ae2-c3045cd730b5

📥 Commits

Reviewing files that changed from the base of the PR and between 58b9868 and 0713099.

📒 Files selected for processing (4)
  • Makefile
  • charts/maas/helmfile.yaml.gotmpl
  • charts/maas/templates/maas-controller/clusterrole.yaml
  • scripts/cleanup.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/cleanup.sh

- Restrict controller secrets RBAC to get-only (least privilege)
- Guard MAAS=true without RHCL=true in Makefile (fail early)
- Make cert-manager orphan cleanup skippable (CERT_MANAGER_FORCE_CLEANUP)
- Quote KSERVE_NAMESPACE in Makefile shell commands
- Fix Kuadrant presence check to verify instance count, not just CRD
- Exit non-zero from postsync when components fail rollout

Made-with: Cursor
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (4)
charts/maas/templates/maas-controller/clusterrole.yaml (1)

9-11: ⚠️ Potential issue | 🟠 Major

Scope Secret reads to opted-in namespaces.

Major. This ClusterRole is bound cluster-wide to maas-controller, so a compromised controller pod can still read predictable secret names in any namespace (CWE-250, CWE-284). Drop secrets from the ClusterRole and re-grant get with namespaced Role/RoleBinding objects only where MaaS credentials are actually stored.

Minimal hardening

```diff
-- apiGroups: [""]
-  resources: ["secrets"]
-  verbs: ["get"]
+# Remove Secret access from the ClusterRole.
+# Re-add `get` through namespaced Role/RoleBinding objects in approved namespaces.
```

As per coding guidelines: REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/templates/maas-controller/clusterrole.yaml` around lines 9 - 11,
The ClusterRole for maas-controller currently grants cluster-wide read access to
"secrets" (verbs: ["get"]) — remove "secrets" from the ClusterRole definition so
it no longer grants any secret access, and instead create namespaced Role and
RoleBinding objects in only the namespaces that actually store MaaS credentials
granting verbs: ["get"] on resources: ["secrets"]; ensure the new Role is named
(e.g., maas-controller-secrets) and the RoleBinding binds the maas-controller
ServiceAccount to that Role in each opted-in namespace so secret access is
limited and no ClusterRole contains "secrets".
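The namespaced re-grant the prompt describes could look like the following (a sketch only; the `maas-controller-secrets` name and the `opendatahub` namespace are illustrative, and one Role/RoleBinding pair would be stamped out per opted-in namespace):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: maas-controller-secrets
  namespace: opendatahub   # one Role per namespace that stores MaaS credentials
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: maas-controller-secrets
  namespace: opendatahub
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: maas-controller-secrets
subjects:
  - kind: ServiceAccount
    name: maas-controller
    namespace: opendatahub
```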
Makefile (3)

69-80: ⚠️ Potential issue | 🟠 Major

The cert-manager cleanup path is still opt-out and still deletes by --all.

With $${CERT_MANAGER_FORCE_CLEANUP:-true}, any cluster that has cert-manager installed by OLM, GitOps, or Helm in another namespace can still hit the deployment --all / sa --all deletes here. Default this flag to false and only remove resources that carry this stack's ownership labels or annotations.

At minimum, make cleanup explicit

```diff
-			if [ "$${CERT_MANAGER_FORCE_CLEANUP:-true}" = "true" ]; then \
+			if [ "$${CERT_MANAGER_FORCE_CLEANUP:-false}" = "true" ]; then \
```

As per coding guidelines: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 69 - 80, Change the cert-manager cleanup to be opt-in
and label-scoped: set the CERT_MANAGER_FORCE_CLEANUP default to "false" (replace
$${CERT_MANAGER_FORCE_CLEANUP:-true} with $${CERT_MANAGER_FORCE_CLEANUP:-false})
and stop using "kubectl delete ... --all"; instead delete only resources
matching this stack's ownership labels/annotations by using kubectl delete with
a selector (e.g., kubectl delete deployment -n cert-manager -l
<ownership-label>=<value> and kubectl delete sa -n cert-manager -l
<ownership-label>=<value>), preserving the existing guard that skips cleanup
when CERT_MANAGER_FORCE_CLEANUP is false.

205-212: ⚠️ Potential issue | 🟡 Minor

status hardcodes the MaaS location.

MaaS has its own namespace, gatewayNamespace, and gateway.name values, but this target always checks $(KSERVE_NAMESPACE), istio-system, and maas-default-gateway. On clusters where KServe lives in redhat-ods-applications or those values are overridden, make status will report a healthy MaaS install as missing.

Use MaaS-specific variables here

```diff
+MAAS_NAMESPACE ?= opendatahub
+MAAS_GATEWAY_NAMESPACE ?= istio-system
+MAAS_GATEWAY_NAME ?= maas-default-gateway
 ...
-	@kubectl get pods -n "$(KSERVE_NAMESPACE)" -l control-plane=maas-controller 2>/dev/null || echo "  Not deployed (optional component)"
-	@kubectl get pods -n "$(KSERVE_NAMESPACE)" -l app.kubernetes.io/name=maas-api 2>/dev/null || echo "  "
+	@kubectl get pods -n "$(MAAS_NAMESPACE)" -l control-plane=maas-controller 2>/dev/null || echo "  Not deployed (optional component)"
+	@kubectl get pods -n "$(MAAS_NAMESPACE)" -l app.kubernetes.io/name=maas-api 2>/dev/null || echo "  "
 ...
-		kubectl get gateway maas-default-gateway -n istio-system 2>/dev/null || echo "  No gateway"; \
+		kubectl get gateway "$(MAAS_GATEWAY_NAME)" -n "$(MAAS_GATEWAY_NAMESPACE)" 2>/dev/null || echo "  No gateway"; \
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 205 - 212, The status target in the Makefile hardcodes
MaaS locations; update the kubectl commands in the status target to use
MaaS-specific variables instead of
KSERVE_NAMESPACE/istio-system/maas-default-gateway: replace references to
"$(KSERVE_NAMESPACE)" with the MaaS namespace variable (e.g.,
"$(MAAS_NAMESPACE)" or the repo's maas namespace var), replace the hardcoded
"istio-system" with a gateway namespace variable (e.g.,
"$(MAAS_GATEWAY_NAMESPACE)"), and replace "maas-default-gateway" with a gateway
name variable (e.g., "$(MAAS_GATEWAY_NAME)"); ensure the CRD check and fallback
echo logic remain but reference these variables so overridden/non-default MaaS
installs report correctly.

62-65: ⚠️ Potential issue | 🟠 Major

Move the MAAS/RHCL guard out of top-level $(error ...) evaluation.

GNU Make parses the entire Makefile before target selection. The $(if $(and ...), $(error ...)) at lines 62-63 fires during parse, breaking unrelated commands like make help MAAS=true or make status MAAS=true. Move this check into a prerequisite recipe on deploy-all instead.

Target-scoped guard

```diff
-# MaaS requires RHCL — fail early if MAAS=true without RHCL=true
-$(if $(and $(filter true,$(MAAS)),$(filter-out true,$(RHCL))),$(error MAAS=true requires RHCL=true. Run: make deploy-all RHCL=true MAAS=true))
+fail-if-maas-without-rhcl:
+	@if [ "$(MAAS)" = "true" ] && [ "$(RHCL)" != "true" ]; then \
+		echo "ERROR: MAAS=true requires RHCL=true. Run: make deploy-all RHCL=true MAAS=true"; \
+		exit 1; \
+	fi

-deploy-all: check-kubeconfig deploy-cert-manager deploy-istio deploy-lws $(RHCL_TARGET) deploy-kserve $(MAAS_TARGET)
+deploy-all: fail-if-maas-without-rhcl check-kubeconfig deploy-cert-manager deploy-istio deploy-lws $(RHCL_TARGET) deploy-kserve $(MAAS_TARGET)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 62 - 65, The top-level parse-time guard using $(if
$(and $(filter true,$(MAAS)),$(filter-out true,$(RHCL))),$(error ...)) causes
errors during Makefile parsing; move this check into a recipe or a target-scoped
prerequisite for the deploy-all target so it runs only when deploy-all is
invoked. Specifically, remove the global $(if ... $(error ...)) expression and
instead add a short phony prerequisite or a target-scoped check for the
deploy-all target (referencing deploy-all) that inspects MAAS and RHCL and calls
$(error ...) at runtime if MAAS=true and RHCL!=true; keep existing prerequisites
(check-kubeconfig deploy-cert-manager deploy-istio deploy-lws $(RHCL_TARGET)
deploy-kserve $(MAAS_TARGET)) unchanged.
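The runtime behavior of such a target-scoped guard can be demonstrated in plain shell (a hypothetical stand-in for the recipe body, not the repo's actual Makefile):

```shell
# maas_guard mirrors the proposed fail-if-maas-without-rhcl recipe:
# fail only when MAAS=true is combined with RHCL!=true.
maas_guard() {
  if [ "${MAAS:-false}" = "true" ] && [ "${RHCL:-false}" != "true" ]; then
    echo "ERROR: MAAS=true requires RHCL=true. Run: make deploy-all RHCL=true MAAS=true" >&2
    return 1
  fi
}

if MAAS=true RHCL=false maas_guard; then R1=ok; else R1=blocked; fi
if MAAS=true RHCL=true  maas_guard; then R2=ok; else R2=blocked; fi
echo "invalid combo: $R1, valid combo: $R2"
```

Because the check runs only when the guard target is invoked, unrelated targets like `make help MAAS=true` are unaffected.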

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e7504aae-3fd6-4ab1-a664-3f919e40b8a1

📥 Commits

Reviewing files that changed from the base of the PR and between 0713099 and b5f7434.

📒 Files selected for processing (3)
  • Makefile
  • charts/maas/helmfile.yaml.gotmpl
  • charts/maas/templates/maas-controller/clusterrole.yaml

Comment on lines +91 to +105
```shell
echo "Checking MaaS gateway..."
kubectl get gateway "$GATEWAY_NAME" -n "$GATEWAY_NS" 2>/dev/null || echo "Gateway not found in $GATEWAY_NS"

GATEWAY_SA="${GATEWAY_NAME}-istio"
echo "Patching gateway service account with pull secret..."
for i in 1 2 3 4 5; do
  if kubectl get sa "$GATEWAY_SA" -n "$GATEWAY_NS" >/dev/null 2>&1; then
    kubectl patch sa "$GATEWAY_SA" -n "$GATEWAY_NS" \
      -p '{"imagePullSecrets": [{"name": "redhat-pull-secret"}]}' 2>/dev/null && \
      echo "  Gateway SA patched" && break
  fi
  echo "  Waiting for gateway SA... ($i/5)"
  sleep 5
done
kubectl delete pod -n "$GATEWAY_NS" -l "istio.io/gateway-name=$GATEWAY_NAME" --ignore-not-found 2>/dev/null || true
```

⚠️ Potential issue | 🟠 Major

Make gateway validation fail the hook.

This block only logs gateway and ServiceAccount failures, so postsync can still exit 0 while the ingress path never becomes usable. Count a missing gateway and an exhausted patch loop in ERRORS the same way you already do for rollout failures.

Fail when ingress never becomes ready

```diff
-            kubectl get gateway "$GATEWAY_NAME" -n "$GATEWAY_NS" 2>/dev/null || echo "Gateway not found in $GATEWAY_NS"
+            kubectl get gateway "$GATEWAY_NAME" -n "$GATEWAY_NS" >/dev/null 2>&1 || { echo "ERROR: Gateway not found in $GATEWAY_NS"; ERRORS=$((ERRORS+1)); }

+            PATCHED=0
             GATEWAY_SA="${GATEWAY_NAME}-istio"
             echo "Patching gateway service account with pull secret..."
             for i in 1 2 3 4 5; do
               if kubectl get sa "$GATEWAY_SA" -n "$GATEWAY_NS" >/dev/null 2>&1; then
                 kubectl patch sa "$GATEWAY_SA" -n "$GATEWAY_NS" \
                   -p '{"imagePullSecrets": [{"name": "redhat-pull-secret"}]}' 2>/dev/null && \
-                  echo "  Gateway SA patched" && break
+                  echo "  Gateway SA patched" && PATCHED=1 && break
               fi
               echo "  Waiting for gateway SA... ($i/5)"
               sleep 5
             done
+            [ "$PATCHED" -eq 1 ] || { echo "ERROR: Gateway SA not patched"; ERRORS=$((ERRORS+1)); }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/helmfile.yaml.gotmpl` around lines 91 - 105, The gateway check
currently only echoes missing resources but doesn't mark the hook as failed;
update the block that checks kubectl get gateway "$GATEWAY_NAME" and the
ServiceAccount patch loop for GATEWAY_SA so that on a missing gateway or when
the patch loop exhausts (no successful kubectl patch after 5 attempts) you
increment the same ERRORS counter used for rollout failures (or set a failure
flag) and log an error message, ensuring postsync exits non‑zero when ERRORS >
0; implement the increment and error logging at the same points where rollout
failures are counted so the hook fails if ingress never becomes ready.

Comment on lines +98 to +99
```shell
kubectl patch sa "$GATEWAY_SA" -n "$GATEWAY_NS" \
  -p '{"imagePullSecrets": [{"name": "redhat-pull-secret"}]}' 2>/dev/null && \
```

⚠️ Potential issue | 🟠 Major

Render the pull-secret name instead of hardcoding it.

charts/maas/values.yaml and charts/maas/templates/pull-secret.yaml make pullSecret.name configurable, but this hook always patches redhat-pull-secret. Any override will attach a nonexistent secret to the gateway ServiceAccount and break image pulls.

Use the chart value here too

```diff
+            PULL_SECRET_NAME="{{ .Values.pullSecret.name | default "redhat-pull-secret" }}"
             for i in 1 2 3 4 5; do
               if kubectl get sa "$GATEWAY_SA" -n "$GATEWAY_NS" >/dev/null 2>&1; then
                 kubectl patch sa "$GATEWAY_SA" -n "$GATEWAY_NS" \
-                  -p '{"imagePullSecrets": [{"name": "redhat-pull-secret"}]}' 2>/dev/null && \
+                  -p "{\"imagePullSecrets\": [{\"name\": \"${PULL_SECRET_NAME}\"}]}" 2>/dev/null && \
                   echo "  Gateway SA patched" && break
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/maas/helmfile.yaml.gotmpl` around lines 98 - 99, The hook hardcodes
the secret name ("redhat-pull-secret") when patching the gateway ServiceAccount;
instead render the chart value so overrides work: locate the kubectl patch
command in charts/maas/helmfile.yaml.gotmpl (the line with kubectl patch sa
"$GATEWAY_SA" -n "$GATEWAY_NS" -p ...), and replace the hardcoded
"redhat-pull-secret" with the chart pullSecret value (use the template variable
.Values.pullSecret.name, e.g. render it into the JSON payload via the gotmpl
expression and proper quoting/escaping) so the patch attaches the configured
pull secret from values.yaml/templates/pull-secret.yaml.

Comment on lines +107 to +116
```makefile
deploy-maas: check-kubeconfig clear-cache
	@echo "=== Deploying MaaS (Models as a Service) ==="
	@echo "Prerequisites: KServe, RHCL, Istio, and cert-manager must be deployed first"
	@kubectl get crd llminferenceservices.serving.kserve.io >/dev/null 2>&1 || \
		(echo "ERROR: KServe not found. Run 'make deploy-kserve' first." && exit 1)
	@kubectl get crd authpolicies.kuadrant.io >/dev/null 2>&1 || \
		(echo "ERROR: Kuadrant/RHCL not found. Run 'make deploy-rhcl' first." && exit 1)
	@kubectl get crd gateways.gateway.networking.k8s.io >/dev/null 2>&1 || \
		(echo "ERROR: Gateway API CRDs not found. Run 'make deploy-istio' first." && exit 1)
	helmfile apply --selector name=maas --state-values-set maas.enabled=true
```

⚠️ Potential issue | 🟡 Minor

Mirror the Kuadrant-instance prerequisite in deploy-maas.

This target only checks the CRDs, but charts/maas/helmfile.yaml.gotmpl now also requires a live kuadrant resource in kuadrant-system. Without the same check here, make deploy-maas passes its own prerequisite gate and then fails later inside Helmfile.

Keep the Makefile preflight aligned with the Helmfile hook

```diff
 	@kubectl get crd gateways.gateway.networking.k8s.io >/dev/null 2>&1 || \
 		(echo "ERROR: Gateway API CRDs not found. Run 'make deploy-istio' first." && exit 1)
+	@kubectl get kuadrant -n kuadrant-system -o name 2>/dev/null | grep -q . || \
+		(echo "ERROR: No Kuadrant instance found in kuadrant-system. Run 'make deploy-rhcl' first." && exit 1)
 	helmfile apply --selector name=maas --state-values-set maas.enabled=true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 107 - 116, The deploy-maas Makefile target only checks
Kuadrant CRDs but not the required live kuadrant instance, causing Helmfile
(charts/maas/helmfile.yaml.gotmpl) to fail later; update the deploy-maas recipe
to also verify the kuadrant resource exists in the kuadrant-system namespace
(e.g., by running a kubectl get kuadrant <name> or kubectl get kuadrant -n
kuadrant-system and failing with a clear message), so the preflight matches the
Helmfile hook and aborts early if the kuadrant instance is missing.

@tsisodia10 tsisodia10 closed this Apr 10, 2026