Commit 2c748c3

fix: resolve RBAC namespace mismatch for RHOAI deployments (opendatahub-io#625)
## Description

**Summary:**

Related to - https://redhat.atlassian.net/browse/RHOAIENG-55555

When RHOAI operator deploys maas-controller to redhat-ods-applications namespace, the ClusterRoleBinding was hardcoded to bind the ServiceAccount in the 'opendatahub' namespace, causing RBAC permission errors and CrashLoopBackOff on startup.

**Root cause:**
- ClusterRoleBinding hardcoded: namespace: opendatahub
- RHOAI deploys to: redhat-ods-applications
- ServiceAccount mismatch → Forbidden errors

**Changes:**
1. Parameterized RBAC binding namespaces in ODH overlay kustomization
   - ClusterRoleBinding now uses app-namespace parameter
   - RoleBinding now uses app-namespace parameter
   - Works for both opendatahub and redhat-ods-applications
2. Improved namespace creation logic in controller
   - Check namespace existence before attempting creation
   - Handle Forbidden errors without retry (operator may pre-create)
   - Clearer error messages for troubleshooting

Fixes maas-controller CrashLoopBackOff in RHOAI 3.4ea2+ deployments.
Tested on RHOAI 3.3.0 with DSC ModelsAsServiceReady: True.

## How Has This Been Tested?

### Test Environment

- **Platform**: OpenShift 4.x (AWS)
- **Cluster**: `api.ci-ln-3pwgqm2-76ef8.aws-4.ci.openshift.org`
- **Operator**: RHOAI v3.3.0 (rhods-operator.3.3.0)
- **Policy Engine**: RHCL v1.3.1 (Red Hat Connectivity Link)
- **Deployment Mode**: Operator (RHOAI)
- **Test Date**: 2026-03-26

### Test Results

#### ✅ 1. Controller Pod Status

```bash
$ oc get pods -n redhat-ods-applications -l app=maas-controller
NAME                               READY   STATUS    RESTARTS   AGE
maas-controller-68574bd4fc-wnb8n   1/1     Running   0          100s
```

**Result**: Pod running successfully (no CrashLoopBackOff)

#### ✅ 2. Namespace Creation

```bash
$ oc get namespace models-as-a-service
NAME                  STATUS   AGE
models-as-a-service   Active   92s
```

**Result**: Namespace auto-created by controller

#### ✅ 3. RBAC Bindings Verification

```bash
$ oc get clusterrolebinding maas-controller-rolebinding -o yaml | grep -A 5 "subjects:"
subjects:
- kind: ServiceAccount
  name: maas-controller
  namespace: redhat-ods-applications
```

**Result**: ClusterRoleBinding correctly references `redhat-ods-applications` namespace

```bash
$ oc get rolebinding -n redhat-ods-applications maas-controller-leader-election-rolebinding -o yaml | grep -A 5 "subjects:"
subjects:
- kind: ServiceAccount
  name: maas-controller
  namespace: redhat-ods-applications
```

**Result**: RoleBinding correctly references `redhat-ods-applications` namespace

#### ✅ 4. RBAC Permissions Validation

```bash
$ oc auth can-i get namespaces --as=system:serviceaccount:redhat-ods-applications:maas-controller
yes
$ oc auth can-i list namespaces --as=system:serviceaccount:redhat-ods-applications:maas-controller
yes
$ oc auth can-i create namespaces --as=system:serviceaccount:redhat-ods-applications:maas-controller
yes
```

**Result**: All required namespace permissions granted

#### ✅ 5. Controller Logs Verification

```bash
$ oc logs -n redhat-ods-applications deployment/maas-controller --tail=10
```

**Key log entries**:

```json
{"level":"info","msg":"subscription namespace not found, attempting to create it","namespace":"models-as-a-service"}
{"level":"info","msg":"subscription namespace ready","namespace":"models-as-a-service"}
{"level":"info","msg":"watching namespace for MaaS AuthPolicy and MaaSSubscription","namespace":"models-as-a-service"}
{"level":"info","msg":"starting manager"}
{"level":"info","msg":"Starting Controller","controller":"maasmodelref"}
{"level":"info","msg":"Starting Controller","controller":"maassubscription"}
{"level":"info","msg":"Starting Controller","controller":"maasauthpolicy"}
```

**Result**:
- Namespace creation logic executed successfully
- All controllers started without errors
- No Forbidden errors in logs

#### ✅ 6. DataScienceCluster Status

```bash
$ oc get datasciencecluster default-dsc -o jsonpath='{.status.conditions[?(@.type=="ModelsAsServiceReady")]}'
```

**Output**:

```json
{
  "lastTransitionTime": "2026-03-26T17:08:48Z",
  "status": "True",
  "type": "ModelsAsServiceReady"
}
```

**Result**: ModelsAsServiceReady condition = True

```bash
$ oc get datasciencecluster default-dsc -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
```

**Output**:

```json
{
  "lastTransitionTime": "2026-03-26T17:08:48Z",
  "status": "True",
  "type": "Ready"
}
```

**Result**: Overall DSC Ready condition = True

#### ✅ 7. Component Deployment Verification

```bash
$ oc get deployment -n redhat-ods-applications
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
maas-api          1/1     1            1           5m
maas-controller   1/1     1            1           5m
postgres          1/1     1            1           5m
```

**Result**: All MaaS components deployed and ready

#### ✅ 8. ClusterRole Permissions Inspection

```bash
$ oc get clusterrole maas-controller-role -o yaml
```

**Namespace permissions**:

```yaml
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - create
  - get
  - list
  - watch
```

**Result**: All required verbs present for namespace operations

### Regression Testing

#### ✅ Standalone Deployment (opendatahub namespace)

The fix maintains backward compatibility with standalone deployments using the `opendatahub` namespace:

**Kustomize validation**:

```bash
$ cd deployment/overlays/odh
$ cat params.env
app-namespace=opendatahub
...
$ kustomize build . | grep -A 5 "kind: ClusterRoleBinding"
kind: ClusterRoleBinding
metadata:
  name: maas-controller-rolebinding
subjects:
- kind: ServiceAccount
  name: maas-controller
  namespace: opendatahub ✅
```

**Result**: Standalone deployments unaffected

### Code Quality Checks

#### ✅ Error Handling

The improved namespace creation logic includes:
- Pre-check: Verify namespace existence before attempting creation
- Permanent error detection: Forbidden errors are not retried
- Clear error messages: `"service account lacks permission to create namespace %q — either pre-create the namespace or grant 'create' on namespaces"`

#### ✅ Graceful Degradation

- If namespace exists (pre-created by operator): Controller proceeds without creation attempt
- If namespace doesn't exist and SA has permissions: Controller creates it
- If namespace doesn't exist and SA lacks permissions: Controller fails with clear actionable error

### Performance Impact

- **Startup time**: No noticeable impact
- **Resource usage**: No change
- **Network calls**: +1 GET call to check namespace existence (before create attempt)

### Summary

| Test Case | Expected | Actual | Status |
|-----------|----------|--------|--------|
| Controller pod status | Running 1/1 | Running 1/1 | ✅ PASS |
| Namespace auto-creation | Created | Created | ✅ PASS |
| ClusterRoleBinding namespace | redhat-ods-applications | redhat-ods-applications | ✅ PASS |
| RoleBinding namespace | redhat-ods-applications | redhat-ods-applications | ✅ PASS |
| RBAC permissions | get, list, create namespaces | get, list, create namespaces | ✅ PASS |
| Controller logs | No Forbidden errors | No Forbidden errors | ✅ PASS |
| DSC ModelsAsServiceReady | True | True | ✅ PASS |
| DSC Overall Ready | True | True | ✅ PASS |
| Backward compatibility | opendatahub works | opendatahub works | ✅ PASS |

**Overall Result**: ✅ **ALL TESTS PASSED**

### Deployment Timeline

```
17:05:00 - Deployment started (RHOAI operator installation)
17:07:00 - RHCL operator ready
17:08:00 - RHOAI operator ready
17:08:48 - DSC applied, MaaS controller starting
17:09:05 - maas-controller pod running
17:09:05 - models-as-a-service namespace created
17:09:05 - All reconcilers started
17:10:00 - Full deployment validated

Total deployment time: ~5 minutes
```

## Merge criteria:

<!--- This PR will be merged by any repository approver when it meets all the points in the checklist -->
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->

- [x] The commits are squashed in a cohesive manner and have meaningful messages.
- [x] Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
- [x] The developer has manually tested the changes and verified that the changes work

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved namespace existence checking with enhanced error handling for permission failures.
  * Refined namespace creation retry logic to properly distinguish between recoverable and non-recoverable errors.
* **Configuration**
  * Extended namespace configuration in deployment overlays to ensure proper namespace settings for role bindings.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
1 parent 169213a commit 2c748c3

2 files changed: +43 -16 lines changed

deployment/overlays/odh/kustomization.yaml

Lines changed: 10 additions & 0 deletions

@@ -141,3 +141,13 @@ replacements:
         options:
           delimiter: "."
           index: 1
+      - select:
+          kind: ClusterRoleBinding
+          name: maas-controller-rolebinding
+        fieldPaths:
+          - subjects.0.namespace
+      - select:
+          kind: RoleBinding
+          name: maas-controller-leader-election-rolebinding
+        fieldPaths:
+          - subjects.0.namespace
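
The field path `subjects.0.namespace` in the two new targets addresses the `namespace` field of the first entry in each binding's `subjects` list. The sketch below illustrates that addressing on plain Go maps and slices. It is a simplified illustration, not kustomize's implementation: `setFieldPath` is a hypothetical helper, and the replacement value is assumed to come from the `app-namespace` parameter described in the commit message, since the replacement's source lies outside this hunk.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// setFieldPath is a hypothetical helper (not part of kustomize). It walks a
// dot-separated path through nested maps and slices and overwrites the final
// field. Numeric segments are treated as list indices.
func setFieldPath(obj map[string]any, path string, value any) error {
	segments := strings.Split(path, ".")
	var cur any = obj
	for i, seg := range segments {
		last := i == len(segments)-1
		switch node := cur.(type) {
		case map[string]any:
			if last {
				node[seg] = value
				return nil
			}
			next, ok := node[seg]
			if !ok {
				return fmt.Errorf("field %q not found", seg)
			}
			cur = next
		case []any:
			idx, err := strconv.Atoi(seg)
			if err != nil || idx < 0 || idx >= len(node) {
				return fmt.Errorf("invalid list index %q", seg)
			}
			if last {
				node[idx] = value
				return nil
			}
			cur = node[idx]
		default:
			return fmt.Errorf("cannot descend into %T at %q", cur, seg)
		}
	}
	return nil
}

func main() {
	// A ClusterRoleBinding reduced to the fields relevant here.
	binding := map[string]any{
		"kind":     "ClusterRoleBinding",
		"metadata": map[string]any{"name": "maas-controller-rolebinding"},
		"subjects": []any{
			map[string]any{
				"kind":      "ServiceAccount",
				"name":      "maas-controller",
				"namespace": "opendatahub",
			},
		},
	}

	// Assumed value of app-namespace for an RHOAI deployment.
	if err := setFieldPath(binding, "subjects.0.namespace", "redhat-ods-applications"); err != nil {
		fmt.Println("error:", err)
		return
	}
	subject := binding["subjects"].([]any)[0].(map[string]any)
	fmt.Println(subject["namespace"])
}
```

Because the path targets index 0 only, this approach relies on each binding listing the controller's ServiceAccount as its first subject.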

maas-controller/cmd/manager/main.go

Lines changed: 33 additions & 16 deletions

@@ -19,6 +19,7 @@ package main
 import (
 	"context"
 	"flag"
+	"fmt"
 	"os"
 	"time"

@@ -58,22 +59,33 @@ func init() {
 	utilruntime.Must(maasv1alpha1.AddToScheme(scheme))
 }

-// ensureSubscriptionNamespaceExists creates the subscription namespace if it doesn't exist.
-// This allows users to create MaaS CRs without manually creating the namespace.
-// It retries with exponential backoff to handle transient failures.
+// ensureSubscriptionNamespaceExists checks whether the subscription namespace exists
+// and creates it if missing. It checks for existence first so that the controller can
+// start even when the service account lacks namespace-create permission (common in
+// operator-managed deployments where the operator pre-creates the namespace).
+// Permanent errors such as Forbidden are not retried.
 func ensureSubscriptionNamespaceExists(ctx context.Context, namespace string) error {
+	cfg := ctrl.GetConfigOrDie()
+	clientset, err := kubernetes.NewForConfig(cfg)
+	if err != nil {
+		return fmt.Errorf("unable to create Kubernetes client: %w", err)
+	}
+
+	_, err = clientset.CoreV1().Namespaces().Get(ctx, namespace, metav1.GetOptions{})
+	if err == nil {
+		setupLog.Info("subscription namespace already exists", "namespace", namespace)
+		return nil
+	}
+	if !errors.IsNotFound(err) {
+		return fmt.Errorf("unable to check if namespace %q exists: %w", namespace, err)
+	}
+
+	setupLog.Info("subscription namespace not found, attempting to create it", "namespace", namespace)
 	return wait.ExponentialBackoffWithContext(ctx, wait.Backoff{
 		Steps:    5,
 		Duration: 1 * time.Second,
 		Factor:   2.0,
 	}, func(ctx context.Context) (bool, error) {
-		cfg := ctrl.GetConfigOrDie()
-		clientset, err := kubernetes.NewForConfig(cfg)
-		if err != nil {
-			setupLog.Info("retrying namespace creation", "namespace", namespace, "error", err)
-			return false, nil // retry
-		}
-
 		ns := &corev1.Namespace{
 			ObjectMeta: metav1.ObjectMeta{
 				Name: namespace,
@@ -83,13 +95,18 @@ func ensureSubscriptionNamespaceExists(ctx context.Context, namespace string) er
 			},
 		}

-		_, err = clientset.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{})
-		if err != nil && !errors.IsAlreadyExists(err) {
-			setupLog.Info("retrying namespace creation", "namespace", namespace, "error", err)
-			return false, nil // retry
+		_, err := clientset.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{})
+		if err == nil || errors.IsAlreadyExists(err) {
+			setupLog.Info("subscription namespace ready", "namespace", namespace)
+			return true, nil
+		}
+		if errors.IsForbidden(err) {
+			return false, fmt.Errorf("service account lacks permission to create namespace %q — "+
+				"either pre-create the namespace or grant 'create' on namespaces to the controller service account: %w",
+				namespace, err)
 		}
-		setupLog.Info("subscription namespace ready", "namespace", namespace)
-		return true, nil // success
+		setupLog.Info("retrying namespace creation", "namespace", namespace, "error", err)
+		return false, nil // transient error, retry
 	})
 }
