Skip to content

Conversation

@elgnay
Copy link
Contributor

@elgnay elgnay commented Jan 19, 2026

Summary

Related issue(s)

Fixes #

Summary by CodeRabbit

  • New Features

    • Token-based authentication infrastructure for cluster add-ons: automatic provisioning and reconciliation of token service account, role, and rolebinding resources via a new controller.
  • Chores

    • Expanded RBAC to support token workflows, including explicit permissions to manage serviceaccounts and serviceaccount tokens; controller started at runtime.
  • Tests

    • Added comprehensive tests covering provisioning, updates, cleanup, and deletion scenarios for the token infrastructure.

✏️ Tip: You can customize this high-level summary in your review settings.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: elgnay
Once this PR has been reviewed and has the lgtm label, please assign jnpacker for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@elgnay
Copy link
Contributor Author

elgnay commented Jan 19, 2026

/hold

@coderabbitai
Copy link

coderabbitai bot commented Jan 19, 2026

Walkthrough

Adds token-based ServiceAccount RBAC and manifests, a new Token Infrastructure controller that renders/applies those manifests and manages readiness, comprehensive controller tests, integrates the controller into the addon manager, and broadens a chart ClusterRole permission for serviceaccounts/token.

Changes

Cohort / File(s) Summary
RBAC & token manifests
manifests/cluster-manager/hub/addon-manager/clusterrole.yaml, manifests/cluster-manager/hub/addon-manager/token-role.yaml, manifests/cluster-manager/hub/addon-manager/token-rolebinding.yaml, manifests/cluster-manager/hub/addon-manager/token-serviceaccount.yaml
Expanded ClusterRole rules to include roles and added rules for serviceaccounts and serviceaccounts/token; added templated Role, RoleBinding, and ServiceAccount manifests scoped to the cluster/addon for token infrastructure.
Token Infrastructure controller
pkg/addon/controllers/addontokeninfra/controller.go
New controller (TokenInfraConfig, TokenInfrastructureReadyCondition, resync) that renders/applies token manifests, captures SA UID, updates readiness condition, and performs cleanup when unused or on deletion.
Controller tests
pkg/addon/controllers/addontokeninfra/controller_test.go
New tests covering addon filtering, creation, update, deletion, and cleanup scenarios using fake Kubernetes/addon clients and informers; verifies Resource creations/deletions and TokenInfrastructureReady condition mutations.
Manager integration
pkg/addon/manager.go
Registers and starts the TokenInfrastructureController alongside existing addon controllers.
Chart template change
deploy/cluster-manager/chart/cluster-manager/templates/cluster_role.yaml
Removed resourceNames constraint from serviceaccounts/token rule, widening allowed token creation access.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • zhiweiyin318
  • haoqing0110
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description contains only the unfilled template with placeholder sections for Summary and Related issue(s), providing no meaningful information about the changes or context. Complete the Summary section describing what the token infrastructure controller does, why it's needed, and how it works. Fill in the Related issue(s) section with any relevant issue numbers.
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change—adding a controller for token infrastructure—and matches the primary objective of the changeset, though it contains a typo ('contoller').

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 84.24658% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.21%. Comparing base (d165060) to head (df46ffb).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...kg/addon/controllers/addontokeninfra/controller.go 83.57% 13 Missing and 10 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1340      +/-   ##
==========================================
+ Coverage   61.97%   62.21%   +0.23%     
==========================================
  Files         217      219       +2     
  Lines       17347    17560     +213     
==========================================
+ Hits        10751    10925     +174     
- Misses       5458     5479      +21     
- Partials     1138     1156      +18     
Flag Coverage Δ
unit 62.21% <84.24%> (+0.23%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@manifests/cluster-manager/hub/addon-manager/token-role.yaml`:
- Around line 1-13: The YAML templates use unquoted templated scalars which YAML
linters reject; update the Role manifest by quoting all template expressions so
they become valid scalars — e.g., wrap metadata.name, metadata.namespace, each
label value (addon.open-cluster-management.io/name and
addon.open-cluster-management.io/token-infrastructure if needed), and the
resourceNames entry in quotes (referencing the Role kind and the resourceNames
["{{ .ClusterName }}-{{ .AddonName }}-agent"] and metadata fields) to ensure the
template strings are treated as YAML strings and pass linting.

In `@manifests/cluster-manager/hub/addon-manager/token-rolebinding.yaml`:
- Around line 1-19: The templated scalar values in the RoleBinding resource need
to be quoted to avoid YAML parsing/lint errors; update metadata.name,
metadata.namespace, the label values (addon.open-cluster-management.io/name and
addon.open-cluster-management.io/token-infrastructure), the subjects[].name
entries (system:open-cluster-management:{{ .ClusterName }} and
open-cluster-management:{{ .ClusterName }}), and roleRef.name ({{ .ClusterName
}}-{{ .AddonName }}-token-role) to be wrapped in quotes so the templated strings
are treated as YAML scalars rather than unquoted tokens.

In `@manifests/cluster-manager/hub/addon-manager/token-serviceaccount.yaml`:
- Around line 1-8: YAML templated scalars are unquoted and triggering linter
errors; update the ServiceAccount resource by quoting all template
interpolations so they become valid YAML scalars — e.g., change metadata.name to
"{{ .ClusterName }}-{{ .AddonName }}-agent", metadata.namespace to "{{
.ClusterName }}", and the label value addon.open-cluster-management.io/name to
"{{ .AddonName }}"; keep or ensure
addon.open-cluster-management.io/token-infrastructure remains a quoted string.
This fixes YAML parsing while preserving the templates (alternatively, add a
linter-ignore around the template block if quoting is not desired).

In `@pkg/addon/controllers/addontokeninfra/controller_test.go`:
- Around line 239-285: The current validateKubeActions closure asserts exact
action count and order which is brittle; instead iterate over the actions slice
inside validateKubeActions and count actions whose GetVerb() == "delete" (like
the earlier test does), then assert that the deleteCount equals 3 and avoid
failing on extra GETs or other verbs; update the closure that references
expectedActions/len(actions) to perform this delete-verb count check so
RoleBinding/Role/ServiceAccount deletions are verified robustly.

In `@pkg/addon/controllers/addontokeninfra/controller.go`:
- Around line 212-216: When manifest application produces errs in
ensureTokenInfrastructure, create the aggregate (utilerrors.NewAggregate(errs)),
call c.updateCondition(...) to persist the failure condition, and then return
the aggregate apply error (not just the status update error). If updateCondition
itself fails, wrap or combine that error with the aggregate (e.g., return a
combined error that includes both the apply aggregate and the status update
error). Reference ensureTokenInfrastructure, errs, utilerrors.NewAggregate and
c.updateCondition when making the change.
- Around line 121-125: The IsNotFound branch for addon retrieval skips cleanup
and can leak ServiceAccount/Role/RoleBinding tokens when ManagedClusterAddOn
lacks a finalizer; either ensure a finalizer is added to ManagedClusterAddOn so
the DeletionTimestamp path (the existing cleanup code) runs, or add best-effort
cleanup in the IsNotFound branch by invoking the same cleanup routine used in
the DeletionTimestamp flow (e.g., call the controller's cleanup function that
removes the ServiceAccount, Role, RoleBinding and token secrets for the given
clusterName/addonName), making the IsNotFound path attempt idempotent removal of
those resources before returning nil.

@elgnay elgnay closed this Jan 19, 2026
@elgnay elgnay reopened this Jan 19, 2026
@elgnay elgnay force-pushed the token-infra branch 2 times, most recently from b6cfe49 to dc6d55d Compare January 20, 2026 02:06
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@manifests/cluster-manager/hub/addon-manager/clusterrole.yaml`:
- Around line 71-78: The addon-manager ClusterRole currently includes a
cluster-wide permission to create "serviceaccounts/token" which is unnecessary;
remove the entire apiGroups/resources/verbs block that grants create on
"serviceaccounts/token" from the ClusterRole in clusterrole.yaml so only
namespaced token creation remains controlled by the per-addon Role (see
token-role.yaml with its resourceNames restrictions) and the
tokenInfrastructureController that applies those manifests; keep the generic
"serviceaccounts" verbs for managing accounts but delete the separate
"serviceaccounts/token" create permission to restore least-privilege.
♻️ Duplicate comments (1)
pkg/addon/controllers/addontokeninfra/controller.go (1)

121-125: Addon deletion can skip cleanup when the object is already gone.

Line 121-125 returns nil on NotFound assuming namespace cleanup. Managed cluster namespaces often persist, so without a finalizer the ServiceAccount/Role/RoleBinding can be orphaned. Either ensure a finalizer is added elsewhere or do best‑effort cleanup in the NotFound branch.

🧹 Best‑effort cleanup on NotFound
 if errors.IsNotFound(err) {
-	// Addon is deleted, no cleanup needed (resources in the namespace will be cleaned up)
-	return nil
+	// Best-effort cleanup; managed cluster namespaces often persist
+	if cleanupErr := c.cleanupTokenInfrastructure(ctx, clusterName, addonName); cleanupErr != nil {
+		return cleanupErr
+	}
+	return nil
 }
#!/bin/bash
# Verify whether a ManagedClusterAddOn finalizer is added elsewhere
rg -n --type=go -C3 'ManagedClusterAddOn.*finalizer|finalizer.*ManagedClusterAddOn|AddFinalizer|RemoveFinalizer' pkg/addon

Comment on lines +71 to +78
# Allow controller to manage serviceaccounts for token-based authentication
- apiGroups: [""]
resources: ["serviceaccounts"]
verbs: ["get", "list", "watch", "create", "update", "delete"]
# Allow controller to create RoleBindings that grant serviceaccounts/token permission
- apiGroups: [""]
resources: ["serviceaccounts/token"]
verbs: ["create"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for TokenRequest, serviceaccounts/token, or CreateToken calls in the codebase
rg -n --type=go -i 'tokenrequest|serviceaccounts/token|createtoken' . --max-count=50

Repository: open-cluster-management-io/ocm

Length of output: 50386


🏁 Script executed:

#!/bin/bash
# Look for the addon-manager implementation and understand its token handling
fd -e "go" -e "yaml" -e "yml" | xargs grep -l "addon-manager" | head -20

Repository: open-cluster-management-io/ocm

Length of output: 1530


🏁 Script executed:

#!/bin/bash
# Check the full RBAC file to understand context around this rule
cat -n manifests/cluster-manager/hub/addon-manager/clusterrole.yaml | grep -A 20 -B 5 "serviceaccounts/token"

Repository: open-cluster-management-io/ocm

Length of output: 554


🏁 Script executed:

#!/bin/bash
# Search non-vendored addon-manager code for token-related operations
find pkg -name "*.go" -type f | xargs grep -l "token\|Token" | head -20

Repository: open-cluster-management-io/ocm

Length of output: 1171


🏁 Script executed:

#!/bin/bash
# Look for actual addon-manager controller implementation and token handling
rg -n "addon.*manager\|addonmanager" pkg -i --type=go -A 3 | grep -i "token\|createtoken" | head -30

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Check if addon-manager actually creates service account tokens or just manages serviceaccounts
rg -n "CreateToken\|Token.*Request\|serviceaccounts" pkg/addons -i --type=go 2>/dev/null | head -30

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Check the addontokeninfra controller implementation
cat -n pkg/addon/controllers/addontokeninfra/controller.go | head -100

Repository: open-cluster-management-io/ocm

Length of output: 4253


🏁 Script executed:

#!/bin/bash
# Search for CreateToken calls in the entire non-vendor codebase
rg -n "CreateToken" pkg -A 3 --type=go

Repository: open-cluster-management-io/ocm

Length of output: 730


🏁 Script executed:

#!/bin/bash
# Check manifest files for any token-related resources the addon-manager handles
rg -n "serviceaccounts/token\|TokenRequest" manifests -i

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Check what calls sa_syncer and CreateToken in the addon-manager context
rg -n "sa_syncer\|CreateToken" pkg/addon pkg/operator/operators/clustermanager --type=go -B 2 -A 2

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Read the sa_syncer.go to understand its purpose and who uses it
cat -n pkg/operator/helpers/sa_syncer.go | head -150

Repository: open-cluster-management-io/ocm

Length of output: 5497


🏁 Script executed:

#!/bin/bash
# Check if addon-manager specifically uses the sa_syncer or CreateToken functions
rg -n "NewSA\|sa_syncer" pkg/addon --type=go

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Check if addon-manager uses SATokenGetter or the sa_syncer helpers
rg -n "SATokenGetter\|SyncKubeConfigSecret" pkg/addon --type=go -B 3 -A 3

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Look for imports of sa_syncer helpers in the addon-manager
rg -n "pkg/operator/helpers" pkg/addon --type=go

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Check the addon manager implementation to understand what permissions it needs
cat -n pkg/addon/manager.go | head -200

Repository: open-cluster-management-io/ocm

Length of output: 8758


🏁 Script executed:

#!/bin/bash
# Continue reading pkg/addon/manager.go to see what happens with tokenInfrastructureController
cat -n pkg/addon/manager.go | sed -n '198,250p'

Repository: open-cluster-management-io/ocm

Length of output: 1076


🏁 Script executed:

#!/bin/bash
# Check the comment about what token-based authentication infrastructure means
cat -n manifests/cluster-manager/hub/addon-manager/token-role.yaml

Repository: open-cluster-management-io/ocm

Length of output: 602


🏁 Script executed:

#!/bin/bash
# Look for what actual operations the addon-manager performs with serviceaccounts/token
rg -n "addontokeninfra" pkg/addon/manager.go -A 10 -B 5

Repository: open-cluster-management-io/ocm

Length of output: 1535


🏁 Script executed:

#!/bin/bash
# Verify this is the addon-manager's ClusterRole and check the full context
cat -n manifests/cluster-manager/hub/addon-manager/clusterrole.yaml | head -90

Repository: open-cluster-management-io/ocm

Length of output: 3902


🏁 Script executed:

#!/bin/bash
# Check if there are other roles/clusterroles for addon-manager to understand the pattern
ls -la manifests/cluster-manager/hub/addon-manager/

Repository: open-cluster-management-io/ocm

Length of output: 845


🏁 Script executed:

#!/bin/bash
# Search for where the addon-manager actually creates tokens to understand the difference
rg -n "resourceNames\|CreateToken" manifests/cluster-manager/hub/addon-manager/ -A 2 -B 2

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Check if addon-manager itself creates tokens, or if the per-addon token-role handles it
cat -n manifests/cluster-manager/hub/addon-manager/token-rolebinding.yaml

Repository: open-cluster-management-io/ocm

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Look for who uses SATokenGetter to understand if addon-manager itself needs the permission
rg -n "SATokenGetter" pkg --type=go -B 5 -A 10

Repository: open-cluster-management-io/ocm

Length of output: 8082


🏁 Script executed:

#!/bin/bash
# Check if there's documentation or comments explaining why the cluster-wide token permission is needed
rg -n "serviceaccounts/token\|create.*token" manifests/cluster-manager/hub/addon-manager/ -B 3 -A 3

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Check if addon-manager controller code actually creates tokens anywhere
rg -n "CreateToken\|tokenGetter" pkg/addon/controllers/addonmanagement --type=go -A 5 -B 2

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Verify that addon-manager doesn't call SATokenGetter like clustermanager does
rg -n "CreateToken\|SATokenGetter" pkg/addon --type=go

Repository: open-cluster-management-io/ocm

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Double-check what the tokenInfrastructureController actually does - does it create tokens or just manifests?
cat -n pkg/addon/controllers/addontokeninfra/controller.go | sed -n '100,200p'

Repository: open-cluster-management-io/ocm

Length of output: 4100


Remove the cluster-wide serviceaccounts/token create permission from addon-manager ClusterRole.

Lines 76-78 grant addon-manager unrestricted create permission on serviceaccounts/token across all namespaces. However, the addon-manager code does not call the TokenRequest API directly—it only applies manifests via the tokenInfrastructureController. The per-addon token infrastructure (defined in token-role.yaml) uses a scoped namespaced Role with resourceNames restrictions on specific service accounts (e.g., {{ .ClusterName }}-{{ .AddonName }}-agent). The cluster-wide permission in the addon-manager ClusterRole is unnecessary and violates least privilege; remove it entirely.

🔒 Suggested fix
-# Allow controller to create RoleBindings that grant serviceaccounts/token permission
-- apiGroups: [""]
-  resources: ["serviceaccounts/token"]
-  verbs: ["create"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Allow controller to manage serviceaccounts for token-based authentication
- apiGroups: [""]
resources: ["serviceaccounts"]
verbs: ["get", "list", "watch", "create", "update", "delete"]
# Allow controller to create RoleBindings that grant serviceaccounts/token permission
- apiGroups: [""]
resources: ["serviceaccounts/token"]
verbs: ["create"]
# Allow controller to manage serviceaccounts for token-based authentication
- apiGroups: [""]
resources: ["serviceaccounts"]
verbs: ["get", "list", "watch", "create", "update", "delete"]
🤖 Prompt for AI Agents
In `@manifests/cluster-manager/hub/addon-manager/clusterrole.yaml` around lines 71
- 78, The addon-manager ClusterRole currently includes a cluster-wide permission
to create "serviceaccounts/token" which is unnecessary; remove the entire
apiGroups/resources/verbs block that grants create on "serviceaccounts/token"
from the ClusterRole in clusterrole.yaml so only namespaced token creation
remains controlled by the per-addon Role (see token-role.yaml with its
resourceNames restrictions) and the tokenInfrastructureController that applies
those manifests; keep the generic "serviceaccounts" verbs for managing accounts
but delete the separate "serviceaccounts/token" create permission to restore
least-privilege.

err = c.ensureTokenInfrastructure(ctx, addon, clusterName, addonName)
if err == nil {
// Queue the addon for reconciliation after TokenInfrastructureResyncInterval
syncCtx.Queue().AddAfter(key, TokenInfrastructureResyncInterval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need resync?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sa, role or rolebinding may be removed by accident. Resync is needed to ensure they are recreated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we just watch instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the controller creates service accounts, roles, and role bindings, it needs to watch all of them in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is fine

}
// Extract ServiceAccount UID
if sa, ok := result.Result.(*corev1.ServiceAccount); ok {
saUID = string(sa.UID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need to record uid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once an sa is removed, its tokens become invalid. The uid is used to verify whether a token hold by agent is still valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants