-
Notifications
You must be signed in to change notification settings - Fork 125
✨ add contoller to support token infrastructure #1340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: elgnay The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/hold |
WalkthroughAdds token-based ServiceAccount RBAC and manifests, a new Token Infrastructure controller that renders/applies those manifests and manages readiness, comprehensive controller tests, integrates the controller into the addon manager, and broadens a chart ClusterRole permission for serviceaccounts/token. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1340 +/- ##
==========================================
+ Coverage 61.97% 62.21% +0.23%
==========================================
Files 217 219 +2
Lines 17347 17560 +213
==========================================
+ Hits 10751 10925 +174
- Misses 5458 5479 +21
- Partials 1138 1156 +18
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@manifests/cluster-manager/hub/addon-manager/token-role.yaml`:
- Around line 1-13: The YAML templates use unquoted templated scalars which YAML
linters reject; update the Role manifest by quoting all template expressions so
they become valid scalars — e.g., wrap metadata.name, metadata.namespace, each
label value (addon.open-cluster-management.io/name and
addon.open-cluster-management.io/token-infrastructure if needed), and the
resourceNames entry in quotes (referencing the Role kind and the resourceNames
["{{ .ClusterName }}-{{ .AddonName }}-agent"] and metadata fields) to ensure the
template strings are treated as YAML strings and pass linting.
In `@manifests/cluster-manager/hub/addon-manager/token-rolebinding.yaml`:
- Around line 1-19: The templated scalar values in the RoleBinding resource need
to be quoted to avoid YAML parsing/lint errors; update metadata.name,
metadata.namespace, the label values (addon.open-cluster-management.io/name and
addon.open-cluster-management.io/token-infrastructure), the subjects[].name
entries (system:open-cluster-management:{{ .ClusterName }} and
open-cluster-management:{{ .ClusterName }}), and roleRef.name ({{ .ClusterName
}}-{{ .AddonName }}-token-role) to be wrapped in quotes so the templated strings
are treated as YAML scalars rather than unquoted tokens.
In `@manifests/cluster-manager/hub/addon-manager/token-serviceaccount.yaml`:
- Around line 1-8: YAML templated scalars are unquoted and triggering linter
errors; update the ServiceAccount resource by quoting all template
interpolations so they become valid YAML scalars — e.g., change metadata.name to
"{{ .ClusterName }}-{{ .AddonName }}-agent", metadata.namespace to "{{
.ClusterName }}", and the label value addon.open-cluster-management.io/name to
"{{ .AddonName }}"; keep or ensure
addon.open-cluster-management.io/token-infrastructure remains a quoted string.
This fixes YAML parsing while preserving the templates (alternatively, add a
linter-ignore around the template block if quoting is not desired).
In `@pkg/addon/controllers/addontokeninfra/controller_test.go`:
- Around line 239-285: The current validateKubeActions closure asserts exact
action count and order which is brittle; instead iterate over the actions slice
inside validateKubeActions and count actions whose GetVerb() == "delete" (like
the earlier test does), then assert that the deleteCount equals 3 and avoid
failing on extra GETs or other verbs; update the closure that references
expectedActions/len(actions) to perform this delete-verb count check so
RoleBinding/Role/ServiceAccount deletions are verified robustly.
In `@pkg/addon/controllers/addontokeninfra/controller.go`:
- Around line 212-216: When manifest application produces errs in
ensureTokenInfrastructure, create the aggregate (utilerrors.NewAggregate(errs)),
call c.updateCondition(...) to persist the failure condition, and then return
the aggregate apply error (not just the status update error). If updateCondition
itself fails, wrap or combine that error with the aggregate (e.g., return a
combined error that includes both the apply aggregate and the status update
error). Reference ensureTokenInfrastructure, errs, utilerrors.NewAggregate and
c.updateCondition when making the change.
- Around line 121-125: The IsNotFound branch for addon retrieval skips cleanup
and can leak ServiceAccount/Role/RoleBinding tokens when ManagedClusterAddOn
lacks a finalizer; either ensure a finalizer is added to ManagedClusterAddOn so
the DeletionTimestamp path (the existing cleanup code) runs, or add best-effort
cleanup in the IsNotFound branch by invoking the same cleanup routine used in
the DeletionTimestamp flow (e.g., call the controller's cleanup function that
removes the ServiceAccount, Role, RoleBinding and token secrets for the given
clusterName/addonName), making the IsNotFound path attempt idempotent removal of
those resources before returning nil.
b6cfe49 to
dc6d55d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@manifests/cluster-manager/hub/addon-manager/clusterrole.yaml`:
- Around line 71-78: The addon-manager ClusterRole currently includes a
cluster-wide permission to create "serviceaccounts/token" which is unnecessary;
remove the entire apiGroups/resources/verbs block that grants create on
"serviceaccounts/token" from the ClusterRole in clusterrole.yaml so only
namespaced token creation remains controlled by the per-addon Role (see
token-role.yaml with its resourceNames restrictions) and the
tokenInfrastructureController that applies those manifests; keep the generic
"serviceaccounts" verbs for managing accounts but delete the separate
"serviceaccounts/token" create permission to restore least-privilege.
♻️ Duplicate comments (1)
pkg/addon/controllers/addontokeninfra/controller.go (1)
121-125: Addon deletion can skip cleanup when the object is already gone.Line 121-125 returns nil on NotFound assuming namespace cleanup. Managed cluster namespaces often persist, so without a finalizer the ServiceAccount/Role/RoleBinding can be orphaned. Either ensure a finalizer is added elsewhere or do best‑effort cleanup in the NotFound branch.
🧹 Best‑effort cleanup on NotFound
if errors.IsNotFound(err) { - // Addon is deleted, no cleanup needed (resources in the namespace will be cleaned up) - return nil + // Best-effort cleanup; managed cluster namespaces often persist + if cleanupErr := c.cleanupTokenInfrastructure(ctx, clusterName, addonName); cleanupErr != nil { + return cleanupErr + } + return nil }#!/bin/bash # Verify whether a ManagedClusterAddOn finalizer is added elsewhere rg -n --type=go -C3 'ManagedClusterAddOn.*finalizer|finalizer.*ManagedClusterAddOn|AddFinalizer|RemoveFinalizer' pkg/addon
| # Allow controller to manage serviceaccounts for token-based authentication | ||
| - apiGroups: [""] | ||
| resources: ["serviceaccounts"] | ||
| verbs: ["get", "list", "watch", "create", "update", "delete"] | ||
| # Allow controller to create RoleBindings that grant serviceaccounts/token permission | ||
| - apiGroups: [""] | ||
| resources: ["serviceaccounts/token"] | ||
| verbs: ["create"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for TokenRequest, serviceaccounts/token, or CreateToken calls in the codebase
rg -n --type=go -i 'tokenrequest|serviceaccounts/token|createtoken' . --max-count=50Repository: open-cluster-management-io/ocm
Length of output: 50386
🏁 Script executed:
#!/bin/bash
# Look for the addon-manager implementation and understand its token handling
fd -e "go" -e "yaml" -e "yml" | xargs grep -l "addon-manager" | head -20Repository: open-cluster-management-io/ocm
Length of output: 1530
🏁 Script executed:
#!/bin/bash
# Check the full RBAC file to understand context around this rule
cat -n manifests/cluster-manager/hub/addon-manager/clusterrole.yaml | grep -A 20 -B 5 "serviceaccounts/token"Repository: open-cluster-management-io/ocm
Length of output: 554
🏁 Script executed:
#!/bin/bash
# Search non-vendored addon-manager code for token-related operations
find pkg -name "*.go" -type f | xargs grep -l "token\|Token" | head -20Repository: open-cluster-management-io/ocm
Length of output: 1171
🏁 Script executed:
#!/bin/bash
# Look for actual addon-manager controller implementation and token handling
rg -n "addon.*manager\|addonmanager" pkg -i --type=go -A 3 | grep -i "token\|createtoken" | head -30Repository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check if addon-manager actually creates service account tokens or just manages serviceaccounts
rg -n "CreateToken\|Token.*Request\|serviceaccounts" pkg/addons -i --type=go 2>/dev/null | head -30Repository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check the addontokeninfra controller implementation
cat -n pkg/addon/controllers/addontokeninfra/controller.go | head -100Repository: open-cluster-management-io/ocm
Length of output: 4253
🏁 Script executed:
#!/bin/bash
# Search for CreateToken calls in the entire non-vendor codebase
rg -n "CreateToken" pkg -A 3 --type=goRepository: open-cluster-management-io/ocm
Length of output: 730
🏁 Script executed:
#!/bin/bash
# Check manifest files for any token-related resources the addon-manager handles
rg -n "serviceaccounts/token\|TokenRequest" manifests -iRepository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check what calls sa_syncer and CreateToken in the addon-manager context
rg -n "sa_syncer\|CreateToken" pkg/addon pkg/operator/operators/clustermanager --type=go -B 2 -A 2Repository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Read the sa_syncer.go to understand its purpose and who uses it
cat -n pkg/operator/helpers/sa_syncer.go | head -150Repository: open-cluster-management-io/ocm
Length of output: 5497
🏁 Script executed:
#!/bin/bash
# Check if addon-manager specifically uses the sa_syncer or CreateToken functions
rg -n "NewSA\|sa_syncer" pkg/addon --type=goRepository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check if addon-manager uses SATokenGetter or the sa_syncer helpers
rg -n "SATokenGetter\|SyncKubeConfigSecret" pkg/addon --type=go -B 3 -A 3Repository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Look for imports of sa_syncer helpers in the addon-manager
rg -n "pkg/operator/helpers" pkg/addon --type=goRepository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check the addon manager implementation to understand what permissions it needs
cat -n pkg/addon/manager.go | head -200Repository: open-cluster-management-io/ocm
Length of output: 8758
🏁 Script executed:
#!/bin/bash
# Continue reading pkg/addon/manager.go to see what happens with tokenInfrastructureController
cat -n pkg/addon/manager.go | sed -n '198,250p'Repository: open-cluster-management-io/ocm
Length of output: 1076
🏁 Script executed:
#!/bin/bash
# Check the comment about what token-based authentication infrastructure means
cat -n manifests/cluster-manager/hub/addon-manager/token-role.yamlRepository: open-cluster-management-io/ocm
Length of output: 602
🏁 Script executed:
#!/bin/bash
# Look for what actual operations the addon-manager performs with serviceaccounts/token
rg -n "addontokeninfra" pkg/addon/manager.go -A 10 -B 5Repository: open-cluster-management-io/ocm
Length of output: 1535
🏁 Script executed:
#!/bin/bash
# Verify this is the addon-manager's ClusterRole and check the full context
cat -n manifests/cluster-manager/hub/addon-manager/clusterrole.yaml | head -90Repository: open-cluster-management-io/ocm
Length of output: 3902
🏁 Script executed:
#!/bin/bash
# Check if there are other roles/clusterroles for addon-manager to understand the pattern
ls -la manifests/cluster-manager/hub/addon-manager/Repository: open-cluster-management-io/ocm
Length of output: 845
🏁 Script executed:
#!/bin/bash
# Search for where the addon-manager actually creates tokens to understand the difference
rg -n "resourceNames\|CreateToken" manifests/cluster-manager/hub/addon-manager/ -A 2 -B 2Repository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check if addon-manager itself creates tokens, or if the per-addon token-role handles it
cat -n manifests/cluster-manager/hub/addon-manager/token-rolebinding.yamlRepository: open-cluster-management-io/ocm
Length of output: 855
🏁 Script executed:
#!/bin/bash
# Look for who uses SATokenGetter to understand if addon-manager itself needs the permission
rg -n "SATokenGetter" pkg --type=go -B 5 -A 10Repository: open-cluster-management-io/ocm
Length of output: 8082
🏁 Script executed:
#!/bin/bash
# Check if there's documentation or comments explaining why the cluster-wide token permission is needed
rg -n "serviceaccounts/token\|create.*token" manifests/cluster-manager/hub/addon-manager/ -B 3 -A 3Repository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check if addon-manager controller code actually creates tokens anywhere
rg -n "CreateToken\|tokenGetter" pkg/addon/controllers/addonmanagement --type=go -A 5 -B 2Repository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Verify that addon-manager doesn't call SATokenGetter like clustermanager does
rg -n "CreateToken\|SATokenGetter" pkg/addon --type=goRepository: open-cluster-management-io/ocm
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Double-check what the tokenInfrastructureController actually does - does it create tokens or just manifests?
cat -n pkg/addon/controllers/addontokeninfra/controller.go | sed -n '100,200p'Repository: open-cluster-management-io/ocm
Length of output: 4100
Remove the cluster-wide serviceaccounts/token create permission from addon-manager ClusterRole.
Lines 76-78 grant addon-manager unrestricted create permission on serviceaccounts/token across all namespaces. However, the addon-manager code does not call the TokenRequest API directly—it only applies manifests via the tokenInfrastructureController. The per-addon token infrastructure (defined in token-role.yaml) uses a scoped namespaced Role with resourceNames restrictions on specific service accounts (e.g., {{ .ClusterName }}-{{ .AddonName }}-agent). The cluster-wide permission in the addon-manager ClusterRole is unnecessary and violates least privilege; remove it entirely.
🔒 Suggested fix
-# Allow controller to create RoleBindings that grant serviceaccounts/token permission
-- apiGroups: [""]
- resources: ["serviceaccounts/token"]
- verbs: ["create"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Allow controller to manage serviceaccounts for token-based authentication | |
| - apiGroups: [""] | |
| resources: ["serviceaccounts"] | |
| verbs: ["get", "list", "watch", "create", "update", "delete"] | |
| # Allow controller to create RoleBindings that grant serviceaccounts/token permission | |
| - apiGroups: [""] | |
| resources: ["serviceaccounts/token"] | |
| verbs: ["create"] | |
| # Allow controller to manage serviceaccounts for token-based authentication | |
| - apiGroups: [""] | |
| resources: ["serviceaccounts"] | |
| verbs: ["get", "list", "watch", "create", "update", "delete"] |
🤖 Prompt for AI Agents
In `@manifests/cluster-manager/hub/addon-manager/clusterrole.yaml` around lines 71
- 78, The addon-manager ClusterRole currently includes a cluster-wide permission
to create "serviceaccounts/token" which is unnecessary; remove the entire
apiGroups/resources/verbs block that grants create on "serviceaccounts/token"
from the ClusterRole in clusterrole.yaml so only namespaced token creation
remains controlled by the per-addon Role (see token-role.yaml with its
resourceNames restrictions) and the tokenInfrastructureController that applies
those manifests; keep the generic "serviceaccounts" verbs for managing accounts
but delete the separate "serviceaccounts/token" create permission to restore
least-privilege.
| err = c.ensureTokenInfrastructure(ctx, addon, clusterName, addonName) | ||
| if err == nil { | ||
| // Queue the addon for reconciliation after TokenInfrastructureResyncInterval | ||
| syncCtx.Queue().AddAfter(key, TokenInfrastructureResyncInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need resync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sa, role or rolebinding may be removed by accident. Resync is needed to ensure they are recreated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we just watch instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the controller creates service accounts, roles, and role bindings, it needs to watch all of them in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is fine
| } | ||
| // Extract ServiceAccount UID | ||
| if sa, ok := result.Result.(*corev1.ServiceAccount); ok { | ||
| saUID = string(sa.UID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need to record uid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once an sa is removed, its tokens become invalid. The uid is used to verify whether a token hold by agent is still valid.
Signed-off-by: Yang Le <[email protected]>
Summary
Related issue(s)
Fixes #
Summary by CodeRabbit
New Features
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.