Implement ManagedServiceAccount resources reconciliation#130
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yxun The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthrough
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 8 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/hub/mesh/controller.go (1)
153-161:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRBAC markers are missing permissions required by new endpoint-discovery lifecycle calls.
New reconcile paths perform
Get/List/Create/DeleteonManagedServiceAccountandDeleteonSecret, but the markers here do not grant those verbs/resources. This will fail withforbiddenin real deployments.At minimum, add kubebuilder RBAC for:
authentication.open-cluster-management.io/managedserviceaccounts:get;list;watch;create;update;patch;delete- core
secrets: includedelete(currently onlyget;list;watch)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/hub/mesh/controller.go` around lines 153 - 161, The RBAC markers in the controller are missing permissions required by new endpoint-discovery lifecycle calls. Add a new kubebuilder RBAC marker for the authentication.open-cluster-management.io API group with resource managedserviceaccounts and verbs get;list;watch;create;update;patch;delete to support the reconcile paths that perform Get, List, Create, and Delete operations on ManagedServiceAccount resources. Additionally, update the existing kubebuilder RBAC marker for core API group secrets resource to include the delete verb alongside the current get;list;watch verbs, since the new reconcile paths also delete Secrets.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/hub/mesh/endpoint_discovery.go`:
- Line 35: The MSA identity and selectors are scoped only by mesh name, creating
collision risks when meshes with the same name exist in different namespaces. At
the MSA name creation in the format string at line 35 (where msaName is
constructed), include the mesh namespace in addition to the mesh name to create
a namespace-qualified identity, either by directly incorporating mesh.Namespace
into the name or by hashing the (namespace, name) tuple. Apply the same
namespace-qualified identity change at lines 71-72 and 112-113 where other MSA
name references occur. Additionally, for all MSA list selectors that appear in
the code (around the affected lines), include MeshNamespaceLabel: mesh.Namespace
in the selector to ensure MSA queries are scoped to the specific mesh namespace
and prevent cross-mesh resource deletions during cleanup.
- Around line 37-40: The current error handling in the Get call for the
ManagedServiceAccount resource treats all errors as "not found" and proceeds to
create, which masks transient, RBAC, and API errors. Refactor the error handling
to explicitly check if the returned error is a NotFound error using the
appropriate error checking utility from the client-go library. If the error is
NotFound, proceed with the create logic; if the error is any other type, log the
error with context and return/fail fast rather than attempting to create. This
ensures transient and permission-related failures are properly surfaced instead
of being silently treated as non-existent resources.
- Around line 49-52: The `TokenValidity` field is optional in the API shape but
is being unconditionally dereferenced with the `*` operator when initializing
the `Rotation` struct in the `ManagedServiceAccountRotation` assignment. To
prevent reconciliation panics when the field is absent or nil, add a nil check
before dereferencing `mesh.Spec.Security.Discovery.TokenValidity` and provide a
sensible default duration value when the pointer is nil. Use a conditional
expression or helper function to select either the dereferenced value or the
default based on whether the pointer is nil.
In `@test/integration/controller_test.go`:
- Around line 657-678: Add explicit timeout and polling interval parameters to
all Eventually and Consistently calls that interact with the cluster, as these
calls currently lack deterministic timing controls. For the Eventually call in
the "should process a ManagedServiceAccount" test block (which retrieves
ManagedServiceAccount via k8sClient.Get), add timeout and polling interval
parameters following the pattern Eventually(func() error { ... },
30*time.Second, 1*time.Second).Should(Succeed()). Apply this same pattern to all
other Eventually and Consistently assertions throughout the test file that
interact with cluster operations to ensure test reliability and meet the coding
guidelines for async cluster operations.
---
Outside diff comments:
In `@pkg/hub/mesh/controller.go`:
- Around line 153-161: The RBAC markers in the controller are missing
permissions required by new endpoint-discovery lifecycle calls. Add a new
kubebuilder RBAC marker for the authentication.open-cluster-management.io API
group with resource managedserviceaccounts and verbs
get;list;watch;create;update;patch;delete to support the reconcile paths that
perform Get, List, Create, and Delete operations on ManagedServiceAccount
resources. Additionally, update the existing kubebuilder RBAC marker for core
API group secrets resource to include the delete verb alongside the current
get;list;watch verbs, since the new reconcile paths also delete Secrets.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 17acd6b9-7cf0-47e1-8c8c-2ab62ec7d496
⛔ Files ignored due to path filters (1)
pkg/apis/mesh/v1alpha1/zz_generated.deepcopy.gois excluded by!**/zz_generated*
📒 Files selected for processing (4)
pkg/apis/mesh/v1alpha1/types.gopkg/hub/mesh/controller.gopkg/hub/mesh/endpoint_discovery.gotest/integration/controller_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/integration/controller_test.go`:
- Around line 725-728: The test currently only verifies deletion of
ManagedServiceAccounts for one mesh (meshName-istio-reader) in both clusters,
but the test scenario involves two meshes. To properly verify full multi-mesh
cleanup semantics, add two additional util.ExpectResourceDeleted calls after the
existing assertions that verify deletion of the otherMesh-istio-reader
ManagedServiceAccount in cluster1 and cluster2. Each new call should follow the
same pattern as the existing util.ExpectResourceDeleted invocations but use
fmt.Sprintf("%s-istio-reader", otherMesh) instead of meshName to ensure all
mesh-owned resources are properly cleaned up.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: dd038605-8d26-45d3-88ae-5012a76e617e
📒 Files selected for processing (1)
test/integration/controller_test.go
|
/hold |
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
|
/unhold |
jewertow
left a comment
There was a problem hiding this comment.
LGTM, but I assume you will address multi cluster secret creation in a follow-up PR, right?
@jewertow , yes, that was in the original PR and I will have some updates and address those in the next smaller PR. MSA CR creation, update |
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
sridhargaddam
left a comment
There was a problem hiding this comment.
@yxun, please take a look at the e2e test failures. Otherwise, LGTM.
|
/rerun-all |
|
/test golangci-lint |
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
Signed-off-by: Yuanlin Xu <yuanlin.xu@redhat.com>
|
The fix of e2e test error is in #157 It follows the following doc: |
|
|
||
| install_managed_serviceaccount() { | ||
| local hub_kubeconfig | ||
| hub_kubeconfig="$(echo "${DEV_KUBE_DIR}/${HUB}.config")" |
There was a problem hiding this comment.
Reason of this change:
There was a failure in the job of the previous commit:
https://github.com/stolostron/multicluster-mesh-addon/actions/runs/28129345197/job/83303594782
/home/runner/work/multicluster-mesh-addon/multicluster-mesh-addon/hack/dev-env.sh: line 273: kubeconfig_for: command not found
I could never reproduce this error from my local run. Still not clear why the github workflow job was not able to find that function name.
|
|
||
| install_managed_serviceaccount() { | ||
| local hub_kubeconfig | ||
| hub_kubeconfig="$(echo "${DEV_KUBE_DIR}/${HUB}.config")" |
|
|
||
| msaName := fmt.Sprintf("%s-%s-%s", mesh.Namespace, msaRootWord, mesh.Name) | ||
|
|
||
| var validity metav1.Duration |
There was a problem hiding this comment.
The kubebuilder default (+kubebuilder:default="360h" on TokenValidity) guarantees this field is always populated for objects created through the API.
This nil check and the hardcoded 360 * time.Hour are dead code with a hidden duplicate default.
Just use it directly.
| return reconcile.Result{}, fmt.Errorf("failed to cleanup ManifestWorks: %w", err) | ||
| } | ||
|
|
||
| klog.Infof("Handling deletion for ManagedServiceAccount resources managed by mesh %s/%s", mesh.Namespace, mesh.Name) |
There was a problem hiding this comment.
No need to log this, you already log each deleted MSA
| klog.Infof("Handling deletion for ManagedServiceAccount resources managed by mesh %s/%s", mesh.Namespace, mesh.Name) |
| expectMeshNotReady(otherMesh, otherNs) | ||
| util.DeleteResource(ctx, k8sClient, &meshv1alpha1.MultiClusterMesh{}, otherMesh, otherNs) | ||
| msa := &msav1beta1.ManagedServiceAccount{} | ||
| Expect(k8sClient.Get(ctx, types.NamespacedName{Name: fmt.Sprintf("%s-%s-%s", testNs, msaRootWord, meshName), Namespace: cluster1}, msa)).Should(Succeed()) |
There was a problem hiding this comment.
This is a point-in-time check that could pass before cleanup runs.
Use Consistently to verify MSAs remain present over a polling window, similar to other tests.
| if len(clusters) == 0 { | ||
| klog.V(4).Info("The ClusterSet has no managed cluster") | ||
| return nil | ||
| } |
There was a problem hiding this comment.
This check is inconsistent with the rest of the controller
| } | ||
|
|
||
| // Create ManagedServiceAccount resources for each cluster. | ||
| if err := r.createManagedServiceAccounts(ctx, mesh, clusters); err != nil { |
There was a problem hiding this comment.
Since this needs to run for each cluster, it makes more sense to put this inside the main for _, cluster := range clusters loop on L283 and handle each cluster with ensureEndpointDiscovery or some similar terminology.
| util.CreateMultiClusterMesh(ctx, k8sClient, otherMesh, otherNs, testClusterSet) | ||
| }) | ||
|
|
||
| It("should keep the ManagedServiceAccount when one mesh is deleted", func() { |
There was a problem hiding this comment.
These two tests use addon-owned (shared resource) semantics, but MSAs are mesh-owned.
Each mesh creates its own MSAs, so there's no shared resource to "keep."
"should keep the ManagedServiceAccount when one mesh is deleted" (L755): deletes otherMesh, checks meshName's MSAs survive, but never checks otherMesh's MSAs are deleted.
This is a tautology for mesh-owned resources.
"should delete the ManagedServiceAccount when both meshes are deleted" (L763): passes even with broken scoping since everything is deleted.
Replace both with one test that verifies correct scoping:
delete one mesh, assert its MSAs are gone AND the other mesh's MSAs survive.
For example:
It("should delete only the deleted mesh's MSAs", func() {
expectMeshNotReady(otherMesh, otherNs)
// Verify both meshes have MSAs
expectManagedServiceAccount(fmt.Sprintf("%s-%s-%s", testNs, msaRootWord, meshName), cluster1)
expectManagedServiceAccount(fmt.Sprintf("%s-%s-%s", otherNs, msaRootWord, otherMesh), cluster1)
// Delete one mesh
util.DeleteResource(ctx, k8sClient, &meshv1alpha1.MultiClusterMesh{}, otherMesh, otherNs)
// Deleted mesh's MSAs are gone
util.ExpectResourceDeleted(ctx, k8sClient, &msav1beta1.ManagedServiceAccount{},
fmt.Sprintf("%s-%s-%s", otherNs, msaRootWord, otherMesh), cluster1)
util.ExpectResourceDeleted(ctx, k8sClient, &msav1beta1.ManagedServiceAccount{},
fmt.Sprintf("%s-%s-%s", otherNs, msaRootWord, otherMesh), cluster2)
// Surviving mesh's MSAs remain
Expect(k8sClient.Get(ctx, types.NamespacedName{Name: fmt.Sprintf("%s-%s-%s", testNs, msaRootWord, meshName), Namespace: cluster1}, &msav1beta1.ManagedServiceAccount{})).To(Succeed())
Expect(k8sClient.Get(ctx, types.NamespacedName{Name: fmt.Sprintf("%s-%s-%s", testNs, msaRootWord, meshName), Namespace: cluster2}, &msav1beta1.ManagedServiceAccount{})).To(Succeed())
})| msa1 := expectManagedServiceAccount(fmt.Sprintf("%s-%s-%s", testNs, msaRootWord, meshName), cluster1) | ||
| msa2 := expectManagedServiceAccount(fmt.Sprintf("%s-%s-%s", testNs, msaRootWord, meshName), cluster2) |
There was a problem hiding this comment.
This is very repetitive, you can extract a simple func to format the MSA name:
func expectedMSAName(meshNamespace, meshName string) string {
return fmt.Sprintf("%s-istio-reader-%s", meshNamespace, meshName)
}Then you can use it inside expectManagedServiceAccount and in other places where you look for MSA
|
/hold |
This PR implements the creation and cleanup of the
ManagedServiceAccountresources for each managed cluster in theClusterSet.The goal of this PR is getting API access secret by creating a
ManagedServiceAccountcustom resource for each managed cluster. The distribution of those API access secrets or tokens will be implemented in the next PR.References:
This PR is the third part of the original large PR:
#55
The controller reconciliation need an update method for existing MSA. I created an issue in #120. That TODO item will be added in follow up PRs.
e2e test suite need new tests. I created an issue in #133 . That TODO item will be added in follow up PRs.