should fail when node SA has bucket access but pod KSA does not #1357
should fail when node SA has bucket access but pod KSA does not #1357tanuja-sunda73 wants to merge 5 commits into
Conversation
|
@tanuja-sunda73: The label(s) DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tanuja-sunda73 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @tanuja-sunda73. Thanks for your PR. I'm waiting for a GoogleCloudPlatform member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Code Review
This pull request introduces a new E2E test suite for Workload Identity Federation (WIF) and refactors IAM member generation into a centralized helper method within the test driver. The reviewer identified several issues, including missing helper function definitions, the removal of the iamPropagationWaitTime constant, and potential risks with malformed IAM principal strings. Additionally, the reviewer provided actionable feedback to improve code quality, such as adopting idiomatic polling mechanisms instead of manual loops, using proper context handling for Ginkgo v2, replacing fragile project ID parsing, and reducing hardcoded sleep durations to improve test performance.
| projectNumber := getProjectNumber(projectID) | ||
| gomega.Expect(projectNumber).NotTo(gomega.BeEmpty(), "failed to get project number") | ||
|
|
||
| ginkgo.By(fmt.Sprintf("Creating workload identity pool: %s", poolID)) | ||
| createWorkloadIdentityPool(projectID, poolID) | ||
|
|
||
| clusterName := os.Getenv(utils.ClusterNameEnvVar) | ||
| clusterLocation := os.Getenv(utils.ClusterLocationEnvVar) | ||
| clusterIssuer := getClusterOIDCIssuer(clusterName, clusterLocation, projectID) | ||
| gomega.Expect(clusterIssuer).NotTo(gomega.BeEmpty(), "failed to discover cluster OIDC issuer") | ||
|
|
||
| ginkgo.By(fmt.Sprintf("Creating workload identity provider: %s", providerID)) | ||
| createWorkloadIdentityProvider(projectID, poolID, providerID, clusterIssuer) | ||
|
|
||
| ginkgo.By("Generating credential configuration") | ||
| credentialConfig := generateCredentialConfig(projectNumber, poolID, providerID) | ||
|
|
||
| ginkgo.By(fmt.Sprintf("Creating Kubernetes service account: %s", ksaName)) | ||
| createServiceAccount(ctx, f, ksaName) | ||
| ginkgo.DeferCleanup(func() { deleteServiceAccount(ctx, f, ksaName) }) | ||
|
|
||
| ginkgo.By(fmt.Sprintf("Creating credential ConfigMap: %s", configMapName)) | ||
| createCredentialConfigMap(ctx, f, configMapName, credentialConfig) | ||
| ginkgo.DeferCleanup(func() { deleteConfigMap(ctx, f, configMapName) }) |
There was a problem hiding this comment.
Several helper functions (e.g., getProjectNumber, createWorkloadIdentityPool, getClusterOIDCIssuer, createWorkloadIdentityProvider, generateCredentialConfig, createServiceAccount, deleteServiceAccount, createCredentialConfigMap, deleteConfigMap) are used but not defined in this file or package. If these are new helpers, they should be included in the PR.
There was a problem hiding this comment.
It looks like this PR is not yet ready to review?
| grantBucketAccess(bucketName, nodeSAPrincipal, "roles/storage.objectAdmin") | ||
| ginkgo.DeferCleanup(func() { revokeBucketAccess(bucketName, nodeSAPrincipal, "roles/storage.objectAdmin") }) |
| gcsfuseCSIProfilesStaticBucket = "gcsfusecsi-list-storm-hns-bucket" | ||
| gcsfuseCSIProfilesStaticBucketRegion = "us-central1" | ||
| gkeScalabilityImagesProjectID = "gke-scalability-images" | ||
| iamPropagationWaitTime = 10 * time.Minute |
There was a problem hiding this comment.
The constant iamPropagationWaitTime was removed and replaced with hardcoded 5 * time.Minute values at lines 159 and 172. It is better to maintain the constant for readability and to avoid magic numbers. If the reduction from 10 to 5 minutes is intentional, the constant's value should be updated instead of being removed.
| if wifPoolID := os.Getenv("OSS_WIF_POOL_ID"); wifPoolID != "" { | ||
| projectNumber := os.Getenv(utils.ProjectNumberEnvVar) | ||
| return fmt.Sprintf("principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/subject/system:serviceaccount:%s:%s", | ||
| projectNumber, wifPoolID, serviceAccountNamespace, serviceAccountName) | ||
| } |
There was a problem hiding this comment.
When OSS_WIF_POOL_ID is set, the code assumes utils.ProjectNumberEnvVar is also available. If it's missing, the IAM principal string will be malformed. Consider adding a check for projectNumber.
if wifPoolID := os.Getenv("OSS_WIF_POOL_ID"); wifPoolID != "" {
projectNumber := os.Getenv(utils.ProjectNumberEnvVar)
if projectNumber == "" {
e2eframework.Failf("Environment variable %v must be set when OSS_WIF_POOL_ID is used", utils.ProjectNumberEnvVar)
}
return fmt.Sprintf("principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/subject/system:serviceaccount:%s:%s",
projectNumber, wifPoolID, serviceAccountNamespace, serviceAccountName)
}| volumeResource *storageframework.VolumeResource | ||
| } | ||
| var l local | ||
| ctx := context.Background() |
| lines := strings.Split(strings.TrimSpace(rawProjectID), "\n") | ||
| projectID := lines[len(lines)-1] |
There was a problem hiding this comment.
There was a problem hiding this comment.
See other uses of ProjectEnvVar --- it's just used verbatium. So there's no need to add extra processing here.
| for i := 0; i < 60; i++ { | ||
| pod, err := f.ClientSet.CoreV1().Pods(f.Namespace.Name).Get(ctx, podName, metav1.GetOptions{}) | ||
| gomega.Expect(err).ToNot(gomega.HaveOccurred()) | ||
| for _, cs := range pod.Status.ContainerStatuses { | ||
| if cs.State.Waiting != nil && cs.State.Waiting.Reason == "CreateContainerError" { | ||
| ginkgo.By(fmt.Sprintf("CreateContainerError on %s: %s", cs.Name, cs.State.Waiting.Message)) | ||
| } | ||
| } | ||
|
|
||
| events, err := f.ClientSet.CoreV1().Events(f.Namespace.Name).List(ctx, | ||
| metav1.ListOptions{ | ||
| FieldSelector: fmt.Sprintf( | ||
| "involvedObject.name=%s", | ||
| podName, | ||
| ), | ||
| }, | ||
| ) | ||
|
|
||
| gomega.Expect(err).ToNot(gomega.HaveOccurred()) | ||
| lastEvents = events.Items | ||
| for _, e := range events.Items { | ||
|
|
||
| ginkgo.By(fmt.Sprintf("Event [%s]: %s", e.Reason, e.Message)) | ||
|
|
||
| if strings.Contains(e.Message, "PermissionDenied") && (strings.Contains(e.Message, "storage.objects.list") || strings.Contains(e.Message, "storageLayout") || strings.Contains(e.Message, "failed to get GCS bucket")) { | ||
| foundAuthFailure = true | ||
| foundWrongReason = false | ||
| ginkgo.By("Confirmed PermissionDenied auth failure") | ||
| break | ||
| } | ||
|
|
||
| if !foundAuthFailure && (strings.Contains(e.Message, "transport endpoint is not connected") || strings.Contains(e.Message, "failed to generate container") || strings.Contains(e.Message, "failed to stat")) { | ||
| foundWrongReason = true | ||
| } | ||
| } | ||
|
|
||
| if foundAuthFailure { | ||
| break | ||
| } | ||
| time.Sleep(5 * time.Second) | ||
| } |
There was a problem hiding this comment.
+1, use a poll ftn rather than a sleep loop.
|
Manual testing for this testcase E2E Test Suite [Driver: gcsfuse.csi.storage.gke.io] [Testpattern: CSI Ephemeral-volume (default fs)] workload-identity-federation should fail when node SA has bucket access but pod KSA does not — confirms no node SA fallback SSSSSSSSSSSSSSSSSSSSSSSSSS Ran 1 of 478 Specs in 194.865 seconds On GKE: E2E Test Suite [Driver: gcsfuse.csi.storage.gke.io] [Testpattern: CSI Ephemeral-volume (default fs)] workload-identity-federation should fail when node SA has bucket access but pod KSA does not — confirms no node SA fallback SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS Ran 1 of 479 Specs in 333.411 seconds |
|
/ok-to-test |
…firms no node SA fallback
38aabcb to
b7c5837
Compare
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds a WIF E2E test that is it should fail when node SA has bucket access but pod KSA does not — confirms no node SA fallback. Supports both OSS (external WIF via OIDC pool/provider) and GKE (native Workload Identity) clusters via IS_OSS.
Which issue(s) this PR fixes:
N/A