update troubleshooting guide for sidecar bucket access check feature #1153
Merged
Sneha-at
merged 5 commits into
GoogleCloudPlatform:main
from
Sneha-at:update_release_docs
Mar 27, 2026
44a0398
update troubleshooting guide for sidecar bucket access check feature
Sneha-at 5aa6cae
reword guide to expand more on the customer journey
Sneha-at 6c55f22
add more details around the gap with sidecar bucket acess check feature
Sneha-at 869a917
rephrase guide based on feedback
Sneha-at db34fce
add details for host network enabled workloads
Sneha-at
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -237,6 +237,48 @@ Use the following considerations when troubleshooting file cache performance iss | |
|
|
||
| Increase the volume attribute `fileCacheCapacity` value to make sure it is larger than the total file size. | ||
|
|
||
|
|
||
| ## Mounting issues due to bucket access verification | ||
|
|
||
| #### 1. Error fetching a token from the metadata server | ||
| This error can appear when fetching a token from the [metadata server component](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/workload-identity#metadata_server) fails for any reason, for example because the metadata server is not yet ready. | ||
|
||
| ``` | ||
|
||
| textPayload="Error: mountWithStorageHandle: fs.NewServer: create file system: SetUpBucket: BucketHandle: storageLayout call failed: rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: Get \"http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" | ||
| ``` | ||
|
|
||
| A failure to fetch the access token causes GCS Fuse to fail during mounting: | ||
| ``` | ||
| textPayload="Error: mountWithStorageHandle: fs.NewServer: create file system: SetUpBucket: BucketHandle: storageLayout call failed: rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: compute: Received 403 Unauthenticated" | ||
| ``` | ||
|
|
||
| #### 2. Pod is stuck in the Pending state after failing to access the bucket | ||
| If the pod does not have access to the bucket while the GCS Fuse volume is being mounted, the mount fails with one of the errors described in [MountVolume.SetUp failures](#mountvolumesetup-failures), for example: | ||
| ``` | ||
| Error while mounting gcsfuse: mountWithStorageHandle: fs.NewServer: create file system: SetUpBucket: BucketHandle: storageLayout call failed: rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: compute: Received 504 | ||
| ``` | ||
| Even after bucket access is fixed (follow the guidelines in [MountVolume.SetUp failures](#mountvolumesetup-failures)), the GCS Fuse CSI sidecar currently does not retry mounting the volume, which leaves the pod stuck in the Pending state. For clusters on GKE versions earlier than 1.34.1-gke.3899001, the pod has to be restarted to retry the bucket access and the mount. | ||
|
|
||
| #### 3. Quota exhaustion on high scale workloads | ||
| The GCS Fuse CSI driver queries the GKE metadata server twice in a mounting lifecycle: (a) to verify bucket access before mounting, and (b) to verify access again while spawning GCS Fuse. | ||
|
|
||
| For high-scale workloads, this can lead to STS quota exhaustion because too many pods query the metadata server (MDS) at the same time. | ||
|
|
||
| ### Solution | ||
| Starting with cluster version 1.34.1-gke.3899001, the GKE public image for the GCS Fuse sidecar can automatically recover pods from temporary bucket permission issues. The sidecar, in GKE sidecar image version 1.21.9 and later, performs a bucket access check by default before attempting to mount the volume. This feature reuses the same STS token for the bucket access check and for the GCS Fuse process, which reduces STS quota consumption by 50%. | ||
|
||
|
|
||
| ### Limitation | ||
| We have noticed a gap in the implementation of the sidecar bucket access check feature described above: the GCS Fuse sidecar container does not retry if the metadata server is not yet up. As a result, the solution resolves issues (2) and (3) but not (1). | ||
|
|
||
| This gap is being fixed in [PR #1261](https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/pull/1261), and the fix will be released soon. In the meantime, follow the mitigation and deploy the sidecar as a private sidecar container image. Note that the feature is still enabled if the GKE public image from `gcr.io/gke-release/gcs-fuse-csi-driver-sidecar-mounter` is used. | ||
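For readers unfamiliar with private sidecar images, the following is a minimal sketch of what such a Pod manifest might look like. It assumes the private-image mechanism described in the GKE documentation, in which a container named `gke-gcsfuse-sidecar` in the Pod spec overrides the injected sidecar image. The Pod name, application container, and the `PRIVATE_REGISTRY` and `PRIVATE_IMAGE_TAG` placeholders are hypothetical and must be replaced with values for an image you host yourself; this guide does not state which image version to use.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcsfuse-private-sidecar-example   # hypothetical name
  annotations:
    gke-gcsfuse/volumes: "true"           # requests sidecar injection
spec:
  containers:
  # Assumption: a container with this exact name tells the webhook
  # which sidecar image to use instead of the GKE-managed default.
  - name: gke-gcsfuse-sidecar
    image: PRIVATE_REGISTRY/gcs-fuse-csi-driver-sidecar-mounter:PRIVATE_IMAGE_TAG
  # Hypothetical application container.
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
```

As noted above, an image taken from `gcr.io/gke-release/gcs-fuse-csi-driver-sidecar-mounter` still has the bucket access check enabled, so the placeholder image matters for this mitigation.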
|
||
|
|
||
| ### Recommendation | ||
|
|
||
| 1. **[Only for cluster versions < 1.34.1-gke.3899001]** Set `skipCSIBucketAccessCheck` to `false` through the [volume attribute class](https://docs.cloud.google.com/kubernetes-engine/docs/reference/cloud-storage-fuse-csi-driver/volume-attr). This has the following effects: | ||
| * The bucket access check runs before the driver attempts to mount the volume. However, high-scale workloads might experience STS quota exhaustion because of the number of access verification calls. For high-scale workloads, recommendation 2 below is preferred (it offers a 50% reduction in STS quota consumption for the GCS Fuse CSI driver). | ||
| * The bucket access check in the GKE node driver connects to the metadata service to verify authentication. The node driver pod retries accessing the bucket until the metadata service is up and the bucket is reachable. | ||
|
||
| 2. **[Recommended for high-scale workloads]** Clusters on GKE version 1.34.1-gke.3899001 and later perform the bucket access check in the sidecar by default before attempting to mount the volume. This is enabled automatically and needs no further configuration. To see a reduction in STS quota consumption, make sure `skipCSIBucketAccessCheck` is set to `true` in the [volume attribute class](https://docs.cloud.google.com/kubernetes-engine/docs/reference/cloud-storage-fuse-csi-driver/volume-attr). Note that this feature is currently supported only for the managed driver. The feature is also enabled if you use a private sidecar with the GKE-provided public sidecar image from `gcr.io/gke-release/gcs-fuse-csi-driver-sidecar-mounter`. See [Limitations](#limitation) for more details. | ||
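To illustrate where the attribute from the two recommendations goes, here is a minimal sketch of a Pod that mounts a bucket as a CSI ephemeral volume. The Pod name, service account, container image, mount path, and bucket name are placeholders chosen for illustration, not values from this guide. Only the `skipCSIBucketAccessCheck` value differs between the two recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcsfuse-example              # hypothetical name
  annotations:
    gke-gcsfuse/volumes: "true"      # requests sidecar injection
spec:
  serviceAccountName: example-ksa    # hypothetical; needs access to the bucket
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: gcs-volume
      mountPath: /data
  volumes:
  - name: gcs-volume
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: example-bucket   # hypothetical bucket
        # Recommendation 1 (older clusters): "false" keeps the node-driver check.
        # Recommendation 2 (1.34.1-gke.3899001 and later): "true" skips it,
        # leaving only the sidecar check.
        skipCSIBucketAccessCheck: "true"
```

Volume attribute values are strings, so the boolean is quoted.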
|
||
|
|
||
|
|
||
| ## Performance issues | ||
|
|
||
| This section aims to provide troubleshooting steps and tips to resolve Cloud Storage FUSE CSI driver performance issues. | ||
|
|
@@ -455,4 +497,3 @@ csi: | |
| volumeAttributes: | ||
| mountOptions: "log-severity=trace" | ||
| ``` | ||
|
|
||