docs/troubleshooting.md
11 additions & 9 deletions
@@ -240,8 +240,8 @@ Use the following considerations when troubleshooting file cache performance iss
## Mounting issues due to bucket access verification
-
#### 1. Error in fetching token from metadataserver
-
This error can appear if the [Metadata server component](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/workload-identity#metadata_server) the token fetching from metadataserver fails due to any reason for e.g. metadataserver not yet ready.
+
#### 1. Error in fetching token from GKE metadata server
+
This error can appear if fetching a token from the [GKE metadata server component](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/workload-identity#metadata_server) fails for any reason, e.g. the metadata server is not yet ready.
```
textPayload="Error: mountWithStorageHandle: fs.NewServer: create file system: SetUpBucket: BucketHandle: storageLayout call failed: rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: Get \"http://169.169.254/computeMetadata/v1/instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
```
@@ -251,6 +251,8 @@ Failure to fetch access token results in GCS Fuse failures during mounting.
textPayload="Error: mountWithStorageHandle: fs.NewServer: create file system: SetUpBucket: BucketHandle: storageLayout call failed: rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: compute: Received 403 Unauthenticated"
```
+
These errors are reported in the GCS Fuse CSI sidecar (`gke-gcsfuse-sidecar`) container.
+
#### 2. Pod is stuck in pending state after failing to access the bucket
While mounting the GCS Fuse volume, if the pod does not have access to the bucket, the mount fails with one of the relevant errors mentioned above in [MountVolume.SetUp failures](#mountvolumesetup-failures).
```
@@ -261,22 +263,22 @@ Even once the bucket access is fixed (follow guidelines from [MountVolume.SetUp
#### 3. Quota exhaustion on high scale workloads
The GCS Fuse CSI driver queries the GKE metadata server twice in a mounting lifecycle: (a) to verify bucket access before mounting, and (b) to verify access while spawning GCS Fuse.
-
At high scale workloads, this can lead to STS quota exhaustion issues as too many pods are querying the MDS (metadata server) at the same time.
+
For high-scale workloads, this can lead to STS quota exhaustion because too many pods query the MDS (GKE metadata server) at the same time.
### Solution
-
Starting cluster version 1.34.1-gke.3899001+ the GKE public image for GCS Fuse sidecar provides a way to auto recover pods from temporary bucket permission issue. The sidecar version and GKE sidecar image 1.21.9+ implements a bucket access check by default before attempting to mount the volume. This feature reuses the STS token for bucket access check and GCS Fuse process thus reducing the STS quota consumption by 50%
+
Starting with cluster version 1.34.1-gke.3899001, the GKE public image for the GCS Fuse sidecar provides a way to automatically recover pods from temporary bucket permission issues. The GKE sidecar image, version 1.21.9 and later, [implements a bucket access check](https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/pull/605) by default before attempting to mount the volume. This feature reuses the STS token for both the bucket access check and the GCS Fuse process, reducing STS quota consumption by 50%.
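
The 50% figure follows from simple arithmetic: two token requests per mount become one when the token is reused. The sketch below is a simplified model, not the driver's implementation. The function names are hypothetical, and it assumes exactly one token request per access check with no caching or retries.

```python
def sts_token_requests(pods: int, reuse_token: bool) -> int:
    """Estimate token requests issued while `pods` pods mount a bucket.

    Simplified model (an assumption): each mount needs one token for the
    pre-mount bucket access check and one for the GCS Fuse process, unless
    a single token is reused for both.
    """
    if pods < 0:
        raise ValueError("pods must be non-negative")
    requests_per_mount = 1 if reuse_token else 2
    return pods * requests_per_mount


def reduction_percent(pods: int) -> float:
    """Percentage of token requests saved by reusing the token."""
    before = sts_token_requests(pods, reuse_token=False)
    after = sts_token_requests(pods, reuse_token=True)
    return 0.0 if before == 0 else 100.0 * (before - after) / before
```

Under this model, 1000 pods mounting at once issue 2000 token requests without reuse and 1000 with reuse. Real consumption also depends on retries and on how long tokens stay valid.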
-
### Limitation
-
We have noticed a gap in the implementation for the sidecar bucket access check feature specified above due to which the GCS Fuse sidecar container fails to retry if metadata server is not yet up. This means the solution will resolve issues (2) and (3) but not (1).
+
### Limitations
+
We have noticed a gap in the implementation of the sidecar bucket access check feature described above: the GCS Fuse sidecar container does not retry if the GKE metadata server is not yet up. As a result, the solution resolves issues (2) and (3) but not (1).
-
This gap is being fixed in [PR](https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/pull/1261) and will soon be released. Meanwhile, please follow the mitigation and deploy the sidecar as a private sidecar container image. Please not the feature will still be enabled if the GKE public image from gcr.io/gke-release/gcs-fuse-csi-driver-sidecar-mounter is used.
+
This gap is being fixed in [PR](https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/pull/1261), and the fix will be released soon. Meanwhile, please follow the mitigation and deploy the sidecar as a [private sidecar](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/cloud-storage-fuse-csi-driver-setup#private_sidecars) container image. Note that the GKE GCS Fuse CSI sidecar public images from `gcr.io/gke-release/gcs-fuse-csi-driver-sidecar-mounter` also have this limitation.
### Recommendation
1. **[Only for cluster version < 1.34.1-gke.3899001]** Set `skipCSIBucketAccessCheck:false` through the [volume attribute class](https://docs.cloud.google.com/kubernetes-engine/docs/reference/cloud-storage-fuse-csi-driver/volume-attr). This provides multiple benefits:
* This performs a bucket access check before attempting to mount the volume. However, high-scale workloads might experience STS quota exhaustion due to the number of access verification calls. The method below is recommended for high-scale workloads (it offers a 50% reduction in STS quota consumption for the GCS Fuse CSI driver).
-
* Bucket access check in GKE node driver connects with metadata service to verify authentication. The node driver pod retries to access the bucket until the metadata service is up and the bucket is reachable.
-
2. **[Recommended for high-scale workloads]** Cluster on GKE version 1.34.1-gke.3899001+ by default performs bucket access check in the sidecar before attempting to mount the volume. This is auto enabled and does not need any further configuration. To see any reduction in STS quota consumption, ensure `skipCSIBucketAccessCheck` is set to `true` in the [volume attribute class](https://docs.cloud.google.com/kubernetes-engine/docs/reference/cloud-storage-fuse-csi-driver/volume-attr). Please note this feature is currently only supported for managed driver. The feature is also enabled if you use a private sidecar with GKE provided public sidecar image from gcr.io/gke-release/gcs-fuse-csi-driver-sidecar-mounter. Please refer to [Limitations](#limitation) for more details.
+
* The bucket access check in the GKE node driver connects to the GKE metadata service to verify authentication. The node driver pod retries bucket access until the GKE metadata service is up and the bucket is reachable.
+
2. **[Recommended for high-scale workloads]** Clusters on GKE version 1.34.1-gke.3899001+ perform a bucket access check in the sidecar by default before attempting to mount the volume. This is enabled automatically and needs no further configuration. To see any reduction in STS quota consumption, ensure `skipCSIBucketAccessCheck` is set to `true` in the [volume attribute class](https://docs.cloud.google.com/kubernetes-engine/docs/reference/cloud-storage-fuse-csi-driver/volume-attr). Note that this feature is currently supported only for the [GKE managed CSI driver](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/cloud-storage-fuse-csi-driver-setup). The feature is also enabled if you use a [private sidecar](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/cloud-storage-fuse-csi-driver-setup#private_sidecars) with the GKE-provided public sidecar image from `gcr.io/gke-release/gcs-fuse-csi-driver-sidecar-mounter`. Refer to [Limitations](#limitations) for more details.
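
The retry behavior described for the node driver, which retries until the metadata service is up and the bucket is reachable, can be pictured with a generic retry loop. This is an illustrative sketch only, not the driver's code. The function name, the exponential backoff policy, and the choice of which errors count as transient are all assumptions.

```python
import time
from typing import Callable


def retry_until_ready(
    check: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` until it returns True or attempts run out.

    `check` is a hypothetical callable standing in for "fetch a token and
    verify bucket access". Timeouts and connection errors are treated as
    transient (for example, the metadata server is not ready yet).
    """
    for attempt in range(max_attempts):
        try:
            if check():
                return True
        except (TimeoutError, ConnectionError):
            pass  # transient failure: fall through and retry
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A production retry would typically also cap the delay and add jitter so that many pods do not retry in lockstep.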