Description
Describe the bug
On some of our nodes, the secrets-store-csi-driver-provider-aws pod sometimes starts before the secrets-store-csi-driver pod. When that happens, application pods on that node that mount AWS secrets get stuck in ContainerCreating status, and the secrets-store-csi-driver pod logs errors such as:
"failed to mount secrets store object content" err="rpc error: code = Canceled desc = latest balancer error: connection error: desc = \"transport: Error while dialing: dial unix /etc/kubernetes/secrets-store-csi-providers/aws.sock: connect: no such file or directory\"" pod="mynamespace/mypod"
The secrets-store-csi-driver-provider-aws pod does not log anything that indicates a problem.
To Reproduce
Steps to reproduce the behavior:
Start the secrets-store-csi-driver-provider-aws pod before the secrets-store-csi-driver pod, then create an application pod that mounts AWS secrets.
Do you also notice this bug when using a different secrets store provider (Vault/Azure/GCP...)? Yes/No
Haven't tried
If yes, the issue is likely with the k8s Secrets Store CSI driver, not the AWS provider. Open an issue in that repo.
Expected behavior
The application pod mounts the secrets regardless of the order in which the two pods start.
Environment:
OS, Go version, etc.
EKS 1.29.1, secrets-store-csi-driver 1.4.2, secrets-store-csi-driver-provider-aws 1.0.r2-68-gab548b3-2024.03.20.21.58
Additional context
We noticed that the secrets-store-csi-driver-provider-aws chart does not set type: DirectoryOrCreate on the providervol volume definition. Is it possible that, when this hostPath does not exist at startup, the provider cannot use the socket but does not log any error?
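For illustration, the change we have in mind would look roughly like the fragment below. This is only a sketch: the directory is taken from the aws.sock path in the driver log above, and the surrounding layout is our assumption about the DaemonSet template, not copied from the chart. We have not confirmed that this resolves the issue.

```yaml
# Sketch only: the layout is assumed, not copied from the chart.
volumes:
  - name: providervol
    hostPath:
      # Directory taken from the aws.sock path in the driver log above.
      path: /etc/kubernetes/secrets-store-csi-providers
      # Proposed addition: have the kubelet create the directory if it is missing.
      type: DirectoryOrCreate
```

With type: DirectoryOrCreate, Kubernetes creates an empty directory at the given path if nothing exists there, so the provider would have a directory for its socket even when it starts before the driver.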