Description
/kind bug
What happened?
Setting the startup taint (s3.csi.aws.com/agent-not-ready:NoExecute) can keep nodes from reaching the Ready state in a timely manner. I noticed this when using Karpenter to set the startup taint; I have not tested startup taints on an EKS Managed Node Group (MNG). The behavior is very similar to this issue with the EBS CSI Driver.
What you expected to happen?
The taint should be removed immediately after the CSI driver registers with the node.
How to reproduce it (as minimally and precisely as possible)?
On an EKS cluster with the Mountpoint for S3 CSI Driver Add-On, and Karpenter deployed for node autoscaling, configure the NodePool to add the startup taint. Scale a Deployment such that new nodes are created. Watch the logs of the "s3-plugin" container in the "s3-csi-node" Pod. The Deployment does not need to specify Mountpoint for S3 volumes - any simple application will do.
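For reference, the NodePool change can be expressed as a JSON merge patch. The sketch below builds one in Python; it assumes the Karpenter NodePool schema, where startup taints live under spec.template.spec.startupTaints. The function name and the NodePool name in the usage note are placeholders, not taken from the cluster in this report.

```python
import json


def startup_taint_patch(key: str = "s3.csi.aws.com/agent-not-ready",
                        effect: str = "NoExecute") -> str:
    """Return a JSON merge patch that sets startupTaints on a Karpenter NodePool.

    Assumption: the NodePool keeps startup taints under
    spec.template.spec.startupTaints.
    """
    patch = {
        "spec": {
            "template": {
                "spec": {
                    # A JSON merge patch replaces the whole list, so any
                    # existing startup taints must be included here as well.
                    "startupTaints": [{"key": key, "effect": effect}]
                }
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    print(startup_taint_patch())
```

Saved as a script (for example patch.py, a hypothetical file name), the output could be applied with something like: kubectl patch nodepool <nodepool-name> --type merge -p "$(python3 patch.py)".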
Anything else we need to know?:
Excerpts from "s3-plugin" logs. Beginning of the logs:
I1101 17:23:25.143662 1 driver.go:103] Driver version: 2.1.0, Git commit: 83d9ceb6555182811ce0178627530b7bd58b73e9, build date: 2025-10-02T17:29:59Z, nodeID: ip-172-31-70-229.us-gov-west-1.compute.internal, mount-s3 version: 1.20.0, kubernetes version: v1.33.5-eks-113cf36
I1101 17:23:25.146267 1 mount_linux.go:282] Detected umount with safe 'not mounted' behavior
I1101 17:23:25.146537 1 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
I1101 17:23:25.146568 1 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
I1101 17:23:25.146591 1 reflector.go:305] Starting reflector *v1.Pod (1m0s) from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.146604 1 reflector.go:341] Listing and watching *v1.Pod from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.150738 1 reflector.go:368] Caches populated for *v1.Pod from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.254162 1 driver.go:232] Using `spec.nodeName` filter for caching MountpointS3PodAttachment as the cluster supports it
I1101 17:23:25.255631 1 reflector.go:305] Starting reflector *v2.MountpointS3PodAttachment (57.944571727s) from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.255652 1 reflector.go:341] Listing and watching *v2.MountpointS3PodAttachment from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.256890 1 reflector.go:368] Caches populated for *v2.MountpointS3PodAttachment from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.356586 1 driver.go:192] Listening for connections on address: &net.UnixAddr{Name:"/var/lib/kubelet/plugins/s3.csi.aws.com/csi.sock", Net:"unix"}
I1101 17:23:25.356643 1 taint.go:61] Starting taint watcher for taint s3.csi.aws.com/agent-not-ready in node ip-172-31-70-229.us-gov-west-1.compute.internal (max duration: 10m0s)
I1101 17:23:25.356741 1 reflector.go:305] Starting reflector *v1.Node (5s) from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.356758 1 reflector.go:341] Listing and watching *v1.Node from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.359931 1 reflector.go:368] Caches populated for *v1.Node from pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243
I1101 17:23:25.360089 1 taint.go:78] Found "s3.csi.aws.com/agent-not-ready" taint on node "ip-172-31-70-229.us-gov-west-1.compute.internal", attempting removal
I1101 17:23:25.366742 1 taint.go:92] CSI driver not yet registered, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: CSI driver s3.csi.aws.com not found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:25.457565 1 taint.go:78] Found "s3.csi.aws.com/agent-not-ready" taint on node "ip-172-31-70-229.us-gov-west-1.compute.internal", attempting removal
I1101 17:23:25.461072 1 taint.go:92] CSI driver not yet registered, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: CSI driver s3.csi.aws.com not found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:26.398958 1 node.go:225] NodeGetInfo: called with args
I1101 17:23:27.372458 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:27.372482 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:23:27.377787 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:27.377787 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:27.377787 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:27.377787 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:23:27.465949 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:27.465968 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:23:27.469466 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:27.469466 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:27.469466 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:27.469466 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:23:30.362358 1 reflector.go:389] pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243: forcing resync
I1101 17:23:30.384616 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:30.384633 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:23:30.390229 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:30.390229 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:30.390229 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:30.390229 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:23:30.473030 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:30.473049 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:23:30.477778 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:30.477778 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:30.477778 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:30.477778 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:23:34.898820 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:34.898843 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:23:34.904048 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:34.904048 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:34.904048 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:34.904048 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:23:34.981680 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:34.981698 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:23:34.985410 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:34.985410 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:34.985410 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:34.985410 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:23:35.362812 1 reflector.go:389] pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243: forcing resync
I1101 17:23:40.363336 1 reflector.go:389] pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243: forcing resync
I1101 17:23:41.659926 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:41.659944 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:23:41.664636 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:41.664636 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:41.664636 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:41.664636 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:23:41.664659 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
E1101 17:23:41.664659 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
E1101 17:23:41.664659 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
E1101 17:23:41.664659 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
I1101 17:23:41.664690 1 taint.go:78] Found "s3.csi.aws.com/agent-not-ready" taint on node "ip-172-31-70-229.us-gov-west-1.compute.internal", attempting removal
I1101 17:23:41.668233 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:23:41.668250 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
When taint removal finally succeeds (at 17:25:52, about two and a half minutes after the taint watcher started at 17:23:25):
E1101 17:25:52.036132 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
E1101 17:25:52.036132 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
E1101 17:25:52.036132 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
E1101 17:25:52.036132 1 taint.go:105] Timed out trying to remove agent-not-ready taint from node ip-172-31-70-229.us-gov-west-1.compute.internal: timed out waiting for the condition
I1101 17:25:52.036158 1 taint.go:78] Found "s3.csi.aws.com/agent-not-ready" taint on node "ip-172-31-70-229.us-gov-west-1.compute.internal", attempting removal
I1101 17:25:52.039958 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:25:52.039979 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
I1101 17:25:52.077604 1 taint.go:218] Removed taint(s) from local node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:25:52.077632 1 taint.go:100] Successfully removed "s3.csi.aws.com/agent-not-ready" taint from node "ip-172-31-70-229.us-gov-west-1.compute.internal", stopping the watcher
I1101 17:25:52.077665 1 taint.go:78] Found "s3.csi.aws.com/agent-not-ready" taint on node "ip-172-31-70-229.us-gov-west-1.compute.internal", attempting removal
I1101 17:25:52.097583 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:25:52.097844 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:25:52.113295 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:25:52.113295 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:25:52.113295 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:25:52.113295 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:25:54.117757 1 taint.go:165] CSI driver s3.csi.aws.com found in CSINode for node ip-172-31-70-229.us-gov-west-1.compute.internal
I1101 17:25:54.117787 1 taint.go:187] Queued taint for removal: key=s3.csi.aws.com/agent-not-ready, effect=NoExecute
E1101 17:25:54.122070 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:25:54.122070 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:25:54.122070 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
E1101 17:25:54.122070 1 taint.go:97] Failed to remove agent-not-ready taint, retrying for node ip-172-31-70-229.us-gov-west-1.compute.internal: the server rejected our request due to an error in our request
I1101 17:25:55.384355 1 reflector.go:389] pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243: forcing resync
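To quantify the delay on other nodes, a small script can measure the time between the "Starting taint watcher" line and the first "Successfully removed" line. This is only a sketch: it assumes the klog header format shown in the excerpts above (severity letter, MMDD, time with microseconds), and the function names are mine. klog does not print the year, so the default of 2025 is an assumption that only matters if the log crosses a year boundary.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# klog header: severity letter, MMDD, then HH:MM:SS.microseconds
KLOG_TS = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) ")


def klog_time(line: str, year: int = 2025) -> datetime:
    """Parse the timestamp of a klog line; the year is supplied by the caller."""
    m = KLOG_TS.match(line)
    if m is None:
        raise ValueError(f"not a klog line: {line!r}")
    return datetime.strptime(f"{year}{m.group(1)} {m.group(2)}",
                             "%Y%m%d %H:%M:%S.%f")


def taint_removal_delay(lines: Iterable[str]) -> Optional[timedelta]:
    """Time from the taint watcher starting to the first successful removal.

    Returns None if either line is missing from the input.
    """
    start = None
    for line in lines:
        if start is None and "Starting taint watcher" in line:
            start = klog_time(line)
        elif start is not None and "Successfully removed" in line:
            return klog_time(line) - start
    return None
```

For the excerpts above, the two matching lines are stamped 17:23:25.356643 and 17:25:52.077632, which gives a delay of roughly 2 minutes 27 seconds. The input could come from kubectl logs on the "s3-plugin" container, split into lines.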
Environment
- Kubernetes version (use kubectl version):
Client Version: v1.33.0-eks-802817d
Kustomize Version: v5.6.0
Server Version: v1.33.5-eks-113cf36
- Driver version: 2.1.0, Git commit: 83d9ceb, build date: 2025-10-02T17:29:59Z