Which component are you using?: cluster-autoscaler on AWS
/area cluster-autoscaler
What version of the component are you using?:
Component version: Helm chart 9.45
What k8s version are you using (kubectl version)?:
$ kubectl version
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.31.3-eks-56e63d8
What environment is this in?: AWS EKS
What did you expect to happen?: I am trying to figure out why the autoscaler does not honor my --ok-total-unready-count=0 setting. It seems the node that enters the NotReady state is stuck with many terminating pods, and at the same time I observed an error in the autoscaler log.
The error is the following:
failed to list *v1.VolumeAttachment: is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "volumeattachments" in API group "" at the cluster scope
When looking at the ClusterRole created by the Helm chart, I do not see this particular resource:
$ k describe clusterrole cluster-autoscaler-aws-cluster-autoscaler
Name: cluster-autoscaler-aws-cluster-autoscaler
Annotations: cluster-autoscaler kube-system
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
endpoints [] [] [create patch]
events [] [] [create patch]
pods/eviction [] [] [create] [] [] [create]
jobs.extensions [] [] [get list patch watch]
endpoints [] [cluster-autoscaler] [get update] [] [cluster-autoscaler] [get update]
configmaps [] [] [list watch get]
pods/status [] [] [update]
nodes [] [] [watch list create delete get update]
jobs.batch [] [] [watch list get patch]
namespaces [] [] [watch list get]
persistentvolumeclaims [] [] [watch list get]
persistentvolumes [] [] [watch list get]
pods [] [] [watch list get]
replicationcontrollers [] [] [watch list get]
services [] [] [watch list get]
daemonsets.apps [] [] [watch list get]
replicasets.apps [] [] [watch list get]
statefulsets.apps [] [] [watch list get]
cronjobs.batch [] [] [watch list get]
daemonsets.extensions [] [] [watch list get]
replicasets.extensions [] [] [watch list get] [] [] [watch list get] [] [] [watch list get] [] [] [watch list get] [] [] [watch list get]
poddisruptionbudgets.policy [] [] [watch list]
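As a possible stopgap, the missing permission could be granted with an extra ClusterRole and binding. This is only a sketch under assumptions: VolumeAttachment is served from the storage.k8s.io API group, the object name cluster-autoscaler-volumeattachments and the file name are made up for illustration, and the service account is the one named in the error message. I have not confirmed that this is what the chart is supposed to ship, or that it is related to the NotReady node not being replaced.

```shell
# Sketch: write an extra ClusterRole and binding to a file for review.
# The name "cluster-autoscaler-volumeattachments" and the file name are
# hypothetical; the service account is taken from the error message.
cat > ca-volumeattachments-rbac.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler-volumeattachments
rules:
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler-volumeattachments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler-volumeattachments
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
EOF

# Apply only after reviewing the file:
#   kubectl apply -f ca-volumeattachments-rbac.yaml
# Then check the permission for the service account:
#   kubectl auth can-i list volumeattachments.storage.k8s.io \
#     --as=system:serviceaccount:kube-system:cluster-autoscaler
```

The apply and can-i steps are left commented out so that nothing changes in the cluster until the manifest has been reviewed.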
I am not sure, but given --ok-total-unready-count=0, I would expect a node that enters the NotReady state to be replaced fairly quickly by a node that can handle the workload.
What happened instead?:
The NotReady node sticks around for quite some time, with a bunch of pods in the Terminating state. It eventually goes away on its own (after maybe 30-45 minutes).
How to reproduce it (as minimally and precisely as possible):
I am afraid I can't :-/
Something is causing my node to get into the NotReady state; I think it is far too much overcommitment on the nodes, especially on memory (the kubelet then bails out).
Anything else we need to know?:
A log iteration where I see the VolumeAttachment error:
I0106 17:52:52.606768 1 static_autoscaler.go:274] Starting main loop
I0106 17:52:52.609136 1 aws_manager.go:188] Found multiple availability zones for ASG "eks-default_node_group-20241211130258966500000008-7cc9dae8-63f0-63d5-bce1-642871ebd84f"; using eu-central-2b for label
I0106 17:52:52.758096 1 filter_out_schedulable.go:65] Filtering out schedulables
I0106 17:52:52.758116 1 filter_out_schedulable.go:122] 0 pods marked as unschedulable can be scheduled.
I0106 17:52:52.758125 1 filter_out_schedulable.go:85] No schedulable pods
I0106 17:52:52.758130 1 filter_out_daemon_sets.go:47] Filtered out 0 daemon set pods, 0 unschedulable pods left
I0106 17:52:52.758150 1 static_autoscaler.go:532] No unschedulable pods
I0106 17:52:52.758168 1 static_autoscaler.go:555] Calculating unneeded nodes
I0106 17:52:52.758182 1 pre_filtering_processor.go:67] Skipping - node group min size reached (current: 3, min: 3)
I0106 17:52:52.758204 1 pre_filtering_processor.go:67] Skipping - node group min size reached (current: 3, min: 3)
I0106 17:52:52.758209 1 pre_filtering_processor.go:67] Skipping - node group min size reached (current: 3, min: 3)
I0106 17:52:52.758213 1 pre_filtering_processor.go:67] Skipping - node group min size reached (current: 3, min: 3)
I0106 17:52:52.758473 1 static_autoscaler.go:598] Scale down status: lastScaleUpTime=2025-01-06 16:16:32.949347114 +0000 UTC m=-3582.400670434 lastScaleDownDeleteTime=2025-01-06 16:16:32.949347114 +0000 UTC m=-3582.400670434 lastScaleDownFailTime=2025-01-06 16:16:32.949347114 +0000 UTC m=-3582.400670434 scaleDownForbidden=false scaleDownInCooldown=true
I0106 17:52:52.759061 1 orchestrator.go:322] ScaleUpToNodeGroupMinSize: NodeGroup eks-default_node_group-20241211130258966500000008-7cc9dae8-63f0-63d5-bce1-642871ebd84f, TargetSize 3, MinSize 3, MaxSize 5
I0106 17:52:52.759135 1 orchestrator.go:366] ScaleUpToNodeGroupMinSize: scale up not needed
I0106 17:52:56.201819 1 reflector.go:349] Listing and watching *v1.VolumeAttachment from pkg/mod/[email protected]/tools/cache/reflector.go:251
W0106 17:52:56.206308 1 reflector.go:569] pkg/mod/[email protected]/tools/cache/reflector.go:251: failed to list *v1.VolumeAttachment: is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "volumeattachments" in API group "" at the cluster scope
E0106 17:52:56.206341 1 reflector.go:166] "Unhandled Error" err="pkg/mod/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1.VolumeAttachment: failed to list *v1.VolumeAttachment: is forbidden: User \"system:serviceaccount:kube-system:cluster-autoscaler\" cannot list resource \"volumeattachments\" in API group \"\" at the cluster scope" logger="UnhandledError"
I0106 17:52:57.975501 1 reflector.go:879] pkg/mod/[email protected]/tools/cache/reflector.go:251: Watch close - *v1.Node total 29 items received
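To check whether volumeattachments is the only resource being denied, the log can be reduced to the distinct verb and resource pairs that appear in "cannot ... resource" messages. The helper below is a rough sketch: forbidden_resources is a made-up name, and it assumes the messages keep the wording shown in the log above (with or without escaped quotes).

```shell
# Hypothetical helper: read autoscaler log text on stdin and print each
# distinct "<verb> <resource>" pair that was reported as forbidden.
forbidden_resources() {
  # Match both plain quotes and the backslash-escaped quotes used in E-lines.
  grep -o 'cannot [a-z]* resource \\*"[a-z/.]*\\*"' |
    tr -d '\\"' |                 # drop backslashes and quotes
    awk '{print $2, $4}' |        # keep the verb and the resource name
    sort -u                       # one line per distinct pair
}
```

It could be fed from something like kubectl -n kube-system logs deployment/NAME | forbidden_resources, where NAME is a placeholder for the autoscaler deployment in the cluster.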