Cluster Autoscaler Not Scaling Nodes on GCP with Kops 1.31.6 #17302

Open
@bhuwoan

Description

The Cluster Autoscaler (CA) is not scaling up worker nodes in my Kops-managed Kubernetes cluster on GCP. Pods remain unschedulable, and the autoscaler logs report "no node group config" for every node in the cluster.

Environment Details

Kubernetes Version: 1.31.6
Kops Version: Latest (v1.31.x)
Cloud Provider: GCP
Cluster Topology: Private
Networking: Calico
Autoscaler Version: Helm Chart autoscaler/cluster-autoscaler v9.44.0
Helm App Version: 1.31
API LoadBalancer Type: Internal
Region: europe-west2

Cluster Creation Command

kops create cluster \
  --name=kops-cluster.k8s.local \
  --state=${KOPS_STATE_STORE}/ \
  --project=${PROJECT} \
  --zones=europe-west2-a,europe-west2-b,europe-west2-c \
  --kubernetes-version=1.31.6 \
  --node-count=3 \
  --node-size=e2-medium \
  --control-plane-size=e2-small \
  --control-plane-count=3 \
  --api-loadbalancer-type=internal \
  --networking=calico \
  --topology=private \
  --bastion=true \
  --cloud=gce \
  --etcd-storage-type=pd-ssd

Instance Groups

[Image not preserved]

CA Installation Command

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install ca-kops autoscaler/cluster-autoscaler --version 9.44.0 -f ca-kops.yaml

Cluster Autoscaler Values File (ca-kops.yaml)

autoDiscovery:
  clusterName: kops-cluster.k8s.local
cloudProvider: gce
autoscalingGroupsnamePrefix:
  - name: a-nodes-europe-west2-
    maxSize: 3
    minSize: 1
  - name: b-nodes-europe-west2-
    maxSize: 3
    minSize: 1
extraArgs:
  skip-nodes-with-local-storage: false
  scale-down-delay-after-add: 2m
  scale-down-unneeded-time: 2m
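
One way to sanity-check the prefixes in this values file is to compare them against the actual managed instance group (MIG) names. The sketch below is only an illustration: the two prefixes are copied from the values file, the function name check_mig_name is made up, and the MIG names in the loop are hypothetical placeholders, not names taken from this cluster. Note that the values file lists prefixes for zones a and b only, so any name starting with c- would not match.

```shell
#!/bin/sh
# Prefixes copied from ca-kops.yaml.
PREFIXES="a-nodes-europe-west2- b-nodes-europe-west2-"

# Print "match" if the given MIG name starts with any configured prefix,
# otherwise print "no-match" and return a non-zero status.
check_mig_name() {
  for prefix in $PREFIXES; do
    case "$1" in
      "$prefix"*) echo "match"; return 0 ;;
    esac
  done
  echo "no-match"
  return 1
}

# Real names would come from something like:
#   gcloud compute instance-groups managed list --format="value(name)"
# The two names below are hypothetical placeholders for illustration only.
for name in a-nodes-europe-west2-a-example c-nodes-europe-west2-c-example; do
  printf '%s: %s\n' "$name" "$(check_mig_name "$name")"
done
```

Feeding the real gcloud output through check_mig_name would show which instance groups, if any, the configured prefixes can select.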

Logs Observed

In the CA pod logs, the following message repeats for every node, workers and control plane alike:

I0305 08:58:59.681204 1 pre_filtering_processor.go:57] Node nodes-europe-west2-a-12fl should not be processed by cluster autoscaler (no node group config)
I0305 08:58:59.681216 1 pre_filtering_processor.go:57] Node nodes-europe-west2-b-rfsf should not be processed by cluster autoscaler (no node group config)
I0305 08:58:59.681224 1 pre_filtering_processor.go:57] Node nodes-europe-west2-c-p5mt should not be processed by cluster autoscaler (no node group config)
I0305 08:58:59.681233 1 pre_filtering_processor.go:57] Node control-plane-europe-west2-a-chnf should not be processed by cluster autoscaler (no node group config)
I0305 08:58:59.681241 1 pre_filtering_processor.go:57] Node control-plane-europe-west2-b-sqdl should not be processed by cluster autoscaler (no node group config)
I0305 08:58:59.681248 1 pre_filtering_processor.go:57] Node control-plane-europe-west2-c-px2w should not be processed by cluster autoscaler (no node group config)
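
To see at a glance which nodes are being skipped, the node names can be pulled out of these lines. This is a minimal sketch that assumes the log format shown above; the function name skipped_nodes is made up for illustration, and the deployment name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Read CA log lines on stdin and print the unique node names that were
# skipped with the "no node group config" message.
skipped_nodes() {
  sed -n 's/.*Node \([^ ]*\) should not be processed by cluster autoscaler (no node group config).*/\1/p' \
    | sort -u
}

# Usage (the deployment name is a placeholder, substitute the real one):
#   kubectl logs deployment/<ca-deployment-name> | skipped_nodes
```

If the list covers every node in the cluster, as it does in the excerpt above, CA has not associated any node with a node group.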

What I've Tried

  • Checked that the Cluster Autoscaler service account has the following GCP IAM roles:
      • roles/compute.admin
      • roles/iam.workloadIdentityUser
      • roles/container.admin
  • Verified the labels on the nodes.
  • Restarted the Cluster Autoscaler pod.
  • Ensured the Instance Group names match the autoscalingGroupsnamePrefix.
  • Tried scaling manually using kubectl scale deployment.
  • Reviewed Helm values to confirm the correct GCP cloud provider.

Request

Please help to:

  • Confirm if the autoscalingGroupsnamePrefix format is correct for Kops on GCP.
  • Identify if additional configurations are needed in the Instance Group or Kops cluster spec.
  • Debug why node groups are not recognized by the Cluster Autoscaler.
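
For reference when checking the prefix format, my understanding from the Cluster Autoscaler GCE documentation is that MIG auto-discovery is configured with a flag of the form mig:namePrefix=<prefix>,min=<min>,max=<max>. The sketch below only builds that string from the values used in ca-kops.yaml; the function name mig_discovery_flag is made up, and whether the Helm chart renders exactly this flag from autoscalingGroupsnamePrefix is an assumption that should be verified against the deployed pod's arguments.

```shell
#!/bin/sh
# Build the auto-discovery flag for one MIG name prefix.
# The flag format is an assumption based on the CA GCE docs; verify it
# against the arguments of the running cluster-autoscaler container.
mig_discovery_flag() {
  # $1 = name prefix, $2 = min size, $3 = max size
  printf -- '--node-group-auto-discovery=mig:namePrefix=%s,min=%s,max=%s\n' "$1" "$2" "$3"
}

# Values taken from ca-kops.yaml.
mig_discovery_flag a-nodes-europe-west2- 1 3
mig_discovery_flag b-nodes-europe-west2- 1 3
```

Comparing this output with the arguments on the running CA container would show whether the values file is being translated as expected.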
