Revert "Implemented HasInstance for GCE provider" as it is causing CA E2E test failures#9470

Open
Choraden wants to merge 1 commit into kubernetes:master from Choraden:revert-9319

Conversation

@Choraden
Contributor

@Choraden Choraden commented Apr 9, 2026

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

This reverts commit 785b523.

Since April 3rd, both the CA presubmits and the periodic E2E tests have been failing:
https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/pr-logs/directory/pull-autoscaling-e2e-gci-gce-ca-test
https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/logs/ci-kubernetes-e2e-gci-gce-autoscaling

I have reason to think this change is causing the CA E2E test failures.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area labels Apr 9, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Choraden
Once this PR has been reviewed and has the lgtm label, please assign towca for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from x13n April 9, 2026 12:06
@k8s-ci-robot k8s-ci-robot added area/provider/gce size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-area labels Apr 9, 2026
@Choraden
Contributor Author

Choraden commented Apr 9, 2026

/test all

@Choraden
Contributor Author

Choraden commented Apr 9, 2026

The test succeeded. Rerunning to gather more datapoints.
/retest pull-autoscaling-e2e-gci-gce-ca-test

@Choraden
Contributor Author

Choraden commented Apr 9, 2026

/test pull-autoscaling-e2e-gci-gce-ca-test

1 similar comment
@jackfrancis
Contributor

/test pull-autoscaling-e2e-gci-gce-ca-test

@Choraden Choraden changed the title [Test] Revert "Implemented HasInstance for GCE provider" Revert "Implemented HasInstance for GCE provider" as it is causing CA E2E test failures Apr 10, 2026
@Choraden Choraden marked this pull request as ready for review April 10, 2026 08:20
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2026
@k8s-ci-robot k8s-ci-robot requested a review from towca April 10, 2026 08:20
@Choraden
Contributor Author

Choraden commented Apr 10, 2026

@domenicbozzuto @jbtk @x13n Since April 3rd, both the CA presubmits and the periodic E2E tests have been failing (or occasionally flaking, but mostly failing):
https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/pr-logs/directory/pull-autoscaling-e2e-gci-gce-ca-test
https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/logs/ci-kubernetes-e2e-gci-gce-autoscaling
By bisecting the git log, I was able to narrow the likely culprit down to #9319.
To verify, I reverted that change in this PR. Notice that 3 consecutive E2E test runs then succeeded.
It looks like the presubmit on the original PR passed by luck, which is how the change was able to merge.
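For reference, a bisection like this can be automated with `git bisect run`. The following is a self-contained toy session in a throwaway repo with a synthetic "regression" in the fourth commit; it is purely illustrative and has nothing to do with the actual autoscaler history:

```shell
#!/bin/sh
# Build a throwaway repo with 5 commits; pretend commit 4 introduced a failure.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3 4 5; do
  echo "$i" > state
  git add state
  git commit -qm "commit $i"
done
# bad = HEAD, good = the root commit; the run script exits non-zero
# (i.e. marks the commit bad) once the tracked value reaches 4.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
git bisect run sh -c 'test "$(cat state)" -lt 4'
# After the run, refs/bisect/bad points at the first bad commit.
git show -s --format=%s refs/bisect/bad
```

Running this prints `commit 4` as the first bad commit's subject; in the real case each `git bisect run` step would instead build the autoscaler and run the failing E2E job.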

TBH I don't fully understand how it breaks CA, but in the failing test cases there are usually 3 nodes and the test expects a scale-up to 5. For some reason CA scales 3->5 and then, in the very next loop, 5->6.
See the logs:

I0409 14:09:44.709864       1 static_autoscaler.go:296] Starting main loop
I0409 14:09:44.710269       1 static_autoscaler.go:1233] Found 45 pods in the cluster: 43 scheduled, 2 unschedulable, 0 unprocessed by scheduler, 0 ignored by allowed schedulers (most likely using custom scheduler), 0 ignored due to dissallowed schedulers
W0409 14:09:44.816994       1 templates.go:510] no os defined in AUTOSCALER_ENV_VARS; using default linux
W0409 14:09:44.817038       1 templates.go:641] no os-distribution defined in AUTOSCALER_ENV_VARS; using default cos
W0409 14:09:44.817073       1 templates.go:722] no evictionHard defined in AUTOSCALER_ENV_VARS;
W0409 14:09:44.817087       1 templates.go:232] unable to get evictionHardFromKubeEnv values, continuing without it.
I0409 14:09:44.817125       1 gce_reserved.go:143] evictionHard memory tag not found, using default
I0409 14:09:44.817137       1 gce_reserved.go:163] evictionHard ephemeral storage tag not found, using default
W0409 14:09:44.817174       1 templates.go:242] could not extract kube-reserved from kubeEnv for mig "kt2-6a7d0aef-4f19-minion-group", setting allocatable to capacity.
I0409 14:09:44.817465       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-t8mj: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817488       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-t8mj: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817500       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-master: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817513       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-dqn9: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817523       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-dqn9: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817534       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-gwd4: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817542       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-gwd4: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817555       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-96l9: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817564       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-96l9: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817577       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-ktwv: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817587       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-ktwv: using maxNodeStartupTime = 15m0s
I0409 14:09:44.817892       1 filter_out_schedulable.go:65] Filtering out schedulables
I0409 14:09:44.818367       1 taint_toleration.go:130] "node had untolerated taints" logger="Filter.TaintToleration" node="kt2-6a7d0aef-4f19-master" pod="autoscaling-5580/increase-size-pod-m9x7f" untoleratedTaint={"key":"node-role.kubernetes.io/control-plane","effect":"NoSchedule"}
I0409 14:09:44.818447       1 klogx.go:87] failed to find place for autoscaling-5580/increase-size-pod-m9x7f: can't schedule pod autoscaling-5580/increase-size-pod-m9x7f: couldn't find a matching Node with passing predicates
I0409 14:09:44.819029       1 klogx.go:87] failed to find place for autoscaling-5580/increase-size-pod-wmxd9 based on similar pods scheduling
I0409 14:09:44.819053       1 filter_out_schedulable.go:122] 0 pods marked as unschedulable can be scheduled.
I0409 14:09:44.819078       1 filter_out_schedulable.go:85] No schedulable pods
I0409 14:09:44.819139       1 filter_out_daemon_sets.go:47] Filtered out 0 daemon set pods, 2 unschedulable pods left
I0409 14:09:44.819162       1 klogx.go:87] Pod autoscaling-5580/increase-size-pod-m9x7f is unschedulable
I0409 14:09:44.819169       1 klogx.go:87] Pod autoscaling-5580/increase-size-pod-wmxd9 is unschedulable
I0409 14:09:44.819384       1 orchestrator.go:112] Upcoming 0 nodes
I0409 14:09:44.821455       1 waste.go:56] Expanding Node Group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I0409 14:09:44.821501       1 orchestrator.go:189] Best option to resize: https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group
I0409 14:09:44.821576       1 orchestrator.go:193] Estimated 2 nodes needed in https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group
I0409 14:09:44.821702       1 orchestrator.go:265] Final scale-up plan: [{https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group 3->5 (max: 6)}]
I0409 14:09:44.821999       1 executor.go:164] Scale-up: setting group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group size to 5
I0409 14:09:44.822447       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"abfa659d-6408-4151-acf5-428376bb6cbe", APIVersion:"v1", ResourceVersion:"570473", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group size to 5 instead of 3 (max: 6)
I0409 14:09:44.822881       1 mig_info_provider.go:335] Regenerating MIG instances cache for gke-hgrochowski-hosted-master/us-central1-b/kt2-6a7d0aef-4f19-minion-group
I0409 14:09:45.290579       1 autoscaling_gce_client.go:339] Waiting for operation compute.instanceGroupManagers.createInstances/operation-1775743785055-64f0791853399-70244274-ffa28e68 (gke-hgrochowski-hosted-master/us-central1-b)
I0409 14:09:45.726417       1 autoscaling_gce_client.go:346] Operation compute.instanceGroupManagers.createInstances/operation-1775743785055-64f0791853399-70244274-ffa28e68 (gke-hgrochowski-hosted-master/us-central1-b) status: DONE
I0409 14:09:45.727208       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"abfa659d-6408-4151-acf5-428376bb6cbe", APIVersion:"v1", ResourceVersion:"570473", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group size set to 5 instead of 3 (max: 6)
I0409 14:09:45.862217       1 eventing_scale_up_processor.go:47] Skipping event processing for unschedulable pods since there is a ScaleUp attempt this loop
I0409 14:09:45.862756       1 static_autoscaler.go:631] Calculating unneeded nodes
I0409 14:09:45.863793       1 eligibility.go:113] Skipping kt2-6a7d0aef-4f19-minion-group-gwd4 from delete consideration - the node is currently being deleted
I0409 14:09:45.863862       1 eligibility.go:113] Skipping kt2-6a7d0aef-4f19-minion-group-ktwv from delete consideration - the node is currently being deleted
I0409 14:09:45.864178       1 klogx.go:87] Node kt2-6a7d0aef-4f19-minion-group-dqn9 - cpu requested is 9.6% of allocatable
I0409 14:09:45.864704       1 klogx.go:87] Node kt2-6a7d0aef-4f19-minion-group-96l9 - cpu requested is 9.6% of allocatable
I0409 14:09:45.864734       1 eligibility.go:104] Scale-down calculation: ignoring 1 nodes unremovable in the last 1m0s
I0409 14:09:45.864850       1 cluster.go:146] Simulating node kt2-6a7d0aef-4f19-minion-group-dqn9 removal
I0409 14:09:45.865353       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Pod", Namespace:"autoscaling-5580", Name:"increase-size-pod-m9x7f", UID:"08f20056-3184-4c85-a77a-56c8f222ea79", APIVersion:"v1", ResourceVersion:"570467", FieldPath:""}): type: 'Normal' reason: 'TriggeredScaleUp' pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group 3->5 (max: 6)}]
I0409 14:09:45.866182       1 taint_toleration.go:130] "node had untolerated taints" logger="Filter.TaintToleration" node="kt2-6a7d0aef-4f19-master" pod="autoscaling-5580/increase-size-pod-8jfmq" untoleratedTaint={"key":"node-role.kubernetes.io/control-plane","effect":"NoSchedule"}
I0409 14:09:45.869858       1 klogx.go:87] failed to find place for autoscaling-5580/increase-size-pod-8jfmq: can't schedule pod autoscaling-5580/increase-size-pod-8jfmq: couldn't find a matching Node with passing predicates
I0409 14:09:45.870816       1 cluster.go:161] Node kt2-6a7d0aef-4f19-minion-group-dqn9 is not suitable for removal: can reschedule only 0 out of 1 pods
I0409 14:09:45.871231       1 cluster.go:146] Simulating node kt2-6a7d0aef-4f19-minion-group-96l9 removal
I0409 14:09:45.873922       1 taint_toleration.go:130] "node had untolerated taints" logger="Filter.TaintToleration" node="kt2-6a7d0aef-4f19-master" pod="autoscaling-5580/increase-size-pod-qxntv" untoleratedTaint={"key":"node-role.kubernetes.io/control-plane","effect":"NoSchedule"}
I0409 14:09:45.874563       1 klogx.go:87] failed to find place for autoscaling-5580/increase-size-pod-qxntv: can't schedule pod autoscaling-5580/increase-size-pod-qxntv: couldn't find a matching Node with passing predicates
I0409 14:09:45.875002       1 cluster.go:161] Node kt2-6a7d0aef-4f19-minion-group-96l9 is not suitable for removal: can reschedule only 0 out of 1 pods
I0409 14:09:45.875234       1 planner.go:332] 2 nodes found to be unremovable in simulation, will re-check them at 2026-04-09 14:10:44.709841319 +0000 UTC m=+2712.305817519
I0409 14:09:45.876006       1 static_autoscaler.go:674] Scale down status: lastScaleUpTime=2026-04-09 14:09:44.709841319 +0000 UTC m=+2652.305817519 lastScaleDownDeleteTime=2026-04-09 14:09:36.415463385 +0000 UTC m=+2644.011439586 lastScaleDownFailTime=2026-04-09 12:25:33.2226787 +0000 UTC m=-3599.181345073 scaleDownForbidden=false scaleDownInCooldown=true
I0409 14:09:45.882519       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Pod", Namespace:"autoscaling-5580", Name:"increase-size-pod-wmxd9", UID:"55c59228-22b9-4431-bbf3-c20ffd0f7a08", APIVersion:"v1", ResourceVersion:"570470", FieldPath:""}): type: 'Normal' reason: 'TriggeredScaleUp' pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group 3->5 (max: 6)}]
I0409 14:09:45.913638       1 trigger.go:142] Autoscaler loop triggered immediately after a scale up
I0409 14:09:45.913809       1 static_autoscaler.go:296] Starting main loop
I0409 14:09:45.914497       1 static_autoscaler.go:1233] Found 45 pods in the cluster: 43 scheduled, 2 unschedulable, 0 unprocessed by scheduler, 0 ignored by allowed schedulers (most likely using custom scheduler), 0 ignored due to dissallowed schedulers
W0409 14:09:46.038928       1 templates.go:510] no os defined in AUTOSCALER_ENV_VARS; using default linux
W0409 14:09:46.038998       1 templates.go:641] no os-distribution defined in AUTOSCALER_ENV_VARS; using default cos
W0409 14:09:46.039045       1 templates.go:722] no evictionHard defined in AUTOSCALER_ENV_VARS;
W0409 14:09:46.039058       1 templates.go:232] unable to get evictionHardFromKubeEnv values, continuing without it.
I0409 14:09:46.039067       1 gce_reserved.go:143] evictionHard memory tag not found, using default
I0409 14:09:46.039074       1 gce_reserved.go:163] evictionHard ephemeral storage tag not found, using default
W0409 14:09:46.039717       1 templates.go:242] could not extract kube-reserved from kubeEnv for mig "kt2-6a7d0aef-4f19-minion-group", setting allocatable to capacity.
I0409 14:09:46.040078       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-dqn9: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040176       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-dqn9: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040192       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-gwd4: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040202       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-gwd4: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040219       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-96l9: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040228       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-96l9: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040241       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-ktwv: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040250       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-ktwv: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040263       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-t8mj: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040273       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-minion-group-t8mj: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040291       1 clusterstate.go:661] Node kt2-6a7d0aef-4f19-master: using maxNodeStartupTime = 15m0s
I0409 14:09:46.040321       1 clusterstate.go:299] Scale up in group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group finished successfully in 187.290163ms
I0409 14:09:46.040454       1 filter_out_schedulable.go:65] Filtering out schedulables
I0409 14:09:46.042209       1 taint_toleration.go:130] "node had untolerated taints" logger="Filter.TaintToleration" node="kt2-6a7d0aef-4f19-master" pod="autoscaling-5580/increase-size-pod-m9x7f" untoleratedTaint={"key":"node-role.kubernetes.io/control-plane","effect":"NoSchedule"}
I0409 14:09:46.043440       1 klogx.go:87] failed to find place for autoscaling-5580/increase-size-pod-m9x7f: can't schedule pod autoscaling-5580/increase-size-pod-m9x7f: couldn't find a matching Node with passing predicates
I0409 14:09:46.045511       1 klogx.go:87] failed to find place for autoscaling-5580/increase-size-pod-wmxd9 based on similar pods scheduling
I0409 14:09:46.045549       1 filter_out_schedulable.go:122] 0 pods marked as unschedulable can be scheduled.
I0409 14:09:46.045581       1 filter_out_schedulable.go:85] No schedulable pods
I0409 14:09:46.045741       1 filter_out_daemon_sets.go:47] Filtered out 0 daemon set pods, 2 unschedulable pods left
I0409 14:09:46.045772       1 klogx.go:87] Pod autoscaling-5580/increase-size-pod-m9x7f is unschedulable
I0409 14:09:46.048396       1 klogx.go:87] Pod autoscaling-5580/increase-size-pod-wmxd9 is unschedulable
I0409 14:09:46.048580       1 orchestrator.go:112] Upcoming 0 nodes
I0409 14:09:46.051438       1 threshold_based_limiter.go:59] Capping binpacking after exceeding threshold of 1 nodes
I0409 14:09:46.051646       1 waste.go:56] Expanding Node Group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I0409 14:09:46.051787       1 orchestrator.go:189] Best option to resize: https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group
I0409 14:09:46.051809       1 orchestrator.go:193] Estimated 1 nodes needed in https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group
I0409 14:09:46.051847       1 orchestrator.go:265] Final scale-up plan: [{https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group 5->6 (max: 6)}]
I0409 14:09:46.051878       1 executor.go:164] Scale-up: setting group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group size to 6
I0409 14:09:46.052194       1 mig_info_provider.go:335] Regenerating MIG instances cache for gke-hgrochowski-hosted-master/us-central1-b/kt2-6a7d0aef-4f19-minion-group
I0409 14:09:46.052387       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"abfa659d-6408-4151-acf5-428376bb6cbe", APIVersion:"v1", ResourceVersion:"570480", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group size to 6 instead of 5 (max: 6)
I0409 14:09:46.359155       1 autoscaling_gce_client.go:339] Waiting for operation compute.instanceGroupManagers.createInstances/operation-1775743786143-64f079195cdde-9992c48e-beb50d51 (gke-hgrochowski-hosted-master/us-central1-b)
I0409 14:09:47.107997       1 autoscaling_gce_client.go:346] Operation compute.instanceGroupManagers.createInstances/operation-1775743786143-64f079195cdde-9992c48e-beb50d51 (gke-hgrochowski-hosted-master/us-central1-b) status: DONE
I0409 14:09:47.108309       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"abfa659d-6408-4151-acf5-428376bb6cbe", APIVersion:"v1", ResourceVersion:"570480", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: group https://www.googleapis.com/compute/v1/projects/gke-hgrochowski-hosted-master/zones/us-central1-b/instanceGroups/kt2-6a7d0aef-4f19-minion-group size set to 6 instead of 5 (max: 6)
I0409 14:09:47.199437       1 eventing_scale_up_processor.go:47] Skipping event processing for unschedulable pods since there is a ScaleUp attempt this loop
I0409 14:09:47.199533       1 static_autoscaler.go:631] Calculating unneeded nodes
I0409 14:09:47.199942       1 eligibility.go:113] Skipping kt2-6a7d0aef-4f19-minion-group-gwd4 from delete consideration - the node is currently being deleted
I0409 14:09:47.199971       1 eligibility.go:113] Skipping kt2-6a7d0aef-4f19-minion-group-ktwv from delete consideration - the node is currently being deleted
I0409 14:09:47.199985       1 eligibility.go:104] Scale-down calculation: ignoring 3 nodes unremovable in the last 1m0s
I0409 14:09:47.200026       1 static_autoscaler.go:674] Scale down status: lastScaleUpTime=2026-04-09 14:09:45.913791161 +0000 UTC m=+2653.509767361 lastScaleDownDeleteTime=2026-04-09 14:09:36.415463385 +0000 UTC m=+2644.011439586 lastScaleDownFailTime=2026-04-09 12:25:33.2226787 +0000 UTC m=-3599.181345073 scaleDownForbidden=false scaleDownInCooldown=true
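One quick way to confirm the double resize in any run's log is to filter for the orchestrator's plan lines. The sample lines below are copied from the log above (with the instance-group URL shortened to "..."):

```shell
#!/bin/sh
# Pull just the resize decisions out of a CA log.
set -e
cat > ca.log <<'EOF'
I0409 14:09:44.821702 1 orchestrator.go:265] Final scale-up plan: [{.../kt2-6a7d0aef-4f19-minion-group 3->5 (max: 6)}]
I0409 14:09:46.051847 1 orchestrator.go:265] Final scale-up plan: [{.../kt2-6a7d0aef-4f19-minion-group 5->6 (max: 6)}]
EOF
grep -c 'Final scale-up plan' ca.log   # 2 resize decisions, ~1.3s apart
grep -o '[0-9]->[0-9]' ca.log          # 3->5, then 5->6
```

On a healthy run of this test there should be a single plan line (3->5); the second one within the same second is the anomaly.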

@jbtk
Member

jbtk commented Apr 10, 2026

Since, from what I see in the comments, the tests passed in the end on this PR, can we run these tests 20 times with and without the revert and compare flakiness?

@Choraden
Contributor Author

Since, from what I see in the comments, the tests passed in the end on this PR, can we run these tests 20 times with and without the revert and compare flakiness?

I believe the presubmit dashboard should give us enough insight: https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/pr-logs/directory/pull-autoscaling-e2e-gci-gce-ca-test

@domenicbozzuto
Contributor

👋 Sorry for the noise with this; I saw a similar pattern when I originally added my PR (comment) -- the multiple upscales looked exactly like the reason the test "shouldn't trigger additional scale-ups during processing scale-up" was skipped (`e2eskipper.Skipf("Test is flaky and disabled for now")`), and that predated the HasInstance change.

I'm fine with the revert if it's breaking a lot of tests. I can try to spend some time looking into why it's triggering multiple scale-ups and why that seems more common with this change.

@Choraden
Contributor Author

AFAIR, "shouldn't trigger additional scale-ups during processing scale-up" is a different issue. That one is related to the Kubernetes events about scale-ups, which CA emits in an unreliable way. The behavior introduced by #9319 looks new to me.

@jackfrancis
Contributor

/test pull-autoscaling-e2e-gci-gce-ca-test

@jackfrancis
Contributor

/test pull-autoscaling-e2e-gci-gce-ca-test


Labels

area/cluster-autoscaler area/provider/gce cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
