Add performance benchmark for the CA RunOnce control loop #9237
Choraden wants to merge 13 commits into kubernetes:master
Conversation
Update NewTestProcessors to use DynamicResourceAllocationEnabled and CSINodeAwareSchedulingEnabled from AutoscalingOptions instead of hardcoded values. This allows tests to properly configure custom resource processing.
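A minimal sketch of the shape of this change; the local struct definitions below are stand-ins for the real AutoscalingOptions and processor configuration, not the actual code:

```go
package processorstest

// Stand-in types; the real AutoscalingOptions and processors live in the autoscaler packages.
type AutoscalingOptions struct {
	DynamicResourceAllocationEnabled bool
	CSINodeAwareSchedulingEnabled    bool
}

type TestProcessors struct {
	DynamicResourceAllocationEnabled bool
	CSINodeAwareSchedulingEnabled    bool
}

// NewTestProcessors derives the feature toggles from the supplied options
// instead of hardcoding them, so tests can configure custom resource processing.
func NewTestProcessors(opts *AutoscalingOptions) *TestProcessors {
	return &TestProcessors{
		DynamicResourceAllocationEnabled: opts.DynamicResourceAllocationEnabled,
		CSINodeAwareSchedulingEnabled:    opts.CSINodeAwareSchedulingEnabled,
	}
}
```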
Updated MustCreateManager in the integration test package to accept testing.TB instead of *testing.T. This allows the helper to be used within both standard tests and performance benchmarks (which use *testing.B). This change is a prerequisite for introducing performance benchmarking for the RunOnce control loop.
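A minimal sketch of why testing.TB works here; the helper body is illustrative, not the real MustCreateManager:

```go
package integration

import "testing"

// Because testing.TB is the interface implemented by both *testing.T and *testing.B,
// a helper that accepts it can be shared by regular tests and benchmarks.
func mustCreateManager(tb testing.TB) string {
	tb.Helper()
	manager := "fake-manager" // placeholder for the real manager construction
	if manager == "" {
		tb.Fatalf("failed to create manager")
	}
	return manager
}

func TestUsesHelper(t *testing.T)      { _ = mustCreateManager(t) }
func BenchmarkUsesHelper(b *testing.B) { _ = mustCreateManager(b) }
```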
This commit adds a new benchmarking suite in core/bench to evaluate the performance of the primary Cluster Autoscaler control loop (RunOnce). These benchmarks simulate large-scale cluster operations using a mock Kubernetes API and cloud provider, allowing for comparative analysis and detection of performance regressions.
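Roughly, such a benchmark has the shape sketched below; buildFakeAutoscaler and the stub types are hypothetical placeholders for the mock Kubernetes API and cloud provider setup described above:

```go
package bench

import (
	"testing"
	"time"
)

// runOncer is a stand-in for the autoscaler's RunOnce entry point.
type runOncer interface {
	RunOnce(now time.Time) error
}

// buildFakeAutoscaler is a placeholder; the real suite wires a fake Kubernetes
// client and a fake cloud provider into the autoscaler under test.
func buildFakeAutoscaler(b *testing.B) runOncer {
	b.Helper()
	return noopAutoscaler{}
}

type noopAutoscaler struct{}

func (noopAutoscaler) RunOnce(time.Time) error { return nil }

// BenchmarkRunOnce sketches the overall shape: build a large fake cluster once,
// then time the core RunOnce control loop against it.
func BenchmarkRunOnce(b *testing.B) {
	autoscaler := buildFakeAutoscaler(b)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := autoscaler.RunOnce(time.Now()); err != nil {
			b.Fatal(err)
		}
	}
}
```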
Introduced a -profile-cpu flag to the RunOnce benchmarking suite. When specified, the benchmark will capture a CPU profile during the first execution of the RunOnce loop and write it to the provided file path.
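A hedged sketch of how such a flag can be wired with runtime/pprof; the flag name matches the description above, while the wrapper function is illustrative:

```go
package bench

import (
	"flag"
	"os"
	"runtime/pprof"
	"testing"
)

var profileCPU = flag.String("profile-cpu", "", "write a CPU profile of the RunOnce call to this file")

// withOptionalCPUProfile runs fn, capturing a CPU profile to *profileCPU when the flag is set.
func withOptionalCPUProfile(b *testing.B, fn func()) {
	if *profileCPU == "" {
		fn()
		return
	}
	f, err := os.Create(*profileCPU)
	if err != nil {
		b.Fatalf("creating CPU profile file: %v", err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		b.Fatalf("starting CPU profile: %v", err)
	}
	defer pprof.StopCPUProfile()
	fn()
}
```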
Disable Garbage Collection during RunOnce benchmarks to ensure stable and reproducible results. This prioritizes consistency over absolute performance metrics: it gives a consistent basis for comparing performance between patches and produces a cleaner CPU profile of the RunOnce loop.
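In Go this typically comes down to debug.SetGCPercent(-1); a minimal sketch, with an illustrative helper name:

```go
package bench

import (
	"runtime/debug"
	"testing"
)

// withGCDisabled turns off automatic garbage collection for the duration of fn,
// restoring the previous setting afterwards.
func withGCDisabled(b *testing.B, fn func()) {
	b.Helper()
	old := debug.SetGCPercent(-1) // -1 disables the garbage collector
	defer debug.SetGCPercent(old)
	fn()
}
```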
Introduce a no-op event recorder in RunOnce benchmarks to prevent event dropping and potential performance side-effects. This change also extends AutoscalerBuilder to support injecting custom AutoscalingKubeClients, allowing for better control over the environment in performance-sensitive tests.
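One way to build such a recorder is to implement client-go's record.EventRecorder interface with empty methods; a sketch with an illustrative type name:

```go
package bench

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// noOpEventRecorder discards every event, so event broadcasting can neither drop
// events nor add noise to benchmark timings.
type noOpEventRecorder struct{}

func (noOpEventRecorder) Event(object runtime.Object, eventtype, reason, message string) {}
func (noOpEventRecorder) Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{}) {
}
func (noOpEventRecorder) AnnotatedEventf(object runtime.Object, annotations map[string]string, eventtype, reason, messageFmt string, args ...interface{}) {
}

// Compile-time check that the type satisfies record.EventRecorder.
var _ record.EventRecorder = noOpEventRecorder{}
```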
Introduce fastScaleUpCloudProvider and fastScaleUpNodeGroup in benchmarks to avoid the overhead of simulating real node creation in the fake cloud provider. Skipping node object management in the fake provider significantly reduces noise in CPU profiles when benchmarking the core autoscaling logic. Added NoOpIncreaseSize to the fake NodeGroup to support this faster scale-up simulation.
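Conceptually, the fast node group only adjusts its target size on scale-up instead of materializing fake Node objects; a simplified sketch (not the actual provider code, which implements the full cloudprovider.NodeGroup interface):

```go
package bench

import "fmt"

// fastNodeGroup tracks only the numbers the scale-up logic needs, so no fake Node
// objects are created or managed during the benchmark.
type fastNodeGroup struct {
	targetSize int
	maxSize    int
}

// IncreaseSize mirrors the NodeGroup scale-up method but only bumps a counter.
func (g *fastNodeGroup) IncreaseSize(delta int) error {
	if g.targetSize+delta > g.maxSize {
		return fmt.Errorf("size increase too large: %d > max %d", g.targetSize+delta, g.maxSize)
	}
	g.targetSize += delta
	return nil
}
```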
This change introduces fastTaintingKubeClient, which uses reactors to track and inject ToBeDeleted taints on nodes during the benchmark. This allows the scale-down logic to correctly identify nodes that have been marked for deletion by the autoscaler without relying on the standard fake client's persistence for these taints, which keeps the fake client's bookkeeping out of the CPU profile.
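A sketch of the reactor-based approach using client-go's fake clientset (assumed shape; the real client and taint handling may differ):

```go
package bench

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes/fake"
	k8stesting "k8s.io/client-go/testing"
)

// newFastTaintingClient records ToBeDeleted taints applied by the autoscaler in a map,
// short-circuiting the default object tracker so it stays out of the CPU profile.
func newFastTaintingClient(tainted map[string]bool) *fake.Clientset {
	client := fake.NewSimpleClientset()
	client.PrependReactor("update", "nodes", func(action k8stesting.Action) (bool, runtime.Object, error) {
		node, ok := action.(k8stesting.UpdateAction).GetObject().(*corev1.Node)
		if !ok {
			return false, nil, nil // not a Node update: fall through to the default reactors
		}
		for _, taint := range node.Spec.Taints {
			if taint.Key == "ToBeDeletedByClusterAutoscaler" {
				tainted[node.Name] = true
			}
		}
		return true, node, nil // handled: skip the fake object tracker
	})
	return client
}
```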
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Choraden

The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Hi @Choraden. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/uncc aleksandra-malinowska vadasambar

Keeping as draft until #9099 is merged.
@Choraden: GitHub didn't allow me to request PR reviews from the following users: pmendelski. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Sharing results:


What type of PR is this?
/kind cleanup
What this PR does / why we need it:
While working on #9022, it became clear that a standardized benchmark is necessary to quantify performance gains and prevent potential regressions in the core logic.
Leveraging the ongoing refactor of the autoscaler building logic in #9099, this PR introduces an initial draft of a benchmark specifically for the RunOnce function. This provides a controlled environment to measure the impact of architectural changes on the main execution loop.
The initial benchmark version at #9199 was difficult to stabilize and reason about, so we decided to simplify it to a single RunOnce call, simulating a "cold start" of the CA.
Which issue(s) this PR fixes:
Relates to #9022
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: