Add performance benchmark for the CA RunOnce control loop #9237

Draft
Choraden wants to merge 13 commits into kubernetes:master from Choraden:run_once_bench_v2
@Choraden
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

While working on #9022, it became clear that a standardized benchmark is necessary to quantify performance gains and prevent potential regressions in the core logic.

Leveraging the ongoing refactor of the autoscaler building logic in #9099, this PR introduces an initial draft of a benchmark specifically for the RunOnce function. This provides a controlled environment to measure the impact of architectural changes on the main execution loop.

The initial benchmark version in #9199 was difficult to stabilize and reason about, so we simplified it to a single RunOnce call that simulates a "cold start" of the CA.

Which issue(s) this PR fixes:

Relates to #9022

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


GaetanoMar96 and others added 13 commits February 13, 2026 08:32
Update NewTestProcessors to use DynamicResourceAllocationEnabled and
CSINodeAwareSchedulingEnabled from AutoscalingOptions instead of hardcoded values.
This allows tests to properly configure custom resource processing.

Updated MustCreateManager in the integration test package to accept testing.TB
instead of *testing.T. This allows the helper to be used within both standard
tests and performance benchmarks (which use *testing.B).

This change is a prerequisite for introducing performance benchmarking for
the RunOnce control loop.
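
To illustrate why accepting testing.TB is enough, here is a minimal sketch. The `fakeManager` type and the `mustCreateFakeManager` helper are hypothetical stand-ins and are not the real MustCreateManager; the only point is that one helper signature works for both *testing.T and *testing.B.

```go
package main

import (
	"fmt"
	"testing"
)

// fakeManager is a hypothetical stand-in for whatever the real helper returns.
type fakeManager struct{ name string }

// mustCreateFakeManager takes testing.TB, the interface implemented by both
// *testing.T and *testing.B, so tests and benchmarks can share it.
func mustCreateFakeManager(tb testing.TB, name string) *fakeManager {
	tb.Helper()
	if name == "" {
		tb.Fatalf("manager name must not be empty")
	}
	return &fakeManager{name: name}
}

func main() {
	// testing.Benchmark hands us a real *testing.B outside of `go test`.
	var created *fakeManager
	testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			created = mustCreateFakeManager(b, "bench")
		}
	})
	fmt.Println(created.name)
}
```

Inside a regular test, the same helper would be called as `mustCreateFakeManager(t, "unit")` with no changes.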

This commit adds a new benchmarking suite in core/bench to evaluate the
performance of the primary Cluster Autoscaler control loop (RunOnce). These
benchmarks simulate large-scale cluster operations using a mock Kubernetes API
and cloud provider, allowing for comparative analysis and detection of
performance regressions.
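
The general shape of a "one call per iteration, fresh state each time" benchmark can be sketched as follows. Everything here is a placeholder: `fakeCluster`, `newFakeCluster`, and `runOnce` are hypothetical names standing in for the mocked API, cloud provider, and control loop, and the real suite in core/bench is certainly more involved.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
)

// fakeCluster is a hypothetical stand-in for the mocked cluster state.
type fakeCluster struct {
	podCPURequests []int
}

// newFakeCluster builds fresh state with n pending pods.
func newFakeCluster(n int) *fakeCluster {
	c := &fakeCluster{podCPURequests: make([]int, n)}
	for i := range c.podCPURequests {
		c.podCPURequests[i] = i%4 + 1
	}
	return c
}

// runOnce is a placeholder for the loop under test; it only sums the CPU
// requests of pending pods so that there is something to measure.
func runOnce(c *fakeCluster) int {
	total := 0
	for _, r := range c.podCPURequests {
		total += r
	}
	return total
}

// benchmarkColdStart times exactly one runOnce call per iteration against
// freshly built state and keeps the setup cost out of the measurement.
func benchmarkColdStart(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		c := newFakeCluster(1_000_000)
		b.StartTimer()
		if runOnce(c) == 0 {
			b.Fatal("unexpected empty cluster")
		}
	}
}

func main() {
	// Register the -test.* flags so the iteration count can be fixed in code.
	testing.Init()
	if err := flag.Set("test.benchtime", "5x"); err != nil {
		panic(err)
	}
	res := testing.Benchmark(benchmarkColdStart)
	fmt.Println(res.String(), res.MemString())
}
```

Rebuilding the state inside the loop is what makes every measured call a cold start; the trade-off is that the untimed setup dominates wall-clock time for large clusters.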

Introduced a -profile-cpu flag to the RunOnce benchmarking suite. When
specified, the benchmark will capture a CPU profile during the first execution
of the RunOnce loop and write it to the provided file path.

Disable Garbage Collection during RunOnce benchmarks to ensure stable and
reproducible results. This prioritizes consistency over absolute performance
metrics, allowing for a generic way to calculate performance differences
between patches and providing a clean CPU profile for the RunOnce loop.

Introduce a no-op event recorder in RunOnce benchmarks to prevent event
dropping and potential performance side-effects. This change also
extends AutoscalerBuilder to support injecting custom AutoscalingKubeClients,
allowing for better control over the environment in performance-sensitive tests.
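
The idea can be sketched with a simplified interface. The `eventRecorder` interface below is a hypothetical, cut-down stand-in for the real recorder interface in client-go, and `boundedRecorder` is only an illustration of how a bounded queue ends up dropping events under load; neither is taken from the PR.

```go
package main

import "fmt"

// eventRecorder is a simplified, hypothetical stand-in for the recorder
// interface used by the autoscaler.
type eventRecorder interface {
	Event(object any, eventType, reason, message string)
}

// noOpRecorder discards every event without allocating or queueing.
type noOpRecorder struct{}

func (noOpRecorder) Event(any, string, string, string) {}

// boundedRecorder queues events in a fixed-size buffer and drops the rest,
// which is the kind of side effect a benchmark wants to avoid.
type boundedRecorder struct {
	queue   chan string
	dropped int
}

func newBoundedRecorder(capacity int) *boundedRecorder {
	return &boundedRecorder{queue: make(chan string, capacity)}
}

func (r *boundedRecorder) Event(_ any, _, reason, _ string) {
	select {
	case r.queue <- reason:
	default:
		r.dropped++ // buffer full: the event is lost
	}
}

// emitScaleUpEvents records one event per pod through the given recorder.
func emitScaleUpEvents(rec eventRecorder, pods int) {
	for i := 0; i < pods; i++ {
		rec.Event(i, "Normal", "TriggeredScaleUp", "pod triggered scale-up")
	}
}

func main() {
	bounded := newBoundedRecorder(100)
	emitScaleUpEvents(bounded, 1000)
	fmt.Println("bounded recorder dropped:", bounded.dropped)

	// The no-op recorder accepts the same calls and does nothing.
	emitScaleUpEvents(noOpRecorder{}, 1000)
}
```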

Introduce fastScaleUpCloudProvider and fastScaleUpNodeGroup in benchmarks
to avoid the overhead of simulating real node creation in the fake cloud
provider. This significantly reduces noise in CPU profiles when benchmarking
the core autoscaling logic, as it avoids unnecessary node object management
in the fake provider.

Added NoOpIncreaseSize to the fake NodeGroup to support this faster
scale-up simulation.
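
As a rough illustration of the difference, the sketch below contrasts a size increase that materializes node objects with one that only records the new target size. `sketchNodeGroup` and its methods are hypothetical and heavily simplified; the actual semantics of NoOpIncreaseSize in the fake NodeGroup may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// sketchNodeGroup is a hypothetical, simplified node group.
type sketchNodeGroup struct {
	maxSize    int
	targetSize int
	nodes      []string // names of simulated node objects
}

// checkDelta validates a requested size increase.
func (g *sketchNodeGroup) checkDelta(delta int) error {
	if delta <= 0 {
		return errors.New("size increase must be positive")
	}
	if g.targetSize+delta > g.maxSize {
		return fmt.Errorf("size increase too large: %d > max %d", g.targetSize+delta, g.maxSize)
	}
	return nil
}

// increaseSizeWithNodes models a fake provider that creates one node object
// per added unit of capacity.
func (g *sketchNodeGroup) increaseSizeWithNodes(delta int) error {
	if err := g.checkDelta(delta); err != nil {
		return err
	}
	for i := 0; i < delta; i++ {
		g.nodes = append(g.nodes, fmt.Sprintf("node-%d", g.targetSize+i))
	}
	g.targetSize += delta
	return nil
}

// increaseSizeNoOp only records the new target size and creates no nodes.
func (g *sketchNodeGroup) increaseSizeNoOp(delta int) error {
	if err := g.checkDelta(delta); err != nil {
		return err
	}
	g.targetSize += delta
	return nil
}

func main() {
	g := &sketchNodeGroup{maxSize: 1000}
	if err := g.increaseSizeNoOp(500); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("target size:", g.targetSize, "node objects:", len(g.nodes))
}
```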

This change introduces fastTaintingKubeClient which uses reactors to
track and inject ToBeDeleted taints on nodes during the benchmark. This
allows the scale-down logic to correctly identify nodes that have been
marked for deletion by the autoscaler without relying on standard fake
client persistence for these taints.

This removes the fake client from the CPU profile.

@k8s-ci-robot added the release-note-none, do-not-merge/work-in-progress, kind/cleanup, and cncf-cla: yes labels on Feb 16, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Choraden
Once this PR has been reviewed and has the lgtm label, please assign bigdarkclown for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the do-not-merge/needs-area and needs-ok-to-test labels on Feb 16, 2026
@k8s-ci-robot
Contributor

Hi @Choraden. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the area/cluster-autoscaler and size/XXL labels and removed the do-not-merge/needs-area label on Feb 16, 2026
@Choraden
Contributor Author

/uncc aleksandra-malinowska vadasambar
/cc @x13n @pmendelski
/assign @towca @mtrqq

Keeping as draft until #9099 is merged.

@k8s-ci-robot requested review from x13n and removed the request for aleksandra-malinowska and vadasambar on February 16, 2026 09:48
@k8s-ci-robot
Contributor

@Choraden: GitHub didn't allow me to request PR reviews from the following users: pmendelski.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.


@Choraden
Contributor Author

Sharing results:

goos: linux
goarch: amd64
pkg: k8s.io/autoscaler/cluster-autoscaler/core/bench
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkRunOnceScaleUp
BenchmarkRunOnceScaleUp-8              1        1553364377 ns/op        762335776 B/op   8386725 allocs/op
BenchmarkRunOnceScaleUp-8              1        1585365778 ns/op        761671552 B/op   8379717 allocs/op
BenchmarkRunOnceScaleUp-8              1        1679504327 ns/op        765269968 B/op   8417926 allocs/op
BenchmarkRunOnceScaleUp-8              1        1747838191 ns/op        768367568 B/op   8450747 allocs/op
BenchmarkRunOnceScaleUp-8              1        1807256894 ns/op        762046144 B/op   8383723 allocs/op
BenchmarkRunOnceScaleUp-8              1        1894646431 ns/op        763635504 B/op   8400150 allocs/op
BenchmarkRunOnceScaleUp-8              1        2029809964 ns/op        763436992 B/op   8397823 allocs/op
BenchmarkRunOnceScaleUp-8              1        1647624633 ns/op        763700192 B/op   8401286 allocs/op
BenchmarkRunOnceScaleUp-8              1        1779511222 ns/op        763037232 B/op   8393860 allocs/op
BenchmarkRunOnceScaleUp-8              1        1600256004 ns/op        761429888 B/op   8376822 allocs/op
BenchmarkRunOnceScaleUp-8              1        1640637519 ns/op        765901664 B/op   8424453 allocs/op
BenchmarkRunOnceScaleUp-8              1        1718404806 ns/op        760736208 B/op   8369882 allocs/op
BenchmarkRunOnceScaleUp-8              1        1602827127 ns/op        761467056 B/op   8377073 allocs/op
BenchmarkRunOnceScaleUp-8              1        1621891375 ns/op        766729728 B/op   8433466 allocs/op
BenchmarkRunOnceScaleUp-8              1        1669716609 ns/op        764976512 B/op   8415161 allocs/op
BenchmarkRunOnceScaleUp-8              1        1692255431 ns/op        762403168 B/op   8387177 allocs/op
BenchmarkRunOnceScaleUp-8              1        1556904841 ns/op        764092048 B/op   8405116 allocs/op
BenchmarkRunOnceScaleUp-8              1        1754510304 ns/op        765413872 B/op   8419820 allocs/op
BenchmarkRunOnceScaleUp-8              1        1618392504 ns/op        764394560 B/op   8408599 allocs/op
BenchmarkRunOnceScaleUp-8              1        1727217439 ns/op        763309312 B/op   8397060 allocs/op
BenchmarkRunOnceScaleDown
BenchmarkRunOnceScaleDown-8            1        1929963694 ns/op        1007442848 B/op 10141579 allocs/op
BenchmarkRunOnceScaleDown-8            1        2037815272 ns/op        1007731760 B/op 10144702 allocs/op
BenchmarkRunOnceScaleDown-8            1        1955088108 ns/op        1008686160 B/op 10154822 allocs/op
BenchmarkRunOnceScaleDown-8            1        2007510803 ns/op        1007923488 B/op 10147363 allocs/op
BenchmarkRunOnceScaleDown-8            1        2071893121 ns/op        1007467424 B/op 10142303 allocs/op
BenchmarkRunOnceScaleDown-8            1        1989004058 ns/op        1007225888 B/op 10139238 allocs/op
BenchmarkRunOnceScaleDown-8            1        2158107429 ns/op        1007981264 B/op 10147202 allocs/op
BenchmarkRunOnceScaleDown-8            1        2175420048 ns/op        1007327520 B/op 10140612 allocs/op
BenchmarkRunOnceScaleDown-8            1        2256805385 ns/op        1006870784 B/op 10135439 allocs/op
BenchmarkRunOnceScaleDown-8            1        2101632209 ns/op        1008600128 B/op 10154222 allocs/op
BenchmarkRunOnceScaleDown-8            1        2121799323 ns/op        1005974256 B/op 10125445 allocs/op
BenchmarkRunOnceScaleDown-8            1        2370849015 ns/op        1006922272 B/op 10138672 allocs/op
BenchmarkRunOnceScaleDown-8            1        2243496561 ns/op        1007922880 B/op 10147037 allocs/op
BenchmarkRunOnceScaleDown-8            1        2117320600 ns/op        1007739056 B/op 10144560 allocs/op
BenchmarkRunOnceScaleDown-8            1        2239399297 ns/op        1007212624 B/op 10139170 allocs/op
BenchmarkRunOnceScaleDown-8            1        2125659572 ns/op        1007527936 B/op 10142351 allocs/op
BenchmarkRunOnceScaleDown-8            1        2091775344 ns/op        1007627408 B/op 10143524 allocs/op
BenchmarkRunOnceScaleDown-8            1        2129201106 ns/op        1007008608 B/op 10136800 allocs/op
BenchmarkRunOnceScaleDown-8            1        2048429965 ns/op        1006513392 B/op 10131373 allocs/op
BenchmarkRunOnceScaleDown-8            1        2462875021 ns/op        1008418272 B/op 10152107 allocs/op
---
                   │ master.txt │
                   │   sec/op   │
RunOnceScaleUp-8     1.675 ± 4%
RunOnceScaleDown-8   2.120 ± 3%
geomean              1.884

                   │  master.txt  │
                   │     B/op     │
RunOnceScaleUp-8     728.2Mi ± 0%
RunOnceScaleDown-8   960.8Mi ± 0%
geomean              836.4Mi

                   │ master.txt  │
                   │  allocs/op  │
RunOnceScaleUp-8     8.399M ± 0%
RunOnceScaleDown-8   10.14M ± 0%
geomean              9.230M

Scale Up: [image]

Scale Down: [image]

Labels

area/cluster-autoscaler
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
kind/cleanup (Categorizes issue or PR as related to cleaning up code, process, or technical debt.)
needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
release-note-none (Denotes a PR that doesn't merit a release note.)
size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
