Implement fast predicate index for cluster-autoscaler simulator #9461

Draft
x13n wants to merge 1 commit into kubernetes:master from x13n:master

@x13n
Member

@x13n x13n commented Apr 8, 2026

This change introduces a fast predicate index and specialized fast predicates in the cluster snapshot simulator. This significantly optimizes pod scheduling simulations by avoiding redundant predicate evaluations and utilizing efficient indexing for node filtering, particularly for pod affinity/anti-affinity and topology spread constraints.

Key improvements:

  • Introduced FastPredicateIndex to track pod counts by labels and topology domains.
  • Implemented FastPredicates to perform preliminary, optimized checks before falling back to the full scheduler plugin runner.
  • Integrated the index with Basic and Delta snapshot stores.
  • Added the 'fast-predicates-enabled' flag to control the feature.
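
The idea of tracking pod counts by labels and topology domains can be illustrated with a minimal sketch. The `podCountIndex` type and its methods below are hypothetical and are not taken from this PR; the actual `FastPredicateIndex` likely tracks more state and supports full label selectors rather than exact `key=value` pairs.

```go
package main

import "fmt"

// podCountIndex is a hypothetical, simplified stand-in for an index that
// counts pods per (topology key, topology domain, pod label) triple.
type podCountIndex struct {
	// counts[topologyKey][domain]["key=value"] = number of pods with that label
	counts map[string]map[string]map[string]int
}

func newPodCountIndex() *podCountIndex {
	return &podCountIndex{counts: map[string]map[string]map[string]int{}}
}

// update applies delta (+1 when a pod is added, -1 when it is removed) for
// every label of a pod placed on a node carrying the given labels.
func (i *podCountIndex) update(nodeLabels, podLabels map[string]string, delta int) {
	for topoKey, domain := range nodeLabels {
		byDomain, ok := i.counts[topoKey]
		if !ok {
			byDomain = map[string]map[string]int{}
			i.counts[topoKey] = byDomain
		}
		byLabel, ok := byDomain[domain]
		if !ok {
			byLabel = map[string]int{}
			byDomain[domain] = byLabel
		}
		for k, v := range podLabels {
			byLabel[k+"="+v] += delta
		}
	}
}

// count returns how many indexed pods in the given domain carry label k=v.
// Reads on missing (nil) inner maps safely yield zero.
func (i *podCountIndex) count(topoKey, domain, k, v string) int {
	return i.counts[topoKey][domain][k+"="+v]
}

func main() {
	idx := newPodCountIndex()
	zoneA := map[string]string{"topology.kubernetes.io/zone": "a"}
	web := map[string]string{"app": "web"}

	idx.update(zoneA, web, +1) // first pod added
	idx.update(zoneA, web, +1) // second pod added
	idx.update(zoneA, web, -1) // one pod removed

	fmt.Println(idx.count("topology.kubernetes.io/zone", "a", "app", "web"))
	fmt.Println(idx.count("topology.kubernetes.io/zone", "b", "app", "web"))
}
```

With counts maintained incrementally like this, a topology spread or anti-affinity check becomes a map lookup instead of a scan over all pods in the snapshot.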

Performance Impact (BenchmarkRunFiltersUntilPassingNode): The benchmarks show a 6.2x to 11.1x speedup across parallelism levels 1 to 16. At parallelism 1, bytes allocated per op drop by about 64% and allocations per op by about 53%.

Parallelism | Before (ns/op) | After (ns/op) | Improvement
------------|----------------|---------------|------------
1           | 3,910,850      | 630,607       | 6.2x
2           | 3,324,178      | 399,312       | 8.3x
4           | 2,834,906      | 285,971       | 9.9x
8           | 2,856,542      | 256,432       | 11.1x
16          | 3,026,452      | 278,924       | 10.8x

Memory Statistics (Parallelism 1):

  • Before: 1,508,666 B/op, 7045 allocs/op
  • After: 539,304 B/op, 3312 allocs/op

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

A major part of this PR is AI-generated and needs careful review.

Does this PR introduce a user-facing change?

[Perf] A new fast-predicates-enabled flag can be used to replace slow scheduler predicate checking of anti-affinity and topology spreading with a faster CA-specific alternative.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/hold for testing

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added the following labels on Apr 8, 2026:

  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/needs-area
  • area/cluster-autoscaler

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the following labels on Apr 8, 2026:

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.

@x13n added the kind/feature label (Categorizes issue or PR as related to a new feature.) on Apr 8, 2026

}
}

if affinity.PodAntiAffinity != nil {
Contributor

At which point do we check whether existing pods have anti-affinity against the incoming pod? The incoming pod may have no anti-affinity (AA) in its spec, but we still need to check whether it violates the constraints of existing pods.
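
The symmetric check raised in this comment could look roughly like the sketch below. All types and the function `violatesExistingAntiAffinity` are hypothetical simplifications and do not come from the PR: selectors are reduced to exact label matches, namespaces are ignored, and the loop is a plain scan rather than an index lookup.

```go
package main

import "fmt"

// antiAffinityTerm is a simplified required anti-affinity term.
type antiAffinityTerm struct {
	matchLabels map[string]string // exact-match selector (simplification)
	topologyKey string
}

type pod struct {
	labels       map[string]string
	antiAffinity []antiAffinityTerm
}

type node struct {
	labels map[string]string
	pods   []pod
}

// matches reports whether every selector entry is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// violatesExistingAntiAffinity reports whether placing incoming on target
// would break an anti-affinity term declared by a pod that already runs in
// the same topology domain. The incoming pod's own spec is not consulted.
func violatesExistingAntiAffinity(incoming pod, target node, all []node) bool {
	for _, n := range all {
		for _, existing := range n.pods {
			for _, term := range existing.antiAffinity {
				domain, ok := n.labels[term.topologyKey]
				if !ok {
					continue // existing pod's node has no such domain
				}
				targetDomain, ok := target.labels[term.topologyKey]
				if ok && targetDomain == domain && matches(term.matchLabels, incoming.labels) {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	guard := pod{
		labels: map[string]string{"app": "db"},
		antiAffinity: []antiAffinityTerm{{
			matchLabels: map[string]string{"app": "web"},
			topologyKey: "zone",
		}},
	}
	n1 := node{labels: map[string]string{"zone": "a"}, pods: []pod{guard}}
	n2 := node{labels: map[string]string{"zone": "b"}}
	incoming := pod{labels: map[string]string{"app": "web"}} // no AA of its own

	fmt.Println(violatesExistingAntiAffinity(incoming, n1, []node{n1, n2}))
	fmt.Println(violatesExistingAntiAffinity(incoming, n2, []node{n1, n2}))
}
```

Here the incoming pod declares no anti-affinity at all, yet it is still rejected from zone "a" because a pod already running there repels pods labeled `app=web`.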

if p.fastPredicatesEnabled {
	if err := p.fastCheckPredicates(pod, nodeInfo, fastState); err != nil {
		// Fast check failed, so this Node won't work.
		return
Contributor

Do I understand correctly that an error here means "cannot be scheduled on this node"? This is quite confusing. I'd prefer it to return a boolean and use errors only for actual errors.
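
A sketch of the suggested shape, with the scheduling verdict and real failures separated, might look like this. The names `fastCheck`, `firstPassingNode`, `podSpec`, and `nodeInfo` are hypothetical and the predicate itself is a placeholder; none of this is the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// podSpec and nodeInfo are placeholder types for illustration only.
type podSpec struct {
	name          string
	avoidsSameApp bool // stands in for a required anti-affinity term
}

type nodeInfo struct {
	name        string
	runsSameApp bool // stands in for the result of an index lookup
}

// fastCheck returns fits=false when the pod cannot go on the node, and a
// non-nil error only when the check itself could not be performed.
func fastCheck(p *podSpec, n *nodeInfo) (fits bool, err error) {
	if p == nil || n == nil {
		return false, errors.New("fastCheck: nil pod or node")
	}
	return !(p.avoidsSameApp && n.runsSameApp), nil
}

// firstPassingNode runs the cheap check first and calls the expensive
// fullCheck only for nodes that survive it.
func firstPassingNode(p *podSpec, nodes []*nodeInfo, fullCheck func(*podSpec, *nodeInfo) bool) (*nodeInfo, error) {
	for _, n := range nodes {
		fits, err := fastCheck(p, n)
		if err != nil {
			return nil, err // a real failure, not a scheduling verdict
		}
		if !fits {
			continue // node rejected cheaply; skip the full check
		}
		if fullCheck(p, n) {
			return n, nil
		}
	}
	return nil, nil // no node fits; this is not an error
}

func main() {
	nodes := []*nodeInfo{
		{name: "n1", runsSameApp: true},
		{name: "n2"},
	}
	p := &podSpec{name: "web-1", avoidsSameApp: true}
	acceptAll := func(*podSpec, *nodeInfo) bool { return true }

	n, err := firstPassingNode(p, nodes, acceptAll)
	fmt.Println(n.name, err)
}
```

Callers can then tell "this node does not fit" apart from "the check failed" without inspecting error values.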


- workqueue.ParallelizeUntil(ctx, p.parallelism, len(nodeInfosList), checkNode)
+ chunkSize := chunkSizeFor(len(nodeInfosList), p.parallelism)
+ workqueue.ParallelizeUntil(ctx, p.parallelism, len(nodeInfosList), checkNode, workqueue.WithChunkSize(chunkSize))
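
The body of `chunkSizeFor` is not shown in this hunk. As an assumption, the sketch below is modeled on the helper of the same name in kube-scheduler's parallelize package, with an added guard against non-positive parallelism; the PR's version may differ.

```go
package main

import (
	"fmt"
	"math"
)

// chunkSizeFor picks how many work items each worker claims at a time.
// It starts from sqrt(n) and caps the result at n/parallelism+1 so that
// every worker still receives work; the result is never below 1.
// Assumption: this mirrors kube-scheduler's heuristic, not the PR's code.
func chunkSizeFor(n, parallelism int) int {
	if parallelism < 1 {
		parallelism = 1 // guard added for this sketch
	}
	s := int(math.Sqrt(float64(n)))
	if r := n/parallelism + 1; s > r {
		s = r
	} else if s < 1 {
		s = 1
	}
	return s
}

func main() {
	for _, n := range []int{0, 100, 10000} {
		fmt.Printf("n=%d parallelism=16 chunk=%d\n", n, chunkSizeFor(n, 16))
	}
}
```

Larger chunks reduce contention on the shared work counter, while the cap keeps chunks small enough that all workers stay busy on short node lists.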
Contributor

I guess we should also disable the inter-pod affinity plugin?

Contributor

Please add a comment that documents how this works.

3 participants