Implement fast predicate index for cluster-autoscaler simulator #9461
x13n wants to merge 1 commit into kubernetes:master from
This change introduces a fast predicate index and specialized fast predicates in the cluster snapshot simulator. This significantly optimizes pod scheduling simulations by avoiding redundant predicate evaluations and utilizing efficient indexing for node filtering, particularly for pod affinity/anti-affinity and topology spread constraints.

Key improvements:
- Introduced FastPredicateIndex to track pod counts by labels and topology domains.
- Implemented FastPredicates to perform preliminary, optimized checks before falling back to the full scheduler plugin runner.
- Integrated the index with Basic and Delta snapshot stores.
- Added the 'fast-predicates-enabled' flag to control the feature.

Performance Impact (BenchmarkRunFiltersUntilPassingNode):
The benchmarks show a significant performance improvement (6x to 11x) across different parallelism levels, with a substantial reduction in memory allocations.

| Parallelism | Before (ns/op) | After (ns/op) | Improvement |
|-------------|----------------|---------------|-------------|
| 1           | 3,910,850      | 630,607       | 6.2x        |
| 2           | 3,324,178      | 399,312       | 8.3x        |
| 4           | 2,834,906      | 285,971       | 9.9x        |
| 8           | 2,856,542      | 256,432       | 11.1x       |
| 16          | 3,026,452      | 278,924       | 10.8x       |

Memory Statistics (Parallelism 1):
- Before: 1,508,666 B/op, 7,045 allocs/op
- After: 539,304 B/op, 3,312 allocs/op
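To make the idea of "pod counts by labels and topology domains" concrete, here is a minimal sketch of such an index. It is not the PR's FastPredicateIndex: the type `podCountIndex` and its methods `update` and `count` are hypothetical names chosen for illustration, and the sketch indexes every node label as a topology key, which a real implementation would likely restrict to the keys actually referenced by constraints.

```go
package main

import "fmt"

// labelPair is a single pod label, e.g. app=web.
type labelPair struct{ key, value string }

// domain is one topology domain, e.g. topology.kubernetes.io/zone=zone-a.
type domain struct{ topologyKey, value string }

// podCountIndex is a hypothetical stand-in for the PR's FastPredicateIndex.
// It counts pods per (pod label, topology domain) so that affinity-style
// checks become map lookups instead of scans over every pod in the snapshot.
type podCountIndex struct {
	counts map[labelPair]map[domain]int
}

func newPodCountIndex() *podCountIndex {
	return &podCountIndex{counts: map[labelPair]map[domain]int{}}
}

// update applies delta (+1 when a pod is added, -1 when it is removed) for
// every combination of the pod's labels and the labels of its node.
func (i *podCountIndex) update(podLabels, nodeLabels map[string]string, delta int) {
	for lk, lv := range podLabels {
		lp := labelPair{lk, lv}
		if i.counts[lp] == nil {
			i.counts[lp] = map[domain]int{}
		}
		for nk, nv := range nodeLabels {
			d := domain{nk, nv}
			i.counts[lp][d] += delta
			if i.counts[lp][d] == 0 {
				delete(i.counts[lp], d) // keep the index free of empty entries
			}
		}
		if len(i.counts[lp]) == 0 {
			delete(i.counts, lp)
		}
	}
}

// count returns how many indexed pods carry label lp inside domain d.
func (i *podCountIndex) count(lp labelPair, d domain) int {
	return i.counts[lp][d]
}

func main() {
	idx := newPodCountIndex()
	web := map[string]string{"app": "web"}
	nodeA := map[string]string{"topology.kubernetes.io/zone": "zone-a"}
	idx.update(web, nodeA, +1)
	idx.update(web, nodeA, +1)
	// Two app=web pods were added in zone-a, so this prints 2.
	fmt.Println(idx.count(labelPair{"app", "web"}, domain{"topology.kubernetes.io/zone", "zone-a"}))
}
```

With counts maintained incrementally on pod add and remove, a check such as "is there already an app=web pod in this zone?" costs one lookup per term rather than a pass over all pods.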
Skipping CI for Draft Pull Request.

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: x13n

    }
    }

    if affinity.PodAntiAffinity != nil {
At which point do we check whether existing pods have anti-affinity against the incoming pod? The incoming pod may have no anti-affinity in its spec, but we still need to check that it does not violate the constraints of the existing pods.
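The symmetric check this comment asks about can be sketched as follows. This is a simplified model, not code from the PR: the types `pod`, `node`, and `antiAffinityTerm` and the function `violatesExistingAntiAffinity` are hypothetical, selectors are reduced to exact label matches, and namespaces are ignored.

```go
package main

import "fmt"

// antiAffinityTerm is a simplified, hypothetical model of a required pod
// anti-affinity term: pods matching matchLabels must not share a topology
// domain (identified by topologyKey) with the pod that declares the term.
type antiAffinityTerm struct {
	matchLabels map[string]string
	topologyKey string
}

type pod struct {
	name         string
	labels       map[string]string
	antiAffinity []antiAffinityTerm
}

type node struct {
	labels map[string]string
	pods   []pod
}

// matches reports whether labels satisfy every key/value in selector.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// sameDomain reports whether both label sets carry the same value for topologyKey.
func sameDomain(a, b map[string]string, topologyKey string) bool {
	va, okA := a[topologyKey]
	vb, okB := b[topologyKey]
	return okA && okB && va == vb
}

// violatesExistingAntiAffinity checks the direction the comment describes:
// even if incoming declares no anti-affinity itself, an existing pod may
// forbid pods like incoming from landing in its topology domain.
func violatesExistingAntiAffinity(incoming pod, candidate node, all []node) bool {
	for _, n := range all {
		for _, existing := range n.pods {
			for _, term := range existing.antiAffinity {
				if matches(term.matchLabels, incoming.labels) &&
					sameDomain(n.labels, candidate.labels, term.topologyKey) {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	db := pod{
		name:   "db",
		labels: map[string]string{"app": "db"},
		antiAffinity: []antiAffinityTerm{
			{matchLabels: map[string]string{"app": "web"}, topologyKey: "zone"},
		},
	}
	n1 := node{labels: map[string]string{"zone": "z1"}, pods: []pod{db}}
	n2 := node{labels: map[string]string{"zone": "z1"}}
	n3 := node{labels: map[string]string{"zone": "z2"}}
	web := pod{name: "web", labels: map[string]string{"app": "web"}}

	all := []node{n1, n2, n3}
	fmt.Println(violatesExistingAntiAffinity(web, n2, all)) // same zone as db
	fmt.Println(violatesExistingAntiAffinity(web, n3, all)) // different zone
}
```

The sketch scans every existing pod, which is exactly the cost an index is meant to avoid; it only illustrates which condition has to hold, not how the PR evaluates it.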

    if p.fastPredicatesEnabled {
        if err := p.fastCheckPredicates(pod, nodeInfo, fastState); err != nil {
            // Fast check failed, so this Node won't work.
            return
Do I understand correctly that an error here means "cannot be scheduled on this node"? This is quite confusing. I'd prefer that it return a boolean and reserve errors for actual errors.
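One way to express the suggested split between "does not fit" and "something went wrong" is a `(bool, error)` return. The sketch below is illustrative only: `nodeInfo`, `fastCheck`, and the CPU-only fit rule are hypothetical and are not the PR's `fastCheckPredicates`.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeInfo is a hypothetical, minimal view of a node for this sketch.
type nodeInfo struct {
	name         string
	freeMilliCPU int64
}

var errNilNode = errors.New("node info is nil")

// fastCheck returns fits=false with a nil error when the pod simply does not
// fit, and a non-nil error only when the check itself could not be performed.
func fastCheck(requestMilliCPU int64, n *nodeInfo) (fits bool, err error) {
	if n == nil {
		return false, errNilNode
	}
	if requestMilliCPU < 0 {
		return false, fmt.Errorf("invalid CPU request: %d", requestMilliCPU)
	}
	return requestMilliCPU <= n.freeMilliCPU, nil
}

func main() {
	n := &nodeInfo{name: "node-1", freeMilliCPU: 500}
	for _, req := range []int64{250, 750} {
		fits, err := fastCheck(req, n)
		switch {
		case err != nil:
			fmt.Println("check failed:", err) // a real error, not a scheduling verdict
		case !fits:
			fmt.Println(req, "does not fit on", n.name)
		default:
			fmt.Println(req, "fits on", n.name)
		}
	}
}
```

With this shape the caller can skip a node on `fits == false` and still surface or log genuine failures separately.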

    workqueue.ParallelizeUntil(ctx, p.parallelism, len(nodeInfosList), checkNode)
    chunkSize := chunkSizeFor(len(nodeInfosList), p.parallelism)
    workqueue.ParallelizeUntil(ctx, p.parallelism, len(nodeInfosList), checkNode, workqueue.WithChunkSize(chunkSize))
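The snippet appears to replace the plain `ParallelizeUntil` call with one that hands each worker a chunk of nodes via `workqueue.WithChunkSize`. The body of `chunkSizeFor` is not visible in this excerpt; the sketch below assumes a heuristic similar to the helper of the same name in kube-scheduler's parallelize package (square root of the item count, capped so every worker still gets work), and should be read as an assumption rather than the PR's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// chunkSizeFor is an assumed heuristic, not taken from the PR: start from
// sqrt(n) so chunks grow slowly with the number of items, cap the result at
// n/parallelism+1 so work is still spread across all workers, and never
// return less than 1.
func chunkSizeFor(n, parallelism int) int {
	s := int(math.Sqrt(float64(n)))
	if r := n/parallelism + 1; s > r {
		s = r
	} else if s < 1 {
		s = 1
	}
	return s
}

func main() {
	for _, n := range []int{0, 100, 5000, 10000} {
		fmt.Printf("nodes=%d parallelism=16 chunk=%d\n", n, chunkSizeFor(n, 16))
	}
}
```

Larger chunks reduce how often workers synchronize to pick up the next piece of work, at the cost of coarser load balancing when individual node checks vary in duration.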
I guess we should also disable the inter-pod affinity plugin?
Please add a comment that documents how this works.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
A major part of this PR is AI-generated and needs careful review.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/hold for testing