perf: optimize RB to Work throughput for large-scale Pod distribution #7063
Conversation
Welcome @Kevinz857! It looks like this is your first PR to karmada-io/karmada 🎉
Summary of Changes

Hello @Kevinz857, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request addresses significant performance bottlenecks identified in the ResourceBinding to Work synchronization path, particularly in large-scale Kubernetes environments with 10,000+ Pods. The core problem was a lag in Work object creation: the system struggled to keep up with the rate of ResourceBinding updates. The changes introduce several key optimizations, primarily by shifting from synchronous to asynchronous processing for Work creation and binding operations, and by reducing unnecessary API calls. These improvements aim to dramatically increase throughput and reduce API server load, making the system more robust and scalable for high-density deployments.
Code Review
This pull request introduces a comprehensive set of performance optimizations for large-scale pod distribution, which is a significant and valuable improvement. The introduction of asynchronous workers and the "Assume Cache" pattern for both work creation and scheduler binding is well-designed. The optimizations in CreateOrUpdateWork and for orphan work detection are also great additions.
My review focuses on a few areas to enhance robustness and maintainability:
- Context Propagation: In the new asynchronous workers, the context is not consistently propagated to downstream API calls, which could lead to resource leaks during shutdown.
- Style Guide Adherence: One new function has more parameters than recommended by the repository's style guide.
- Maintainability: A hardcoded value could be replaced with a defined constant to improve code clarity and consistency.
Overall, this is an excellent pull request that will substantially improve Karmada's performance at scale. The suggested changes aim to further strengthen the implementation.
```go
	options = append(options, ctrlutil.WithPreserveResourcesOnDeletion(*task.PreserveResourcesOnDeletion))
}

err := ctrlutil.CreateOrUpdateWork(context.Background(), a.client, task.WorkMeta, task.Workload, options...)
```
Using context.Background() can lead to goroutines leaking during shutdown, as the API call won't be cancelled when the worker's context is done. The context from the worker function should be passed down to createWork and used here.
To fix this, you should:

- Change the `createWork` signature to `func (a *AsyncWorkCreator) createWork(ctx context.Context, task *WorkTask)`.
- Update the call in the `worker` function to `a.createWork(ctx, task)`.
- Use the passed `ctx` in this call.
```diff
- err := ctrlutil.CreateOrUpdateWork(context.Background(), a.client, task.WorkMeta, task.Workload, options...)
+ err := ctrlutil.CreateOrUpdateWork(ctx, a.client, task.WorkMeta, task.Workload, options...)
```
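As a minimal, runnable sketch of the propagation pattern this comment asks for — `WorkTask`, `AsyncWorkCreator`, and the method bodies here are assumed stand-ins, not the PR's actual code:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// WorkTask and AsyncWorkCreator are stand-ins for the PR's types,
// assumed for this sketch only.
type WorkTask struct{ Name string }

type AsyncWorkCreator struct {
	tasks chan *WorkTask
}

// worker drains the queue until ctx is cancelled and hands ctx down,
// so in-flight API calls are cancelled on shutdown.
func (a *AsyncWorkCreator) worker(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case task := <-a.tasks:
			a.createWork(ctx, task)
		}
	}
}

// createWork accepts the worker's ctx instead of calling context.Background().
// In the real code the body would be:
//   err := ctrlutil.CreateOrUpdateWork(ctx, a.client, task.WorkMeta, task.Workload, options...)
func (a *AsyncWorkCreator) createWork(ctx context.Context, task *WorkTask) {
	fmt.Printf("creating work %q (ctx alive: %v)\n", task.Name, ctx.Err() == nil)
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	a := &AsyncWorkCreator{tasks: make(chan *WorkTask, 1)}
	a.tasks <- &WorkTask{Name: "demo"}

	go func() {
		time.Sleep(50 * time.Millisecond)
		cancel() // simulate controller shutdown
	}()
	a.worker(ctx) // processes the task, then returns on cancellation
}
```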
pkg/scheduler/binder/binder.go (Outdated)
```go
}

// doBind performs the actual binding operation
func (b *AsyncBinder) doBind(result *BindResult) {
```
The doBind function and its callees (bindResourceBinding, bindClusterResourceBinding, and the patch functions) use context.TODO(). This can lead to goroutines leaking during shutdown, as the API calls won't be cancelled when the worker's context is done.
Please propagate the context from bindWorker down to all API calls. For example:
- Change the `doBind` signature to `doBind(ctx context.Context, result *BindResult)`.
- Update the call in `bindWorker`: `b.doBind(ctx, result)`.
- Propagate the context to `bindResourceBinding` and `patchResourceBindingSpec`.
- Use the context in the `Patch` call: `b.karmadaClient.WorkV1alpha2().ResourceBindings(rb.Namespace).Patch(ctx, ...)`.
This should be applied to all patch calls within this file.
```diff
- func (b *AsyncBinder) doBind(result *BindResult) {
+ func (b *AsyncBinder) doBind(ctx context.Context, result *BindResult) {
```
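A compact sketch of how the worker pool could be tied to a single context so every in-flight bind stops together on shutdown — `AsyncBinder`, `BindResult`, and the method names are assumed stand-ins under the same hedge as above:

```go
package binder

import (
	"context"
	"sync"
)

// BindResult and AsyncBinder are assumed stand-ins for the PR's types.
type BindResult struct{ Name string }

type AsyncBinder struct {
	results chan *BindResult
}

// Run launches the worker pool bound to a single context, so every
// in-flight bind stops together when the scheduler shuts down.
func (b *AsyncBinder) Run(ctx context.Context, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.bindWorker(ctx)
		}()
	}
	wg.Wait()
}

// bindWorker loops until ctx is cancelled, handing ctx to doBind so the
// Patch calls underneath it can be cancelled too.
func (b *AsyncBinder) bindWorker(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case r := <-b.results:
			b.doBind(ctx, r)
		}
	}
}

// doBind is where the real code would call, for example:
//   b.karmadaClient.WorkV1alpha2().ResourceBindings(ns).Patch(ctx, ...)
func (b *AsyncBinder) doBind(ctx context.Context, r *BindResult) {
	_ = ctx
	_ = r
}
```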
```go
rbRequeueChan = make(chan string, 10000)
crbRequeueChan = make(chan string, 10000)
```
The requeue channels are initialized with a hardcoded size of 10000. To improve maintainability and consistency, please use the binding.DefaultAsyncWorkQueueSize constant, which is also used for the async work queue itself.
```diff
- rbRequeueChan = make(chan string, 10000)
- crbRequeueChan = make(chan string, 10000)
+ rbRequeueChan = make(chan string, binding.DefaultAsyncWorkQueueSize)
+ crbRequeueChan = make(chan string, binding.DefaultAsyncWorkQueueSize)
```
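For reference, a hypothetical definition of that constant — the actual declaration lives somewhere in the PR's `pkg/controllers/binding` package, and only its name and the matching value of 10000 are taken from this review:

```go
package binding

// DefaultAsyncWorkQueueSize is the buffer size shared by the async work
// queue and, per this suggestion, the requeue channels.
const DefaultAsyncWorkQueueSize = 10000
```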
pkg/controllers/binding/common.go (Outdated)
```go
func prepareWorkTask(
	resourceInterpreter resourceinterpreter.ResourceInterpreter,
	clonedWorkload *unstructured.Unstructured,
	overrideManager overridemanager.OverrideManager,
	binding metav1.Object,
	scope apiextensionsv1.ResourceScope,
	bindingSpec workv1alpha2.ResourceBindingSpec,
	targetCluster workv1alpha2.TargetCluster,
	clusterIndex int,
	jobCompletions []workv1alpha2.TargetCluster,
	totalClusters int,
) (workTask, error) {
```
This function has 10 parameters, which exceeds the recommended limit of 5 as per the repository style guide. Consider encapsulating the parameters into a struct to improve readability and maintainability.
For example, you could define a struct like this:
```go
type prepareWorkTaskArgs struct {
	resourceInterpreter resourceinterpreter.ResourceInterpreter
	clonedWorkload      *unstructured.Unstructured
	overrideManager     overridemanager.OverrideManager
	binding             metav1.Object
	scope               apiextensionsv1.ResourceScope
	bindingSpec         workv1alpha2.ResourceBindingSpec
	targetCluster       workv1alpha2.TargetCluster
	clusterIndex        int
	jobCompletions      []workv1alpha2.TargetCluster
	totalClusters       int
}
```

And then change the function signature to `func prepareWorkTask(args prepareWorkTaskArgs) (workTask, error)`.
References
- A function should generally not have more than 5 parameters. If it exceeds this, consider refactoring the function or encapsulating the parameters into a struct. (link)
Force-pushed eb19164 to 7000671 (Compare)
Codecov Report ❌

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #7063      +/-   ##
==========================================
- Coverage   46.54%   46.21%    -0.33%
==========================================
  Files         700      702        +2
  Lines       48084    48888      +804
==========================================
+ Hits        22382    22596      +214
- Misses      24018    24589      +571
- Partials     1684     1703       +19
```

View full report in Codecov by Sentry.
Force-pushed e5ad8b6 to 05a2989 (Compare)
This commit introduces multiple performance optimizations for the ResourceBinding to Work synchronization path, targeting scenarios with 10000+ Pods distribution.

Key optimizations:

1. AsyncWorkCreator for Binding Controller
   - Decouples Work creation from the reconcile loop using 64 async workers
   - Implements the Assume Cache pattern (similar to kube-scheduler)
   - Adds failure retry via a requeue callback mechanism
   - Periodic cleanup of stale cache entries (every 5 min)

2. Parallel Work preparation and execution
   - Parallelizes DeepCopy and ApplyOverridePolicies across clusters
   - Concurrent Work creation for multi-cluster scenarios

3. CreateOrUpdateWork optimization
   - Implements the Create-First pattern (try Create before Get+Update)
   - Adds a fast-path comparison to skip unchanged Work updates
   - Reduces API calls by 30-50% in update scenarios

4. Precise orphan Work detection
   - Uses TargetClustersHashAnnotation to track cluster changes
   - Skips the orphan check when clusters haven't changed
   - Expected 90%+ reduction in List API calls

5. AsyncBinder for Scheduler
   - 32 async workers for RB/CRB patch operations
   - Decouples scheduling decisions from persistence

New configuration options:

- --enable-async-work-creation=true
- --async-work-workers=64
- --enable-async-bind=true
- --async-bind-workers=32

Performance improvement:

- New Work API calls: 2 -> 1 per Work (50% reduction)
- Orphan check: every reconcile -> only on cluster change (90%+ reduction)
- Multi-cluster Work creation: sequential -> parallel (Nx speedup)
- Expected throughput: ~200 Work/s -> ~1000+ Work/s (5-10x improvement)

Signed-off-by: Kevinz857 <[email protected]>
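To make item 3 concrete, here is a minimal sketch of the Create-First pattern with a fast-path comparison, written against controller-runtime's client — the function name `createFirst`, the package name, and the exact fields compared are assumptions; the PR's actual `CreateOrUpdateWork` in `pkg/controllers/ctrlutil/work.go` may differ:

```go
package sketch

import (
	"context"
	"reflect"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"

	workv1alpha1 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha1"
)

// createFirst tries Create unconditionally (the common case for brand-new
// Works) and falls back to Get+Update only when the Work already exists,
// turning the usual Get+Create/Update pair into a single call on creation.
func createFirst(ctx context.Context, c client.Client, desired *workv1alpha1.Work) error {
	err := c.Create(ctx, desired)
	if err == nil || !apierrors.IsAlreadyExists(err) {
		return err // created successfully, or a real failure
	}

	existing := &workv1alpha1.Work{}
	if err := c.Get(ctx, client.ObjectKeyFromObject(desired), existing); err != nil {
		return err
	}
	// Fast path: skip the Update API call entirely when nothing changed.
	if reflect.DeepEqual(existing.Spec, desired.Spec) &&
		reflect.DeepEqual(existing.Labels, desired.Labels) &&
		reflect.DeepEqual(existing.Annotations, desired.Annotations) {
		return nil
	}
	existing.Spec = desired.Spec
	existing.Labels = desired.Labels
	existing.Annotations = desired.Annotations
	return c.Update(ctx, existing)
}
```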
Force-pushed 05a2989 to d954ec1 (Compare)
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of the affected files; approvers can indicate their approval by writing /approve in a comment. The full list of commands accepted by this bot can be found here.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces multiple performance optimizations for the ResourceBinding to Work synchronization path, targeting large-scale Pod distribution scenarios (10000+ Pods).
Background
During stress testing with 12,000+ Pods distributed to dual clusters, we observed that Work object creation lagged significantly behind ResourceBinding updates. The bottleneck was identified in the binding-controller's RB → Work synchronization path.
Key Optimizations
1. AsyncWorkCreator for Binding Controller
File: pkg/controllers/binding/async_work_creator.go (new)

2. Parallel Work Preparation and Execution
File: pkg/controllers/binding/common.go

3. CreateOrUpdateWork Optimization
File: pkg/controllers/ctrlutil/work.go

4. Precise Orphan Work Detection
Files: pkg/controllers/binding/binding_controller.go, cluster_resource_binding_controller.go
Uses TargetClustersHashAnnotation to track cluster changes; see the sketch after this list.

5. AsyncBinder for Scheduler
File: pkg/scheduler/binder/binder.go (new)
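A minimal sketch of the hash-gated orphan check described in item 4 — the annotation key string and helper names are assumptions; only the idea of hashing the target clusters and skipping the List-based scan when the hash is unchanged comes from the PR:

```go
package sketch

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Assumed annotation key; the PR names it TargetClustersHashAnnotation.
const targetClustersHashAnnotation = "work.karmada.io/target-clusters-hash"

// hashClusters produces a stable hash over the scheduled cluster names,
// independent of their ordering.
func hashClusters(clusters []string) string {
	sorted := append([]string(nil), clusters...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(fmt.Sprint(sorted)))
	return hex.EncodeToString(sum[:])
}

// needOrphanCheck compares the hash recorded on the binding against the
// hash of the current target clusters, so the expensive List-based orphan
// scan runs only when the cluster set actually changed.
func needOrphanCheck(annotations map[string]string, current []string) bool {
	return annotations[targetClustersHashAnnotation] != hashClusters(current)
}
```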
New Configuration Options

- --enable-async-work-creation
- --async-work-workers
- --enable-async-bind
- --async-bind-workers

Performance Results
API Call Reduction (per ResourceBinding, dual-cluster)
Throughput Improvement
With optimized code + proper configuration tuning:
- --rate-limiter-qps
- --concurrent-resourcebinding-syncs
- --concurrent-work-syncs

Production Deployment Recommendation
For high-throughput scenarios (10000+ Pods), we recommend splitting controllers into separate deployments:
- karmada-binding-controller
- karmada-execution-controller
- karmada-misc-controller

Which issue(s) this PR fixes:
Fixes #7062
Special notes for your reviewer:
- IsWorkload() usage from 45a4940be and 579d2265a

Does this PR introduce a user-facing change?: