
Conversation

@Kevinz857

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR introduces multiple performance optimizations for the ResourceBinding to Work synchronization path, targeting large-scale Pod distribution scenarios (10000+ Pods).

Background

During stress testing with 12000+ Pods distributed to dual clusters, we observed:

  • Pods: 12649
  • ResourceBindings: 12645 (caught up)
  • Works: 6646 (~6000 lagging behind)

The bottleneck was identified in the binding-controller's RB → Work synchronization path.

Key Optimizations

1. AsyncWorkCreator for Binding Controller

File: pkg/controllers/binding/async_work_creator.go (new)

  • Decouples Work creation from the reconcile loop using configurable async workers (default: 64); a simplified worker/assume-cache sketch follows this list
  • Implements Assume Cache pattern (borrowed from kube-scheduler's proven design)
  • Adds failure retry via requeue callback mechanism
  • Periodic cleanup of stale cache entries (every 5 min)
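
To make the flow concrete, a minimal sketch of the worker/assume-cache pattern follows; the `WorkTask` fields, queue wiring, and cache layout here are simplified illustrations rather than the exact code in `async_work_creator.go`:

```go
package binding

import (
	"context"
	"sync"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/karmada-io/karmada/pkg/controllers/ctrlutil"
)

// WorkTask carries everything needed to create one Work object asynchronously.
type WorkTask struct {
	Key      string // namespaced key of the owning (Cluster)ResourceBinding
	WorkMeta metav1.ObjectMeta
	Workload *unstructured.Unstructured
}

// AsyncWorkCreator decouples Work creation from the reconcile loop.
type AsyncWorkCreator struct {
	client    client.Client
	workQueue chan *WorkTask       // buffered queue filled by the reconcile loop
	requeue   func(key string)     // callback used to requeue the owning binding on failure
	mu        sync.Mutex
	assumed   map[string]time.Time // "assume cache": tasks handed off but not yet persisted
}

// Enqueue records the task in the assume cache and returns without waiting for the API call.
func (a *AsyncWorkCreator) Enqueue(task *WorkTask) {
	a.mu.Lock()
	a.assumed[task.Key] = time.Now()
	a.mu.Unlock()
	a.workQueue <- task
}

// worker is started N times (default 64) and drains the queue until ctx is cancelled.
func (a *AsyncWorkCreator) worker(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case task := <-a.workQueue:
			err := ctrlutil.CreateOrUpdateWork(ctx, a.client, task.WorkMeta, task.Workload)
			a.mu.Lock()
			delete(a.assumed, task.Key) // a periodic cleanup also evicts stale entries
			a.mu.Unlock()
			if err != nil {
				a.requeue(task.Key) // hand the binding back to the controller for retry
			}
		}
	}
}
```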

2. Parallel Work Preparation and Execution

File: pkg/controllers/binding/common.go

  • Parallelizes DeepCopy and ApplyOverridePolicies across target clusters
  • Concurrent Work creation for multi-cluster scenarios
  • Single-cluster fast path to avoid goroutine overhead (a fan-out sketch follows this list)
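
The fan-out can be summarized with a generic helper like the sketch below; the real code in `common.go` builds `workTask` values directly rather than using such a helper, so treat this as an illustration of the pattern only:

```go
package binding

import (
	"sync"

	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

// parallelPrepare runs prepare(i) once per target cluster and collects results in order.
// This is a simplified sketch of the pattern used in common.go, not the real implementation.
func parallelPrepare[T any](n int, prepare func(i int) (T, error)) ([]T, error) {
	results := make([]T, n)

	// Single-cluster fast path: no goroutine or channel overhead.
	if n == 1 {
		t, err := prepare(0)
		if err != nil {
			return nil, err
		}
		results[0] = t
		return results, nil
	}

	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine performs its own DeepCopy and ApplyOverridePolicies inside prepare.
			results[i], errs[i] = prepare(i)
		}(i)
	}
	wg.Wait()
	return results, utilerrors.NewAggregate(errs)
}
```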

3. CreateOrUpdateWork Optimization

File: pkg/controllers/ctrlutil/work.go

  • Implements a Create-First pattern (try Create before falling back to Get+Update); the flow is sketched below
  • Adds fast-path comparison to skip unchanged Work updates
  • Reduces API calls by 30-50% in update scenarios
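
A simplified sketch of the flow is shown below; the real `CreateOrUpdateWork` in `work.go` also applies the variadic `WorkOption` arguments, so this is an illustration of the pattern rather than the actual implementation:

```go
package ctrlutil

import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"

	workv1alpha1 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha1"
)

// createOrUpdateWorkSketch illustrates the Create-First flow; it is not the real work.go code.
func createOrUpdateWorkSketch(ctx context.Context, c client.Client, work *workv1alpha1.Work) error {
	// 1. Try Create first: in the common case the Work does not exist yet,
	//    so this saves the Get round-trip entirely.
	err := c.Create(ctx, work)
	if err == nil || !apierrors.IsAlreadyExists(err) {
		return err
	}

	// 2. The Work already exists: fetch it and compare before updating.
	existing := &workv1alpha1.Work{}
	if err := c.Get(ctx, client.ObjectKeyFromObject(work), existing); err != nil {
		return err
	}

	// 3. Fast path: skip the Update call entirely when nothing relevant changed.
	if equality.Semantic.DeepEqual(existing.Spec, work.Spec) &&
		equality.Semantic.DeepEqual(existing.Labels, work.Labels) {
		return nil
	}

	existing.Spec = work.Spec
	existing.Labels = work.Labels
	return c.Update(ctx, existing)
}
```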

4. Precise Orphan Work Detection

Files: pkg/controllers/binding/binding_controller.go, cluster_resource_binding_controller.go

  • Uses TargetClustersHashAnnotation to track target-cluster changes; a hash-comparison sketch follows this list
  • Skips orphan check when target clusters haven't changed
  • Expected 90%+ reduction in List API calls
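
Conceptually the check looks like the sketch below; the annotation key string and the hashing of sorted cluster names are illustrative assumptions, and only the annotation's purpose (skipping the orphan List when targets are unchanged) comes from this PR:

```go
package binding

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// targetClustersChanged reports whether the scheduled target clusters differ from the
// hash recorded on the binding, and returns the new hash to store. Sketch only: the
// annotation key and hash inputs here are illustrative assumptions.
func targetClustersChanged(binding *workv1alpha2.ResourceBinding) (changed bool, newHash string) {
	names := make([]string, 0, len(binding.Spec.Clusters))
	for _, c := range binding.Spec.Clusters {
		names = append(names, c.Name)
	}
	sort.Strings(names)

	sum := sha256.Sum256([]byte(strings.Join(names, ",")))
	newHash = hex.EncodeToString(sum[:])

	// If the stored hash matches, the target clusters are unchanged and the
	// List-based orphan check can be skipped for this reconcile.
	previous := binding.Annotations["work.karmada.io/target-clusters-hash"] // hypothetical key
	return previous != newHash, newHash
}
```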

5. AsyncBinder for Scheduler

File: pkg/scheduler/binder/binder.go (new)

  • 32 async workers for RB/CRB patch operations
  • Decouples scheduling decisions from their persistence (see the sketch below)
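
The hand-off mirrors the AsyncWorkCreator pattern; a minimal sketch follows, in which the `BindResult` shape, channel wiring, and clientset import path are illustrative assumptions, while the `ResourceBindings(...).Patch(...)` call matches the client API used in `binder.go`:

```go
package binder

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/klog/v2"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
	karmadaclientset "github.com/karmada-io/karmada/pkg/generated/clientset/versioned"
)

// BindResult carries one scheduling decision to be persisted asynchronously.
type BindResult struct {
	RB    *workv1alpha2.ResourceBinding
	Patch []byte // merge patch carrying the scheduled clusters
}

// AsyncBinder persists scheduling results with a pool of workers (default 32).
type AsyncBinder struct {
	karmadaClient karmadaclientset.Interface
	results       chan *BindResult
}

// Bind returns as soon as the result is queued, so the scheduling loop never blocks on the API server.
func (b *AsyncBinder) Bind(result *BindResult) { b.results <- result }

// bindWorker drains queued results until ctx is cancelled.
func (b *AsyncBinder) bindWorker(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case r := <-b.results:
			_, err := b.karmadaClient.WorkV1alpha2().ResourceBindings(r.RB.Namespace).
				Patch(ctx, r.RB.Name, types.MergePatchType, r.Patch, metav1.PatchOptions{})
			if err != nil {
				// The real binder retries failed patches; this sketch only logs the error.
				klog.Errorf("Failed to patch ResourceBinding %s/%s: %v", r.RB.Namespace, r.RB.Name, err)
			}
		}
	}
}
```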

New Configuration Options

| Flag | Default | Component | Description |
|---|---|---|---|
| `--enable-async-work-creation` | `false` | controller-manager | Enable asynchronous Work creation |
| `--async-work-workers` | `64` | controller-manager | Number of async workers |
| `--enable-async-bind` | `false` | scheduler | Enable asynchronous binding |
| `--async-bind-workers` | `32` | scheduler | Number of async bind workers |

Performance Results

API Call Reduction (per ResourceBinding, dual-cluster)

| Operation | Before | After | Reduction |
|---|---|---|---|
| Orphan check (List) | 1 | 0 (when clusters unchanged) | 100% |
| Work Create | 2 (Get+Create) × 2 | 1 (Create) × 2 | 50% |
| Events | 2 | 0 | 100% |
| Total API calls | 8 (sequential) | 3 (2 parallel) | 62% |

Throughput Improvement

With optimized code + proper configuration tuning:

| Configuration | Before | After |
|---|---|---|
| `--rate-limiter-qps` | 10 | 1000+ |
| `--concurrent-resourcebinding-syncs` | 5 | 50+ |
| `--concurrent-work-syncs` | 5 | 50+ |

| Metric | Before | After | Improvement |
|---|---|---|---|
| Work creation throughput | ~200 Work/s | ~1000+ Work/s | 5-10x |
| 6000 Works creation time | ~30 seconds | ~6 seconds | 5x |

Production Deployment Recommendation

For high-throughput scenarios (10000+ Pods), we recommend splitting controllers into separate deployments:

| Deployment | Controllers | Benefit |
|---|---|---|
| karmada-binding-controller | binding, bindingStatus | Dedicated resources for RB→Work |
| karmada-execution-controller | execution | Dedicated resources for Work→Member |
| karmada-misc-controller | Other controllers | Isolated from the high-throughput path |

Which issue(s) this PR fixes:

Fixes #7062

Special notes for your reviewer:

  • All new features are disabled by default for backward compatibility
  • The Assume Cache pattern is borrowed from kube-scheduler's proven design
  • Comprehensive error handling ensures no work creation is lost (automatic retry via requeue)
  • Cache cleanup prevents memory leaks from orphaned entries
  • Includes community bug fixes: IsWorkload() usage from #45a4940be and #579d2265a

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Added `--enable-async-work-creation` and `--async-work-workers` flags to enable asynchronous Work creation for improved throughput in large-scale Pod distribution scenarios (5-10x improvement).
`karmada-scheduler`: Added `--enable-async-bind` and `--async-bind-workers` flags to enable asynchronous binding for improved scheduling throughput.

@karmada-bot
Contributor

Welcome @Kevinz857! It looks like this is your first PR to karmada-io/karmada 🎉

@karmada-bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) on Dec 30, 2025
@gemini-code-assist

Summary of Changes

Hello @Kevinz857, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses significant performance bottlenecks identified in the ResourceBinding to Work synchronization path, particularly in large-scale Kubernetes environments with 10,000+ Pods. The core problem was a lag in Work object creation, where the system struggled to keep up with the rate of ResourceBinding updates. The changes introduce several key optimizations, primarily by shifting from synchronous to asynchronous processing for Work creation and binding operations, and by reducing unnecessary API calls. These improvements aim to dramatically increase throughput and reduce API server load, making the system more robust and scalable for high-density deployments.

Highlights

  • Asynchronous Work Creation for Binding Controller: Introduced a new AsyncWorkCreator with configurable async workers (default: 64) to decouple Work object creation from the reconcile loop. It implements an 'Assume Cache' pattern, similar to kube-scheduler, for optimistic updates and includes failure retry mechanisms via requeue channels. This significantly improves throughput for large-scale Pod distribution.
  • Parallel Work Preparation and Execution: Optimized the ensureWork function in common.go to parallelize DeepCopy and ApplyOverridePolicies across target clusters. It also enables concurrent Work creation for multi-cluster scenarios, with a fast-path for single-cluster deployments to avoid goroutine overhead.
  • Optimized CreateOrUpdateWork: The CreateOrUpdateWork utility now uses a 'Create-First' pattern, attempting to create a Work object directly before falling back to a Get+Update if it already exists. It also includes a fast-path comparison to skip updates if no changes are detected, reducing API calls by 30-50% in update scenarios.
  • Precise Orphan Work Detection: Implemented TargetClustersHashAnnotation on ResourceBindings and ClusterResourceBindings to track changes in target clusters. Orphan work checks are now skipped if the target clusters haven't changed, leading to an expected 90%+ reduction in List API calls for orphan detection.
  • Asynchronous Binder for Scheduler: A new AsyncBinder component has been added to the scheduler, utilizing 32 async workers for ResourceBinding/ClusterResourceBinding patch operations. This decouples scheduling decisions from the persistence of binding results, improving scheduling throughput.
  • New Configuration Options: New flags have been introduced: --enable-async-work-creation (default: false) and --async-work-workers (default: 64) for the controller-manager, and --enable-async-bind (default: false) and --async-bind-workers (default: 32) for the scheduler. These flags allow users to enable and configure the new asynchronous processing features.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of performance optimizations for large-scale pod distribution, which is a significant and valuable improvement. The introduction of asynchronous workers and the "Assume Cache" pattern for both work creation and scheduler binding is well-designed. The optimizations in CreateOrUpdateWork and for orphan work detection are also great additions.

My review focuses on a few areas to enhance robustness and maintainability:

  • Context Propagation: In the new asynchronous workers, the context is not consistently propagated to downstream API calls, which could lead to resource leaks during shutdown.
  • Style Guide Adherence: One new function has more parameters than recommended by the repository's style guide.
  • Maintainability: A hardcoded value could be replaced with a defined constant to improve code clarity and consistency.

Overall, this is an excellent pull request that will substantially improve Karmada's performance at scale. The suggested changes aim to further strengthen the implementation.

options = append(options, ctrlutil.WithPreserveResourcesOnDeletion(*task.PreserveResourcesOnDeletion))
}

err := ctrlutil.CreateOrUpdateWork(context.Background(), a.client, task.WorkMeta, task.Workload, options...)


Severity: high

Using context.Background() can lead to goroutines leaking during shutdown, as the API call won't be cancelled when the worker's context is done. The context from the worker function should be passed down to createWork and used here.

To fix this, you should:

  1. Change createWork signature to func (a *AsyncWorkCreator) createWork(ctx context.Context, task *WorkTask).
  2. Update the call in worker function to a.createWork(ctx, task).
  3. Use the passed ctx in this call.
Suggested change
err := ctrlutil.CreateOrUpdateWork(context.Background(), a.client, task.WorkMeta, task.Workload, options...)
err := ctrlutil.CreateOrUpdateWork(ctx, a.client, task.WorkMeta, task.Workload, options...)

}

// doBind performs the actual binding operation
func (b *AsyncBinder) doBind(result *BindResult) {


Severity: high

The doBind function and its callees (bindResourceBinding, bindClusterResourceBinding, and the patch functions) use context.TODO(). This can lead to goroutines leaking during shutdown, as the API calls won't be cancelled when the worker's context is done.

Please propagate the context from bindWorker down to all API calls. For example:

  1. Change doBind signature to doBind(ctx context.Context, result *BindResult).
  2. Update the call in bindWorker: b.doBind(ctx, result).
  3. Propagate the context to bindResourceBinding and patchResourceBindingSpec.
  4. Use the context in the Patch call: b.karmadaClient.WorkV1alpha2().ResourceBindings(rb.Namespace).Patch(ctx, ...).

This should be applied to all patch calls within this file.

Suggested change
func (b *AsyncBinder) doBind(result *BindResult) {
func (b *AsyncBinder) doBind(ctx context.Context, result *BindResult) {

Comment on lines 389 to 390
rbRequeueChan = make(chan string, 10000)
crbRequeueChan = make(chan string, 10000)


Severity: medium

The requeue channels are initialized with a hardcoded size of 10000. To improve maintainability and consistency, please use the binding.DefaultAsyncWorkQueueSize constant, which is also used for the async work queue itself.

Suggested change
rbRequeueChan = make(chan string, 10000)
crbRequeueChan = make(chan string, 10000)
rbRequeueChan = make(chan string, binding.DefaultAsyncWorkQueueSize)
crbRequeueChan = make(chan string, binding.DefaultAsyncWorkQueueSize)

Comment on lines 209 to 220
func prepareWorkTask(
resourceInterpreter resourceinterpreter.ResourceInterpreter,
clonedWorkload *unstructured.Unstructured,
overrideManager overridemanager.OverrideManager,
binding metav1.Object,
scope apiextensionsv1.ResourceScope,
bindingSpec workv1alpha2.ResourceBindingSpec,
targetCluster workv1alpha2.TargetCluster,
clusterIndex int,
jobCompletions []workv1alpha2.TargetCluster,
totalClusters int,
) (workTask, error) {


Severity: medium

This function has 10 parameters, which exceeds the recommended limit of 5 as per the repository style guide. Consider encapsulating the parameters into a struct to improve readability and maintainability.

For example, you could define a struct like this:

type prepareWorkTaskArgs struct {
	resourceInterpreter resourceinterpreter.ResourceInterpreter
	clonedWorkload      *unstructured.Unstructured
	overrideManager     overridemanager.OverrideManager
	binding             metav1.Object
	scope               apiextensionsv1.ResourceScope
	bindingSpec         workv1alpha2.ResourceBindingSpec
	targetCluster       workv1alpha2.TargetCluster
	clusterIndex        int
	jobCompletions      []workv1alpha2.TargetCluster
	totalClusters       int
}

And then change the function signature to func prepareWorkTask(args prepareWorkTaskArgs) (workTask, error).

References
  1. A function should generally not have more than 5 parameters. If it exceeds this, consider refactoring the function or encapsulating the parameters into a struct. (link)

@Kevinz857 force-pushed the feat/optimize-rb-to-work-throughput branch from eb19164 to 7000671 on December 30, 2025 at 07:01
@codecov-commenter

codecov-commenter commented Dec 30, 2025


Codecov Report

❌ Patch coverage is 26.79956% with 661 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.21%. Comparing base (31e4756) to head (d954ec1).
⚠️ Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/scheduler/binder/binder.go | 26.52% | 168 Missing and 1 partial ⚠️ |
| pkg/controllers/binding/common.go | 1.62% | 121 Missing ⚠️ |
| pkg/controllers/binding/binding_controller.go | 12.37% | 81 Missing and 4 partials ⚠️ |
| ...ers/binding/cluster_resource_binding_controller.go | 13.18% | 75 Missing and 4 partials ⚠️ |
| pkg/scheduler/scheduler.go | 5.19% | 66 Missing and 7 partials ⚠️ |
| pkg/controllers/binding/async_work_creator.go | 61.43% | 59 Missing ⚠️ |
| cmd/controller-manager/app/controllermanager.go | 0.00% | 49 Missing ⚠️ |
| pkg/controllers/ctrlutil/work.go | 80.59% | 7 Missing and 6 partials ⚠️ |
| pkg/util/helper/binding.go | 0.00% | 8 Missing ⚠️ |
| cmd/scheduler/app/scheduler.go | 0.00% | 3 Missing ⚠️ |

... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7063      +/-   ##
==========================================
- Coverage   46.54%   46.21%   -0.33%     
==========================================
  Files         700      702       +2     
  Lines       48084    48888     +804     
==========================================
+ Hits        22382    22596     +214     
- Misses      24018    24589     +571     
- Partials     1684     1703      +19     
| Flag | Coverage Δ |
|---|---|
| unittests | 46.21% <26.79%> (-0.33%) ⬇️ |

Flags with carried forward coverage won't be shown.


@Kevinz857 force-pushed the feat/optimize-rb-to-work-throughput branch 4 times, most recently from e5ad8b6 to 05a2989 on December 30, 2025 at 10:45
This commit introduces multiple performance optimizations for the
ResourceBinding to Work synchronization path, targeting scenarios
with 10000+ Pods distribution.

Key optimizations:

1. AsyncWorkCreator for Binding Controller
   - Decouples Work creation from reconcile loop using 64 async workers
   - Implements Assume Cache pattern (similar to kube-scheduler)
   - Adds failure retry via requeue callback mechanism
   - Periodic cleanup of stale cache entries (every 5 min)

2. Parallel Work preparation and execution
   - Parallelizes DeepCopy and ApplyOverridePolicies across clusters
   - Concurrent Work creation for multi-cluster scenarios

3. CreateOrUpdateWork optimization
   - Implements Create-First pattern (try Create before Get+Update)
   - Adds fast-path comparison to skip unchanged Work updates
   - Reduces API calls by 30-50% in update scenarios

4. Precise orphan Work detection
   - Uses TargetClustersHashAnnotation to track cluster changes
   - Skips orphan check when clusters haven't changed
   - Expected 90%+ reduction in List API calls

5. AsyncBinder for Scheduler
   - 32 async workers for RB/CRB patch operations
   - Decouples scheduling decisions from persistence

New configuration options:
  --enable-async-work-creation=true
  --async-work-workers=64
  --enable-async-bind=true
  --async-bind-workers=32

Performance improvement:
  - New Work API calls: 2 -> 1 per Work (50% reduction)
  - Orphan check: Every reconcile -> Only on cluster change (90%+ reduction)
  - Multi-cluster Work creation: Sequential -> Parallel (Nx speedup)
  - Expected throughput: ~200 Work/s -> ~1000+ Work/s (5-10x improvement)

Signed-off-by: Kevinz857 <[email protected]>
@Kevinz857 force-pushed the feat/optimize-rb-to-work-throughput branch from 05a2989 to d954ec1 on December 30, 2025 at 11:31
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevin-wangzefeng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment



Development

Successfully merging this pull request may close these issues.

Optimize ResourceBinding to Work synchronization throughput for large-scale Pod distribution scenarios (10000+ Pods).

3 participants