Description
What would you like to be added:
Optimize ResourceBinding to Work synchronization throughput for large-scale Pod distribution scenarios (10000+ Pods).
Why is this needed:
Problem Description
In large-scale resource distribution scenarios (e.g., distributing 10000+ Pods to multiple member clusters), we observed a significant bottleneck in the ResourceBinding to Work synchronization path.
Observed data during stress testing:
etcd object counts:
- Pod: 12649
- ResourceBinding: 12645 ✅ Almost caught up with Pods
- Work: 6646 ❌ Lagging ~6000 behind ResourceBindings
Bottleneck Analysis
We analyzed the complete data path and identified where the latency occurs:
| Stage | Component | Latency | Status |
|---|---|---|---|
| Pod → ResourceBinding | resource-detector | ~ms | ✅ Normal |
| ResourceBinding → RB.spec.clusters | karmada-scheduler | ~ms | ✅ Normal |
| ResourceBinding → Work | binding-controller | seconds to minutes | ❌ Bottleneck |
Root Cause Analysis
The binding-controller has several performance issues in the RB → Work path:
1. **Synchronous Work Creation**
   - Each ResourceBinding reconcile blocks on Work creation
   - No decoupling between the scheduling decision and persistence
   - Redundant reconciles while Works are being created

2. **Inefficient API Call Pattern** (see the create-first sketch after this list)
   - Uses `controllerutil.CreateOrUpdate`, which always does a Get before Create
   - For new Work objects: 2 API calls (Get + Create) instead of 1
   - No fast path to skip unchanged Work updates

3. **Unnecessary Orphan Work Checks**
   - `removeOrphanWorks()` is called on every reconcile
   - Each check triggers a List API call via `GetWorksByBindingID()`
   - With 12645 RBs, this means 12645+ unnecessary List operations

4. **Sequential Multi-Cluster Work Creation**
   - When distributing to N clusters, Work objects are created sequentially
   - For a dual-cluster scenario: 2 sequential API calls instead of parallel ones

5. **Excessive Event Recording**
   - Records 2 Events per successful sync (binding + workload)
   - In high-throughput scenarios, this creates significant API load
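To make point 2 concrete, here is a minimal sketch of a create-first write path for Work objects, assuming a controller-runtime client. `createFirstWork` is a hypothetical helper used only for illustration, not the existing binding-controller code:

```go
// A minimal sketch of the create-first pattern, assuming a controller-runtime
// client. createFirstWork is a hypothetical helper, not existing code.
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"

	workv1alpha1 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha1"
)

// createFirstWork tries Create directly and only falls back to Get + Update
// when the Work already exists. A brand-new Work therefore costs 1 API call
// instead of the 2 (Get + Create) issued by controllerutil.CreateOrUpdate.
func createFirstWork(ctx context.Context, c client.Client, desired *workv1alpha1.Work) error {
	err := c.Create(ctx, desired)
	if err == nil {
		return nil // fast path: new Work, single API call
	}
	if !apierrors.IsAlreadyExists(err) {
		return err
	}

	// Slow path: the Work already exists, so fetch and update it.
	existing := &workv1alpha1.Work{}
	if err := c.Get(ctx, client.ObjectKeyFromObject(desired), existing); err != nil {
		return err
	}
	existing.Labels = desired.Labels
	existing.Annotations = desired.Annotations
	existing.Spec = desired.Spec
	return c.Update(ctx, existing)
}
```

Since most Works in a large-scale distribution are brand new, the fast path dominates and the per-Work write cost roughly halves.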
API Call Analysis (Dual-Cluster Distribution)
Before optimization (per ResourceBinding):

```
├── GetWorksByBindingID (List)   = 1 call
├── FetchResourceTemplate (Get)  = 1 call
├── Cluster A: CreateOrUpdateWork
│   ├── Get (NotFound)           = 1 call
│   └── Create                   = 1 call
├── Cluster B: CreateOrUpdateWork (sequential!)
│   ├── Get (NotFound)           = 1 call
│   └── Create                   = 1 call
├── Event(binding)               = 1 call
└── Event(workload)              = 1 call

Total: 8 API calls (sequential)
```
After optimization (per ResourceBinding):

```
├── FetchResourceTemplate (Get)  = 1 call
├── Parallel:
│   ├── Cluster A: Create        = 1 call
│   └── Cluster B: Create        = 1 call
└── Log (no API call)

Total: 3 API calls (2 parallel)
```
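The "Parallel" step above could look roughly like the following sketch, which reuses the hypothetical `createFirstWork` helper from the earlier snippet; the function name and error handling are illustrative only:

```go
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
	"sigs.k8s.io/controller-runtime/pkg/client"

	workv1alpha1 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha1"
)

// createWorksParallel issues the per-cluster creates concurrently instead of
// one after another, so a dual-cluster spread costs one round-trip of
// wall-clock latency instead of two.
func createWorksParallel(ctx context.Context, c client.Client, works []*workv1alpha1.Work) error {
	g, gctx := errgroup.WithContext(ctx)
	for _, w := range works {
		w := w // capture the loop variable for the goroutine
		g.Go(func() error {
			return createFirstWork(gctx, c, w) // create-first helper from the earlier sketch
		})
	}
	// Wait returns the first non-nil error; in that case the ResourceBinding
	// is requeued and the remaining clusters converge on retry.
	return g.Wait()
}
```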
Configuration Bottleneck
We also discovered that the default RateLimiter configuration is too conservative:
Default: `--rate-limiter-qps=10`

For 6160 ResourceBindings, the theoretical minimum processing time at this rate is 6160 / 10 = 616 seconds ≈ 10 minutes.
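As a rough illustration of the gap (not a tuning recommendation), the same arithmetic gives the client-side rate budget needed to drain such a backlog within a given window:

Target: clear 6160 ResourceBindings in ~60 seconds
Required QPS ≈ 6160 / 60 ≈ 103, i.e. roughly 10x the default of 10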
Expected Improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| New Work API calls | 2 per Work | 1 per Work | 50% |
| Orphan check frequency | Every reconcile | Only on cluster change | 90%+ |
| Multi-cluster Work creation | Sequential | Parallel | Nx |
| Event recording | 2 per success | 0 per success | 100% |
| Total API calls per RB (dual-cluster) | ~8 (sequential) | ~3 (2 parallel) | 60%+ |
| Expected throughput | ~200 Work/s | ~1000+ Work/s | 5-10x |
Proposed Solution
- AsyncWorkCreator - Decouple Work creation from the reconcile loop with async workers (see the worker-pool sketch below)
- Assume Cache - Skip redundant reconciles for in-flight work creation (similar to kube-scheduler)
- Create-First Pattern - Try Create before Get+Update
- Precise Orphan Detection - Use hash annotation to skip unchanged cluster checks
- Parallel Work Creation - Create Works for multiple clusters concurrently
- AsyncBinder for Scheduler - Async workers for RB/CRB patch operations
All optimizations should be behind feature flags for backward compatibility.
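For discussion, here is a minimal sketch of how the AsyncWorkCreator and assume-cache ideas could fit together, assuming a controller-runtime client and reusing the hypothetical `createFirstWork` helper from above. The type name, queue size, and worker layout are illustrative, not the proposed implementation:

```go
package sketch

import (
	"context"
	"sync"

	"k8s.io/klog/v2"
	"sigs.k8s.io/controller-runtime/pkg/client"

	workv1alpha1 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha1"
)

// AsyncWorkCreator decouples Work persistence from the reconcile loop:
// reconcile only enqueues the desired Work, a fixed worker pool performs the
// API writes, and an assume cache suppresses redundant requests while a
// creation is still in flight.
type AsyncWorkCreator struct {
	client  client.Client
	queue   chan *workv1alpha1.Work
	assumed sync.Map // keys of Works whose creation is in flight
}

func NewAsyncWorkCreator(c client.Client, workers int) *AsyncWorkCreator {
	a := &AsyncWorkCreator{client: c, queue: make(chan *workv1alpha1.Work, 1024)}
	for i := 0; i < workers; i++ {
		go a.run()
	}
	return a
}

// Enqueue returns immediately, so the reconcile loop never blocks on API latency.
func (a *AsyncWorkCreator) Enqueue(w *workv1alpha1.Work) {
	key := w.Namespace + "/" + w.Name
	if _, inFlight := a.assumed.LoadOrStore(key, struct{}{}); inFlight {
		return // an identical creation is already queued or running
	}
	a.queue <- w
}

func (a *AsyncWorkCreator) run() {
	for w := range a.queue {
		key := w.Namespace + "/" + w.Name
		// Reuse the create-first write path sketched earlier; a real
		// implementation would also retry or requeue on failure.
		if err := createFirstWork(context.TODO(), a.client, w); err != nil {
			klog.ErrorS(err, "failed to create Work asynchronously", "work", key)
		}
		a.assumed.Delete(key)
	}
}
```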