
feat(execution): only modify member cluster objects in execution controller#7552

Open
zach593 wants to merge 1 commit into karmada-io:master from ctripcloud:execution


@zach593 zach593 commented May 25, 2026

What type of PR is this?

/kind cleanup
/kind feature

What this PR does / why we need it:

This PR refines the responsibility boundary between the execution controller and the work-status controller. Previously, the work-status controller both reflected member cluster resource status back to Work objects and detected and reconciled drift in member cluster resources (for example, updating or recreating objects that had diverged from the desired state). Mixing these concerns made the codebase harder to reason about.

After this change, the execution controller takes over the write path: it watches member cluster resources directly via informers, detects meaningful drift by comparing actual objects against version records (ignoring irrelevant fields such as status and managedFields), and reconciles them. The work-status controller is reduced to the read path: reflecting status from member cluster resources back to Work objects.

The result is a cleaner separation of concerns: the execution controller drives member cluster objects toward the desired state, while the work-status controller observes and reports their actual status.
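
To make the "ignore irrelevant fields" idea concrete, here is a minimal sketch in Go. It is not the code from this PR: the PR compares against version records, whereas this sketch compares a desired object with an observed one directly, which is a simplification. The names `stripIgnoredFields` and `hasMeaningfulDrift` are hypothetical, and objects are modeled as plain maps rather than Kubernetes unstructured types.

```go
package main

import (
	"fmt"
	"reflect"
)

// stripIgnoredFields returns a copy of obj without the fields that should
// not count as drift: the top-level "status" and "metadata.managedFields".
// The top-level and metadata maps are copied, so the input is not modified.
func stripIgnoredFields(obj map[string]any) map[string]any {
	out := make(map[string]any, len(obj))
	for k, v := range obj {
		if k == "status" {
			continue
		}
		out[k] = v
	}
	if meta, ok := obj["metadata"].(map[string]any); ok {
		metaCopy := make(map[string]any, len(meta))
		for k, v := range meta {
			if k == "managedFields" {
				continue
			}
			metaCopy[k] = v
		}
		out["metadata"] = metaCopy
	}
	return out
}

// hasMeaningfulDrift reports whether observed differs from desired once the
// ignored fields have been removed from both sides.
func hasMeaningfulDrift(desired, observed map[string]any) bool {
	return !reflect.DeepEqual(stripIgnoredFields(desired), stripIgnoredFields(observed))
}

func main() {
	desired := map[string]any{
		"metadata": map[string]any{"name": "nginx"},
		"spec":     map[string]any{"replicas": 2},
	}
	// Differs from desired only in status and managedFields.
	statusOnly := map[string]any{
		"metadata": map[string]any{"name": "nginx", "managedFields": []any{"kubectl"}},
		"spec":     map[string]any{"replicas": 2},
		"status":   map[string]any{"readyReplicas": 2},
	}
	// Differs from desired in spec.
	specChanged := map[string]any{
		"metadata": map[string]any{"name": "nginx"},
		"spec":     map[string]any{"replicas": 5},
	}
	fmt.Println("status-only change is drift:", hasMeaningfulDrift(desired, statusOnly))
	fmt.Println("spec change is drift:", hasMeaningfulDrift(desired, specChanged))
}
```

Under these assumptions, a status update from the member cluster would not trigger a write, while a changed spec would.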

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Local benchmark results show that when restarting the controller-manager against an already-synced cluster, the time to drain the queue is virtually unchanged compared to before.

Test configuration:

  • 20,000 Deployments, each with replicas split 1:1 across 2 member clusters.
  • All controllers set to 100 workers, REST QPS capped at 500.
  • GOMAXPROCS=8.

The queue drain takes approximately 3 minutes.

Informer startup in both controllers no longer blocks on member cluster informer sync. This avoids the problem where a handful of resources whose informers are still being established could stall all workers and delay distribution or status collection for other resource types or member clusters. While an informer is still being established, the controller requeues the item after 60 seconds and retries.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: The execution controller now handles all member cluster object modifications, while the work-status controller is scoped to status collection only.

Copilot AI review requested due to automatic review settings May 25, 2026 17:11
@karmada-bot karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label May 25, 2026
@karmada-bot karmada-bot requested a review from lonelyCZ May 25, 2026 17:11
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 25, 2026
@karmada-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign whitewindmills for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 25, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the architectural boundaries between the execution and work-status controllers in the Karmada controller manager. By centralizing drift detection and resource reconciliation within the execution controller, the system achieves a cleaner separation of concerns. The work-status controller is now dedicated solely to status reflection. These changes also include performance improvements by ensuring that informer synchronization no longer blocks controller operations, leading to more robust and predictable cluster management.

Highlights

  • Controller Responsibility Refactoring: Moved all write-path operations, including drift detection and reconciliation of member cluster resources, from the work-status-controller to the execution-controller.
  • Informer Management Optimization: Decoupled informer startup to prevent blocking, ensuring that the controller-manager initializes more efficiently without stalling worker queues.
  • Drift Detection Logic: Updated the execution-controller to directly watch member cluster resources and compare them against version records, ignoring irrelevant fields like status and managedFields.
  • Simplified Status Collection: Scoped the work-status-controller to focus exclusively on the read path, specifically reflecting status from member cluster resources back to Work objects.


Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors member-cluster informer handling for Work execution/status controllers by extracting common helper utilities, injecting informer managers for better testability, and simplifying event-driven reconciliation.

Changes:

  • Added GetGVRsFromWork and EnsureInformerHandlersReady to centralize manifest→GVR extraction and informer/handler readiness checks.
  • Updated Execution/WorkStatus controllers to ensure informers are ready (non-blocking, requeue when not yet synced) and to support member-cluster event-driven reconciles via a channel watch.
  • Simplified ObjectWatcher version tracking (removed lifted helpers) and updated constructors to accept an injected informer manager.
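
As a rough illustration of the manifest extraction step, the sketch below collects the distinct group/version/kind triples from a list of raw JSON manifests. It is not the `GetGVRsFromWork` helper from this PR, whose signature is not shown here. It stops at kinds, because mapping a kind to a resource (GVR) needs a REST mapper that is out of scope for a standard-library example. `groupVersionKind` and `distinctGVKs` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// groupVersionKind is a simplified, comparable stand-in for the Kubernetes
// schema.GroupVersionKind type.
type groupVersionKind struct {
	Group, Version, Kind string
}

// distinctGVKs parses each raw JSON manifest and returns the distinct
// group/version/kind triples in first-seen order.
func distinctGVKs(manifests [][]byte) ([]groupVersionKind, error) {
	seen := make(map[groupVersionKind]struct{})
	var out []groupVersionKind
	for i, raw := range manifests {
		var tm struct {
			APIVersion string `json:"apiVersion"`
			Kind       string `json:"kind"`
		}
		if err := json.Unmarshal(raw, &tm); err != nil {
			return nil, fmt.Errorf("manifest %d: %w", i, err)
		}
		// Core-group objects use a bare version such as "v1"; others use
		// "group/version" such as "apps/v1".
		gvk := groupVersionKind{Version: tm.APIVersion, Kind: tm.Kind}
		if group, version, ok := strings.Cut(tm.APIVersion, "/"); ok {
			gvk.Group, gvk.Version = group, version
		}
		if _, dup := seen[gvk]; dup {
			continue
		}
		seen[gvk] = struct{}{}
		out = append(out, gvk)
	}
	return out, nil
}

func main() {
	manifests := [][]byte{
		[]byte(`{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"name":"a"}}`),
		[]byte(`{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"name":"b"}}`),
		[]byte(`{"apiVersion":"v1","kind":"ConfigMap","metadata":{"name":"c"}}`),
	}
	gvks, err := distinctGVKs(manifests)
	if err != nil {
		panic(err)
	}
	for _, g := range gvks {
		fmt.Printf("group=%q version=%q kind=%q\n", g.Group, g.Version, g.Kind)
	}
}
```

Deduplicating up front means a Work with many manifests of the same type needs only one informer per type.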

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| pkg/util/work.go | Adds GetGVRsFromWork helper to extract distinct GVRs from Work manifests. |
| pkg/util/work_test.go | Adds unit tests for GetGVRsFromWork. |
| pkg/util/helper/cache.go | Adds EnsureInformerHandlersReady to register handlers/start informers and report sync state. |
| pkg/util/helper/cache_test.go | Adds tests covering informer/handler readiness behavior. |
| pkg/util/objectwatcher/objectwatcher.go | Changes ObjectWatcher API to expose GetVersionRecord; injects informer manager; stores resourceVersion directly. |
| pkg/util/lifted/objectwatcher.go | Removes lifted kubefed-based version/update helpers. |
| pkg/util/lifted/objectwatcher_test.go | Removes tests for deleted lifted helpers. |
| pkg/controllers/status/work_status_controller.go | Uses new helper utilities, memoizes handler with sync.Once, and changes NotFound handling during status sync. |
| pkg/controllers/status/work_status_controller_test.go | Updates/factors tests for new informer behavior and buildResourceInformers flow. |
| pkg/controllers/execution/execution_controller.go | Adds channel-based watch from member cluster events and requeue-on-informer-not-synced flow. |
| pkg/controllers/execution/execution_controller_test.go | Adds coverage for new event handling/mapping and handler memoization. |
| cmd/controller-manager/app/controllermanager.go | Wires new execution controller dependencies and updated ObjectWatcher constructor. |
| cmd/agent/app/agent.go | Wires new execution controller dependencies and updated ObjectWatcher constructor. |

Files not reviewed (1)
  • pkg/util/lifted/doc.go: Language not supported


Comment thread pkg/controllers/execution/execution_controller.go
Comment thread pkg/controllers/status/work_status_controller.go
Comment thread pkg/controllers/execution/execution_controller.go
Comment thread pkg/util/work.go Outdated
Comment thread pkg/util/work_test.go
Comment thread pkg/util/work_test.go

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the execution and work status controllers to streamline informer registration and event handling. It introduces a shared helper function EnsureInformerHandlersReady to consolidate dynamic informer setup and handler registration, and cleans up deprecated/unused code from the lifted package. The code review feedback identifies a critical blocking issue in enqueueWorkload that could degrade informer performance, points out style guide violations regarding excessive function parameters in EnsureInformerHandlersReady and NewObjectWatcher, and suggests defensive nil-checks to prevent potential runtime panics.
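
For context on the blocking concern raised above: an informer event handler that sends on a full channel stalls the informer's delivery of later events. One general Go pattern for avoiding this is a non-blocking send, sketched below. This is not taken from the PR and is not necessarily the fix the review proposes. `event` and `tryEnqueue` are hypothetical names, and dropping an event is only safe if something else (such as a periodic resync) will eventually retrigger the reconcile.

```go
package main

import "fmt"

// event is a hypothetical payload identifying the object that changed.
type event struct{ Namespace, Name string }

// tryEnqueue attempts to hand ev to the consumer without ever blocking the
// caller. It returns false if the channel buffer is full, leaving the
// caller to decide whether to drop, count, or retry the event.
func tryEnqueue(ch chan<- event, ev event) bool {
	select {
	case ch <- ev:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan event, 1) // buffer of one, so the second send finds it full

	fmt.Println("first enqueued:", tryEnqueue(ch, event{"default", "a"}))
	fmt.Println("second enqueued:", tryEnqueue(ch, event{"default", "b"}))

	got := <-ch // drain one slot
	fmt.Println("received:", got.Name)
	fmt.Println("third enqueued:", tryEnqueue(ch, event{"default", "c"}))
}
```

An alternative with no event loss is to have the handler add keys to a rate-limited work queue, which absorbs bursts without blocking the informer.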

Comment thread pkg/controllers/execution/execution_controller.go
Comment thread pkg/util/helper/cache.go
Comment thread pkg/util/objectwatcher/objectwatcher.go
Comment thread pkg/util/helper/cache.go
Comment thread pkg/util/work.go
@codecov-commenter

codecov-commenter commented May 25, 2026


Codecov Report

❌ Patch coverage is 68.15920% with 64 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.25%. Comparing base (8eb0d30) to head (f288e06).
⚠️ Report is 36 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/controllers/execution/execution_controller.go | 65.48% | 36 Missing and 3 partials ⚠️ |
| cmd/controller-manager/app/controllermanager.go | 0.00% | 10 Missing ⚠️ |
| cmd/agent/app/agent.go | 0.00% | 8 Missing ⚠️ |
| pkg/util/objectwatcher/objectwatcher.go | 0.00% | 5 Missing ⚠️ |
| pkg/util/helper/cache.go | 92.30% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7552      +/-   ##
==========================================
+ Coverage   42.12%   42.25%   +0.12%     
==========================================
  Files         879      878       -1     
  Lines       54649    54743      +94     
==========================================
+ Hits        23022    23130     +108     
+ Misses      29898    29871      -27     
- Partials     1729     1742      +13     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 42.25% <68.15%> (+0.12%) ⬆️ |


…roller

Signed-off-by: zach593 <zach_li@outlook.com>

5 participants