feat(execution): only modify member cluster objects in execution controller#7552
feat(execution): only modify member cluster objects in execution controller#7552zach593 wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refines the architectural boundaries between the execution and work-status controllers in the Karmada controller manager. By centralizing drift detection and resource reconciliation within the execution controller, the system achieves a cleaner separation of concerns. The work-status controller is now dedicated solely to status reflection. These changes also include performance improvements by ensuring that informer synchronization no longer blocks controller operations, leading to more robust and predictable cluster management. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Refactors member-cluster informer handling for Work execution/status controllers by extracting common helper utilities, injecting informer managers for better testability, and simplifying event-driven reconciliation.
Changes:
- Added
GetGVRsFromWorkandEnsureInformerHandlersReadyto centralize manifest→GVR extraction and informer/handler readiness checks. - Updated Execution/WorkStatus controllers to ensure informers are ready (non-blocking, requeue when not yet synced) and to support member-cluster event driven reconciles via a channel watch.
- Simplified
ObjectWatcherversion tracking (removed lifted helpers) and updated constructors to accept an injected informer manager.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/util/work.go | Adds GetGVRsFromWork helper to extract distinct GVRs from Work manifests. |
| pkg/util/work_test.go | Adds unit tests for GetGVRsFromWork. |
| pkg/util/helper/cache.go | Adds EnsureInformerHandlersReady to register handlers/start informers and report sync state. |
| pkg/util/helper/cache_test.go | Adds tests covering informer/handler readiness behavior. |
| pkg/util/objectwatcher/objectwatcher.go | Changes ObjectWatcher API to expose GetVersionRecord; injects informer manager; stores resourceVersion directly. |
| pkg/util/lifted/objectwatcher.go | Removes lifted kubefed-based version/update helpers. |
| pkg/util/lifted/objectwatcher_test.go | Removes tests for deleted lifted helpers. |
| pkg/controllers/status/work_status_controller.go | Uses new helper utilities, memoizes handler with sync.Once, and changes NotFound handling during status sync. |
| pkg/controllers/status/work_status_controller_test.go | Updates/factors tests for new informer behavior and buildResourceInformers flow. |
| pkg/controllers/execution/execution_controller.go | Adds channel-based watch from member cluster events and requeue-on-informer-not-synced flow. |
| pkg/controllers/execution/execution_controller_test.go | Adds coverage for new event handling/mapping and handler memoization. |
| cmd/controller-manager/app/controllermanager.go | Wires new execution controller dependencies and updated ObjectWatcher constructor. |
| cmd/agent/app/agent.go | Wires new execution controller dependencies and updated ObjectWatcher constructor. |
Files not reviewed (1)
- pkg/util/lifted/doc.go: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request refactors the execution and work status controllers to streamline informer registration and event handling. It introduces a shared helper function EnsureInformerHandlersReady to consolidate dynamic informer setup and handler registration, and cleans up deprecated/unused code from the lifted package. The code review feedback identifies a critical blocking issue in enqueueWorkload that could degrade informer performance, points out style guide violations regarding excessive function parameters in EnsureInformerHandlersReady and NewObjectWatcher, and suggests defensive nil-checks to prevent potential runtime panics.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7552 +/- ##
==========================================
+ Coverage 42.12% 42.25% +0.12%
==========================================
Files 879 878 -1
Lines 54649 54743 +94
==========================================
+ Hits 23022 23130 +108
+ Misses 29898 29871 -27
- Partials 1729 1742 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ef47b7b to
e7dd25f
Compare
…roller Signed-off-by: zach593 <zach_li@outlook.com>
What type of PR is this?
/kind cleanup
/kind feature
What this PR does / why we need it:
This PR refines the responsibility boundary between the execution controller and the work-status controller. Previously, the work-status controller was responsible for both reflecting member cluster resource status back to Work objects AND detecting/reconciling drifts in member cluster resources (e.g., updating or recreating objects that had diverged from the desired state). This mixing of concerns made the codebase harder to reason about.
After this change, the execution controller takes over the write path — it directly watches member cluster resources via informers, detects meaningful drifts by comparing actual objects against version records (ignoring irrelevant fields like status and managedFields), and reconciles them. The work-status controller is simplified to focus solely on the read path — reflecting status from member cluster resources back to Work objects.
The result is a cleaner separation of concerns: the execution controller drives member cluster objects toward the desired state, while the work-status controller observes and reports their actual status.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Local benchmark results show that when restarting the controller-manager against an already-synced cluster, the time to drain the queue is virtually unchanged compared to before.
Test configuration:
The queue drain takes approximately 3 minutes.
The informer startup for both 2 controllers no longer blocks during member cluster informer sync. This avoids the problem where a handful of resources being established could stall all workers and delay distribution or status collection for other resource types or member clusters. While the informer is still being established, the controller requeues after 60 seconds and retries.
Does this PR introduce a user-facing change?: