fix: ignore NotFound when cleaning Work policy metadata#7464
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a bug where Work objects could become stuck in a Terminating state during deletion. By allowing the controller to ignore 'NotFound' errors for resources that have already been removed from the member cluster, the cleanup process can now complete successfully even when the underlying resources are missing. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR aims to unblock deletion of Work resources when preserveResourcesOnDeletion=true and the propagated member-cluster resource has already been removed. In the execution controller, it changes cleanup to ignore NotFound from the member-cluster lookup so the Work finalizer can be removed instead of leaving the object stuck in Terminating.
Changes:
- Update
cleanupPolicyClaimMetadatato skip cleanup when the member-cluster resource lookup returnsNotFound. - Add a reconciliation test case covering the preserved-resource deletion path when the target resource is already absent.
- Add a dedicated test helper that builds a controller with an empty member-cluster cache/client state.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
pkg/controllers/execution/execution_controller.go |
Adjusts preserved-resource deletion cleanup to continue when the member-cluster object lookup returns NotFound. |
pkg/controllers/execution/execution_controller_test.go |
Adds coverage for the absent-member-resource deletion scenario and introduces a helper for constructing that test setup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Code Review
This pull request updates the execution_controller to gracefully handle scenarios where a resource is already absent from a member cluster during the cleanupPolicyClaimMetadata process, preventing unnecessary error returns when a resource is not found in the cache. It also introduces a new test case and a helper function, newControllerWithNoClusterResource, to simulate and verify this behavior. The review feedback identifies an improvement opportunity in the test code to avoid calling methods like WithKind or WithResource on package-level GroupVersion variables, suggesting explicit schema.GroupVersion construction instead to ensure better compatibility and maintainability.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7464 +/- ##
==========================================
+ Coverage 41.90% 41.91% +0.01%
==========================================
Files 879 879
Lines 54326 54357 +31
==========================================
+ Hits 22766 22786 +20
- Misses 29833 29846 +13
+ Partials 1727 1725 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Denyme24 <namanraj24@outlook.como> Co-authored-by: Cursor <cursoragent@cursor.com>
65ca81b to
84691e8
Compare
|
hey @XiShanYongYe-Chang @chaunceyjiang @zhzhuang-zju, would appreciate your reviews.PTAL |
|
hey @XiShanYongYe-Chang @chaunceyjiang @zhzhuang-zju, a gentle ping to review it, at your convenience. |
|
Hi @Denyme24 As we discussed in the issue, there is currently not enough information to support us in continuing to push this change. For reference, what is your use case for Karmada? |
What type of PR is this?
/kind bug
What this PR does / why we need it:
when a Work is deleted with “preserve resources on deletion” enabled, this PR makes the controller handle NotFound more safely: if the cache lookup or the update step returns NotFound, it does a live GET to confirm whether the member‑cluster resource is actually gone. if it is already gone, that manifest is skipped; if it still exists, the update is retried. this lets cleanup finish reliably and the Work’s finalizer be removed, so the Work no longer gets stuck in the Terminating state when the target resource was deleted beforehand
Which issue(s) this PR fixes:
Fixes #7463
Details
``` === RUN TestExecutionController_Reconcile === RUN TestExecutionController_Reconcile/work_dispatching_is_suspended,_no_error,_no_apply === RUN TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_false_dispatching_condition === RUN TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_event_message === RUN TestExecutionController_Reconcile/work_dispatching_is_suspended,_overwrites_existing_dispatching_condition === RUN TestExecutionController_Reconcile/suspend_work_with_deletion_timestamp_is_deleted I0502 18:28:12.611393 16266 objectwatcher.go:256] Deleted the resource(kind=Pod, default/test) on cluster(cluster). === RUN TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_deletion_timestamp_set,_does_not_delete_resource I0502 18:28:12.719944 16266 objectwatcher.go:199] Updated the resource(kind=Pod, default/test) on cluster(cluster) but the cluster object was not changed. === RUN TestExecutionController_Reconcile/PreserveResourcesOnDeletion=false,_deletion_timestamp_set,_deletes_resource I0502 18:28:12.842585 16266 objectwatcher.go:256] Deleted the resource(kind=Pod, default/test) on cluster(cluster). === RUN TestExecutionController_Reconcile/PreserveResourcesOnDeletion_unset,_deletion_timestamp_set,_deletes_resource I0502 18:28:12.960405 16266 objectwatcher.go:256] Deleted the resource(kind=Pod, default/test) on cluster(cluster). === RUN TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_resource_already_absent_from_member_cluster,_finalizer_removed_without_error --- PASS: TestExecutionController_Reconcile (1.26s) --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_no_error,_no_apply (0.33s) --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_false_dispatching_condition (0.11s) --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_event_message (0.11s) --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_overwrites_existing_dispatching_condition (0.12s) --- PASS: TestExecutionController_Reconcile/suspend_work_with_deletion_timestamp_is_deleted (0.11s) --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_deletion_timestamp_set,_does_not_delete_resource (0.11s) --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion=false,_deletion_timestamp_set,_deletes_resource (0.12s) --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion_unset,_deletion_timestamp_set,_deletes_resource (0.12s) --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_resource_already_absent_from_member_cluster,_finalizer_removed_without_error (0.12s) PASS ok github.com/karmada-io/karmada/pkg/controllers/execution 1.407s ```Does this PR introduce a user-facing change?: