fix: ignore NotFound when cleaning Work policy metadata #7464

Open
Denyme24 wants to merge 1 commit into karmada-io:master from Denyme24:fix/work-cleanup-notfound-finalizer

@Denyme24 Denyme24 commented May 2, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

When a Work is deleted with "preserve resources on deletion" enabled, this PR makes the controller handle NotFound more safely. If the cache lookup or the update step returns NotFound, the controller does a live GET to confirm whether the member-cluster resource is actually gone. If it is already gone, that manifest is skipped; if it still exists, the update is retried. This lets cleanup finish reliably and the Work's finalizer be removed, so the Work no longer gets stuck in the Terminating state when the target resource was deleted beforehand.
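
The flow described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `ErrNotFound`, the `memberCluster` interface, `cleanupManifest`, and `fakeCluster` are all hypothetical names, and retrying the update exactly once is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var ErrNotFound = errors.New("not found")

// memberCluster is a hypothetical abstraction over the three calls the
// description mentions: cached lookup, live GET, and update.
type memberCluster interface {
	GetFromCache(key string) error
	GetLive(key string) error
	Update(key string) error
}

// cleanupManifest returns nil when the metadata was cleaned up or when the
// resource is confirmed gone; any other error is passed back to the caller.
func cleanupManifest(c memberCluster, key string) error {
	err := c.GetFromCache(key)
	if err == nil {
		err = c.Update(key)
	}
	if err == nil || !errors.Is(err, ErrNotFound) {
		return err
	}
	// NotFound came from the cache or the update: confirm with a live GET.
	if liveErr := c.GetLive(key); liveErr != nil {
		if errors.Is(liveErr, ErrNotFound) {
			return nil // really gone, skip this manifest
		}
		return liveErr
	}
	// The resource still exists, so retry the update (once, by assumption).
	return c.Update(key)
}

// fakeCluster is a tiny in-memory stand-in used only for this example.
type fakeCluster struct {
	cached  bool // object is present in the cache
	live    bool // object is present in the member cluster
	updates int  // number of successful updates
}

func (f *fakeCluster) GetFromCache(string) error {
	if f.cached {
		return nil
	}
	return ErrNotFound
}

func (f *fakeCluster) GetLive(string) error {
	if f.live {
		return nil
	}
	return ErrNotFound
}

func (f *fakeCluster) Update(string) error {
	if !f.live {
		return ErrNotFound
	}
	f.updates++
	return nil
}

func main() {
	gone := &fakeCluster{}
	fmt.Println(cleanupManifest(gone, "default/test"), gone.updates)

	staleCache := &fakeCluster{cached: false, live: true}
	fmt.Println(cleanupManifest(staleCache, "default/test"), staleCache.updates)
}
```

In this sketch a resource that is absent everywhere is skipped without error, while a stale cache miss still leads to one update against the live object.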

Which issue(s) this PR fixes:

Fixes #7463

Details

```
=== RUN   TestExecutionController_Reconcile
=== RUN   TestExecutionController_Reconcile/work_dispatching_is_suspended,_no_error,_no_apply
=== RUN   TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_false_dispatching_condition
=== RUN   TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_event_message
=== RUN   TestExecutionController_Reconcile/work_dispatching_is_suspended,_overwrites_existing_dispatching_condition
=== RUN   TestExecutionController_Reconcile/suspend_work_with_deletion_timestamp_is_deleted
I0502 18:28:12.611393 16266 objectwatcher.go:256] Deleted the resource(kind=Pod, default/test) on cluster(cluster).
=== RUN   TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_deletion_timestamp_set,_does_not_delete_resource
I0502 18:28:12.719944 16266 objectwatcher.go:199] Updated the resource(kind=Pod, default/test) on cluster(cluster) but the cluster object was not changed.
=== RUN   TestExecutionController_Reconcile/PreserveResourcesOnDeletion=false,_deletion_timestamp_set,_deletes_resource
I0502 18:28:12.842585 16266 objectwatcher.go:256] Deleted the resource(kind=Pod, default/test) on cluster(cluster).
=== RUN   TestExecutionController_Reconcile/PreserveResourcesOnDeletion_unset,_deletion_timestamp_set,_deletes_resource
I0502 18:28:12.960405 16266 objectwatcher.go:256] Deleted the resource(kind=Pod, default/test) on cluster(cluster).
=== RUN   TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_resource_already_absent_from_member_cluster,_finalizer_removed_without_error
--- PASS: TestExecutionController_Reconcile (1.26s)
    --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_no_error,_no_apply (0.33s)
    --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_false_dispatching_condition (0.11s)
    --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_adds_event_message (0.11s)
    --- PASS: TestExecutionController_Reconcile/work_dispatching_is_suspended,_overwrites_existing_dispatching_condition (0.12s)
    --- PASS: TestExecutionController_Reconcile/suspend_work_with_deletion_timestamp_is_deleted (0.11s)
    --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_deletion_timestamp_set,_does_not_delete_resource (0.11s)
    --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion=false,_deletion_timestamp_set,_deletes_resource (0.12s)
    --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion_unset,_deletion_timestamp_set,_deletes_resource (0.12s)
    --- PASS: TestExecutionController_Reconcile/PreserveResourcesOnDeletion=true,_resource_already_absent_from_member_cluster,_finalizer_removed_without_error (0.12s)
PASS
ok      github.com/karmada-io/karmada/pkg/controllers/execution 1.407s
```

Does this PR introduce a user-facing change?:

NONE

Copilot AI review requested due to automatic review settings May 2, 2026 18:15
@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label May 2, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where Work objects could become stuck in a Terminating state during deletion. By allowing the controller to ignore 'NotFound' errors for resources that have already been removed from the member cluster, the cleanup process can now complete successfully even when the underlying resources are missing.

Highlights

  • Error Handling Improvement: Updated the execution controller to gracefully handle 'NotFound' errors when cleaning up policy claim metadata, preventing the Work controller from getting stuck in a Terminating state.
  • Test Coverage: Added a new test case to verify that finalizers are correctly removed when the target resource is already absent from the member cluster.

@karmada-bot karmada-bot requested review from mrlihanbo and pigletfly May 2, 2026 18:15
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign garrybest for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 2, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to unblock deletion of Work resources when preserveResourcesOnDeletion=true and the propagated member-cluster resource has already been removed. In the execution controller, it changes cleanup to ignore NotFound from the member-cluster lookup so the Work finalizer can be removed instead of leaving the object stuck in Terminating.

Changes:

  • Update cleanupPolicyClaimMetadata to skip cleanup when the member-cluster resource lookup returns NotFound.
  • Add a reconciliation test case covering the preserved-resource deletion path when the target resource is already absent.
  • Add a dedicated test helper that builds a controller with an empty member-cluster cache/client state.
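
Why ignoring NotFound unblocks deletion can be illustrated with a simplified model of the finalizer step. This is only a sketch under stated assumptions: the `work` struct, `reconcileDelete`, `errNotFound`, and the finalizer name `example.io/execution-controller` are hypothetical and do not come from the Karmada code base.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// exampleFinalizer is an illustrative name, not necessarily the one Karmada uses.
const exampleFinalizer = "example.io/execution-controller"

// work is a simplified, hypothetical model of a Work that is being deleted.
type work struct {
	manifests  []string
	finalizers []string
}

// removeFinalizer returns the given finalizers without name.
func removeFinalizer(finalizers []string, name string) []string {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

// reconcileDelete runs cleanup for every manifest and removes the finalizer
// only if no cleanup call failed. ignoreNotFound switches between the old
// behavior (false) and the behavior the review summary describes (true).
func reconcileDelete(w *work, cleanup func(manifest string) error, ignoreNotFound bool) error {
	for _, m := range w.manifests {
		err := cleanup(m)
		if ignoreNotFound && errors.Is(err, errNotFound) {
			continue // resource already absent, nothing to clean
		}
		if err != nil {
			// Returning here leaves the finalizer in place, so the
			// object stays in Terminating until a later retry succeeds.
			return fmt.Errorf("cleanup %s: %w", m, err)
		}
	}
	w.finalizers = removeFinalizer(w.finalizers, exampleFinalizer)
	return nil
}

func main() {
	alreadyGone := func(string) error { return errNotFound }

	before := &work{manifests: []string{"default/test"}, finalizers: []string{exampleFinalizer}}
	fmt.Println(reconcileDelete(before, alreadyGone, false), len(before.finalizers))

	after := &work{manifests: []string{"default/test"}, finalizers: []string{exampleFinalizer}}
	fmt.Println(reconcileDelete(after, alreadyGone, true), len(after.finalizers))
}
```

With `ignoreNotFound` set to false the finalizer is never removed for an already-deleted resource, which matches the stuck Terminating state the PR targets.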

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/controllers/execution/execution_controller.go | Adjusts preserved-resource deletion cleanup to continue when the member-cluster object lookup returns NotFound. |
| pkg/controllers/execution/execution_controller_test.go | Adds coverage for the absent-member-resource deletion scenario and introduces a helper for constructing that test setup. |

Comment thread pkg/controllers/execution/execution_controller.go Outdated
Comment thread pkg/controllers/execution/execution_controller.go Outdated
Comment thread pkg/controllers/execution/execution_controller_test.go Outdated
Comment thread pkg/controllers/execution/execution_controller_test.go

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the execution_controller to gracefully handle scenarios where a resource is already absent from a member cluster during the cleanupPolicyClaimMetadata process, preventing unnecessary error returns when a resource is not found in the cache. It also introduces a new test case and a helper function, newControllerWithNoClusterResource, to simulate and verify this behavior. The review feedback identifies an improvement opportunity in the test code to avoid calling methods like WithKind or WithResource on package-level GroupVersion variables, suggesting explicit schema.GroupVersion construction instead to ensure better compatibility and maintainability.

Comment thread pkg/controllers/execution/execution_controller_test.go
Comment thread pkg/controllers/execution/execution_controller_test.go
@codecov-commenter

codecov-commenter commented May 2, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.91%. Comparing base (9d87acb) to head (84691e8).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/controllers/execution/execution_controller.go | 33.33% | 20 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7464      +/-   ##
==========================================
+ Coverage   41.90%   41.91%   +0.01%     
==========================================
  Files         879      879              
  Lines       54326    54357      +31     
==========================================
+ Hits        22766    22786      +20     
- Misses      29833    29846      +13     
+ Partials     1727     1725       -2     
Flag Coverage Δ
unittests 41.91% <33.33%> (+0.01%) ⬆️
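
The project coverage figures in the diff above can be reproduced from the Hits and Lines rows. The sketch below assumes the percentage is hits divided by lines, truncated (not rounded) to two decimals; that truncation is inferred from the reported numbers and is not confirmed Codecov behavior. The helper names are hypothetical.

```go
package main

import "fmt"

// coverageHundredths returns hits/lines as a percentage expressed in
// hundredths of a percent, truncated toward zero (an assumption inferred
// from the report, see the lead-in).
func coverageHundredths(hits, lines int) int {
	return hits * 10000 / lines
}

// formatPercent renders hundredths of a percent as, for example, "41.90%".
func formatPercent(hundredths int) string {
	return fmt.Sprintf("%d.%02d%%", hundredths/100, hundredths%100)
}

func main() {
	// Base (master) and head (#7464) rows taken from the coverage diff.
	base := coverageHundredths(22766, 54326)
	head := coverageHundredths(22786, 54357)

	fmt.Println("base:", formatPercent(base))
	fmt.Println("head:", formatPercent(head))
	fmt.Printf("delta: +%s\n", formatPercent(head-base))

	// Sanity check: Lines should equal Hits + Misses + Partials on both sides.
	fmt.Println(22766+29833+1727 == 54326, 22786+29846+1725 == 54357)
}
```

Under that assumption the base and head values come out as 41.90% and 41.91%, matching the Coverage row of the diff.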

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 2, 2026
Signed-off-by: Denyme24 <namanraj24@outlook.como>
Co-authored-by: Cursor <cursoragent@cursor.com>
@Denyme24 Denyme24 force-pushed the fix/work-cleanup-notfound-finalizer branch from 65ca81b to 84691e8 Compare May 2, 2026 18:57
@Denyme24
Contributor Author

Denyme24 commented May 2, 2026

hey @XiShanYongYe-Chang @chaunceyjiang @zhzhuang-zju, would appreciate your reviews. PTAL
thanks

@Denyme24
Contributor Author

hey @XiShanYongYe-Chang @chaunceyjiang @zhzhuang-zju, a gentle ping to review it, at your convenience.
thanks

@XiShanYongYe-Chang
Member

Hi @Denyme24, as we discussed in the issue, there is currently not enough information for us to move this change forward.

For reference, what is your use case for Karmada?

Labels

kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

work with “preserve resources” can get stuck deleting if the member resource is already gone

5 participants