fix: preserve application failover timer across ResourceUnknown transitions by Ady0333 · Pull Request #7432 · karmada-io/karmada

Ady0333 · 2026-04-25T12:11:37Z

What type of PR is this?
/kind bug

What this PR does / why we need it:
Fixes a critical bug where application failover is permanently suppressed when workloads oscillate between Unhealthy and Unknown states.

Which issue(s) this PR fixes:
Fixes #7431

Root cause:
distinguishUnhealthyClustersWithOthers was treating ResourceUnknown the same as ResourceHealthy, causing the failover timer to be deleted on every Unknown transition (agent restarts, network blips, etc.).
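
The root cause comes down to one predicate. Below is a minimal Go sketch of it before and after the fix. `ResourceHealth` and its three constants are standalone stand-ins modeled on the health values named in this PR, not imports of karmada's API packages. The function names are hypothetical.

```go
package main

import "fmt"

// ResourceHealth is a standalone stand-in for the per-cluster health value;
// it does not import karmada's API types.
type ResourceHealth string

const (
	ResourceHealthy   ResourceHealth = "Healthy"
	ResourceUnhealthy ResourceHealth = "Unhealthy"
	ResourceUnknown   ResourceHealth = "Unknown"
)

// countsAsHealthyBeforeFix reproduces the pre-fix behavior described above:
// Unknown is grouped with Healthy, so an Unknown observation looks like a
// recovery and the cluster's failover timer gets cleared.
func countsAsHealthyBeforeFix(h ResourceHealth) bool {
	return h == ResourceHealthy || h == ResourceUnknown
}

// countsAsHealthyAfterFix reproduces the fixed behavior: only a confirmed
// Healthy observation counts as healthy.
func countsAsHealthyAfterFix(h ResourceHealth) bool {
	return h == ResourceHealthy
}

func main() {
	for _, h := range []ResourceHealth{ResourceHealthy, ResourceUnhealthy, ResourceUnknown} {
		fmt.Printf("%-9s before=%-5v after=%v\n", h, countsAsHealthyBeforeFix(h), countsAsHealthyAfterFix(h))
	}
}
```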

Changes:

Changed line 119 of pkg/controllers/applicationfailover/common.go so that only ResourceHealthy is treated as healthy
Unknown clusters now preserve their unhealthy timestamps
Updated test to reflect correct behavior
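
Here is a sketch of the changed classification. It assumes each per-cluster status entry carries only a cluster name and a health value. `AggregatedStatusItem` and `classifyClusters` are hypothetical stand-ins for the real types and for `distinguishUnhealthyClustersWithOthers`, and the real function may take additional inputs.

```go
package main

import "fmt"

type ResourceHealth string

const (
	ResourceHealthy   ResourceHealth = "Healthy"
	ResourceUnhealthy ResourceHealth = "Unhealthy"
	ResourceUnknown   ResourceHealth = "Unknown"
)

// AggregatedStatusItem is a hypothetical, trimmed-down stand-in for the
// per-cluster status entry; only the two fields used here are modeled.
type AggregatedStatusItem struct {
	ClusterName string
	Health      ResourceHealth
}

// classifyClusters sketches the post-fix split. Unhealthy clusters go to the
// first slice and confirmed Healthy clusters to the second. Unknown clusters
// go to neither, so the caller leaves any existing timer for them untouched.
func classifyClusters(items []AggregatedStatusItem) (unhealthyClusters, healthyClusters []string) {
	for _, item := range items {
		switch item.Health {
		case ResourceUnhealthy:
			unhealthyClusters = append(unhealthyClusters, item.ClusterName)
		case ResourceHealthy:
			healthyClusters = append(healthyClusters, item.ClusterName)
		default:
			// ResourceUnknown: no evidence either way, so do nothing.
		}
	}
	return unhealthyClusters, healthyClusters
}

func main() {
	items := []AggregatedStatusItem{
		{ClusterName: "member1", Health: ResourceHealthy},
		{ClusterName: "member2", Health: ResourceUnhealthy},
		{ClusterName: "member3", Health: ResourceUnknown},
	}
	unhealthy, healthy := classifyClusters(items)
	fmt.Println("unhealthy:", unhealthy, "healthy:", healthy)
}
```

Leaving Unknown clusters out of both slices is the point of the fix. The caller has no reason to start a timer for them and no reason to clear one either.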

Does this PR introduce a user-facing change?

Fixed a critical bug where application failover would never trigger if workloads oscillated between Unhealthy and Unknown states due to infrastructure events.

…th state

Fixes a critical bug where clusters with oscillating health states (Unhealthy→Unknown→Unhealthy) never accumulated enough unhealthy time to trigger application failover. The Unknown state was incorrectly treated as healthy, causing deleteIrrelevantClusters to erase the toleration timer. Now only confirmed Healthy observations reset the timer.

Signed-off-by: Ady0333 <adityashinde1525@gmail.com>
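
A small model can replay the oscillation described in the commit message. It rests on a few assumptions: time is measured in whole seconds, a single cluster is tracked, and the failover check runs only on Unhealthy observations. Dropping the timer stands in for `deleteIrrelevantClusters`. `observation` and `failoverTriggered` are hypothetical names.

```go
package main

import "fmt"

type ResourceHealth string

const (
	ResourceHealthy   ResourceHealth = "Healthy"
	ResourceUnhealthy ResourceHealth = "Unhealthy"
	ResourceUnknown   ResourceHealth = "Unknown"
)

// observation is one hypothetical reconcile pass: a time in seconds and the
// health observed for the cluster at that time.
type observation struct {
	at     int
	health ResourceHealth
}

// failoverTriggered replays observations for one cluster against a
// toleration window in seconds. treatUnknownAsHealthy selects the pre-fix
// (true) or post-fix (false) classification. It returns true once the
// cluster has been tracked as unhealthy for at least the toleration.
func failoverTriggered(obs []observation, toleration int, treatUnknownAsHealthy bool) bool {
	start, tracking := 0, false
	for _, o := range obs {
		healthy := o.health == ResourceHealthy || (treatUnknownAsHealthy && o.health == ResourceUnknown)
		switch {
		case healthy:
			// Stand-in for deleteIrrelevantClusters erasing the timer.
			tracking = false
		case o.health == ResourceUnhealthy:
			if !tracking {
				start, tracking = o.at, true // first unhealthy sighting
			}
			if o.at-start >= toleration {
				return true
			}
		}
		// Post-fix, an Unknown observation matches neither case and the
		// timer is simply left alone.
	}
	return false
}

func main() {
	obs := []observation{
		{at: 0, health: ResourceUnhealthy},
		{at: 30, health: ResourceUnknown}, // e.g. agent restart
		{at: 45, health: ResourceUnhealthy},
		{at: 75, health: ResourceUnknown}, // e.g. network blip
		{at: 90, health: ResourceUnhealthy},
	}
	fmt.Println("before fix:", failoverTriggered(obs, 60, true))
	fmt.Println("after fix:", failoverTriggered(obs, 60, false))
}
```

With a 60-second toleration and the sequence in `main`, the pre-fix model restarts the clock at every Unhealthy observation and never fails over. The post-fix model keeps the timestamp from t=0 and triggers at t=90.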

gemini-code-assist · 2026-04-25T12:11:44Z

Summary of Changes


This pull request addresses a critical issue where application failover mechanisms were being incorrectly suppressed. By refining how the system distinguishes between healthy and unknown cluster states, the fix ensures that transient infrastructure events do not reset or clear essential failover timers, thereby improving the reliability of workload failover during periods of intermittent connectivity.

Highlights

Logic Correction: Modified the cluster health classification logic to exclude ResourceUnknown from the healthy cluster set.
Failover Preservation: Ensured that unhealthy timestamps are preserved during ResourceUnknown transitions to prevent suppression of failover timers.
Test Update: Updated unit tests to validate that clusters in an unknown state are no longer incorrectly categorized as healthy.


karmada-bot · 2026-04-25T12:11:48Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign charlesqq for approval. For more information see the Code Review Process.


Needs approval from an approver in each of these files:

pkg/controllers/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Ady0333 · 2026-04-25T12:12:32Z

Hello @XiShanYongYe-Chang !!! Please review this PR and let me know if any changes are required...

gemini-code-assist

Code Review

This pull request modifies the logic in the distinguishUnhealthyClustersWithOthers function within pkg/controllers/applicationfailover/common.go to exclude clusters with a ResourceUnknown health status from the others slice. The associated unit test in pkg/controllers/applicationfailover/common_test.go has been updated to align with this change. I have no feedback to provide.

Copilot

Pull request overview

Fixes application failover timer suppression by ensuring ResourceUnknown transitions do not reset the unhealthy timestamp, allowing toleration time to accumulate across transient “unknown” periods.

Changes:

Updated unhealthy/other cluster classification to treat only ResourceHealthy as “others” (no longer includes ResourceUnknown).
Adjusted unit test expectations for the ResourceUnknown case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
pkg/controllers/applicationfailover/common.go	Stops classifying `ResourceUnknown` as “healthy/other” so failover timestamps aren’t cleared on Unknown transitions.
pkg/controllers/applicationfailover/common_test.go	Updates `distinguishUnhealthyClustersWithOthers` test expectation for the all-Unknown scenario.


codecov-commenter · 2026-04-25T12:28:06Z


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.90%. Comparing base (923d98e) to head (b50e143).
⚠️ Report is 58 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7432      +/-   ##
==========================================
- Coverage   42.15%   41.90%   -0.26%     
==========================================
  Files         876      879       +3     
  Lines       53645    54285     +640     
==========================================
+ Hits        22614    22746     +132     
- Misses      29334    29820     +486     
- Partials     1697     1719      +22

Flag	Coverage Δ
unittests	`41.90% <100.00%> (-0.26%)`	⬇️


Rename 'others' to 'healthyClusters' in distinguishUnhealthyClustersWithOthers and update all call sites for clarity. Add a regression test to verify that unhealthy timestamps are preserved when health transitions to Unknown.

Signed-off-by: Ady0333 <adityashinde1525@gmail.com>

Copilot AI review requested due to automatic review settings April 25, 2026 12:11

karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 25, 2026

karmada-bot requested review from jwcesign and zach593 April 25, 2026 12:11

karmada-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 25, 2026

Copilot started reviewing on behalf of Ady0333 April 25, 2026 12:12

gemini-code-assist Bot reviewed Apr 25, 2026


Copilot AI reviewed Apr 25, 2026


Comment thread pkg/controllers/applicationfailover/common_test.go

Comment thread pkg/controllers/applicationfailover/common.go

karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 25, 2026


Ady0333 wants to merge 2 commits into karmada-io:master from Ady0333:fix/application-failover-unknown-resets-timer


4 participants
