
Conversation

@XiShanYongYe-Chang (Member) commented Jan 4, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

ref #6963

Which issue(s) this PR fixes:

Fixes #6963

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-search`: Fixed the issue that watch connections could not immediately reflect resources from recovered clusters.

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 4, 2026
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 4, 2026
@gemini-code-assist

Summary of Changes

Hello @XiShanYongYe-Chang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where watch connections in a multi-cluster environment would not immediately reflect resources from newly recovered or added clusters. The solution involves enhancing the watch mechanism to actively invalidate existing watch connections when the cluster topology changes. This forces clients to re-establish their watches, thereby ensuring they receive up-to-date resource information from all available clusters, improving the system's responsiveness to cluster recovery events.

Highlights

  • Watch Invalidation Mechanism: Introduced a new invalidatableWatchMux type and a mechanism within MultiClusterCache to track and invalidate active watch connections, ensuring clients receive up-to-date resource information.
  • Cluster Recovery Handling: The UpdateCache method now detects when new or recovered clusters are added and triggers the invalidation of all active watches, forcing clients to reconnect and include resources from these clusters.
  • Watch Registration and Cleanup: Added registerWatch and unregisterWatch methods to MultiClusterCache to manage the lifecycle of active watch connections, ensuring proper tracking and cleanup when watches are stopped.
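The registration and invalidation flow described in these highlights can be sketched roughly as below. All names here are illustrative stand-ins, not the actual karmada code: the real multiplexer wraps `watch.Interface` from k8s.io/apimachinery, and the real cache lives in `pkg/search/proxy/store`.

```go
package main

import (
	"fmt"
	"sync"
)

// watchMux is an illustrative stand-in for karmada's watch multiplexer.
// Closing done models terminating the client's watch connection so the
// client is forced to reconnect.
type watchMux struct {
	stopOnce sync.Once
	done     chan struct{}
}

func (m *watchMux) invalidate() {
	m.stopOnce.Do(func() { close(m.done) })
}

// multiClusterCache sketches the tracking described above: every active
// watch is registered under its GVR, and adding a (new or recovered)
// cluster invalidates all of them so clients re-establish their watches.
type multiClusterCache struct {
	mu       sync.Mutex
	watchers map[string][]*watchMux // keyed by a GVR string in this sketch
}

func (c *multiClusterCache) registerWatch(gvr string, m *watchMux) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.watchers[gvr] = append(c.watchers[gvr], m)
}

func (c *multiClusterCache) invalidateAllWatches() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for gvr, muxes := range c.watchers {
		for _, m := range muxes {
			m.invalidate()
		}
		delete(c.watchers, gvr)
	}
}

func main() {
	c := &multiClusterCache{watchers: map[string][]*watchMux{}}
	w := &watchMux{done: make(chan struct{})}
	c.registerWatch("apps/v1/deployments", w)

	c.invalidateAllWatches() // e.g. a recovered cluster was added to the cache
	<-w.done                 // the watch is closed; the client will reconnect
	fmt.Println("watch invalidated")
}
```

The `sync.Once` guard mirrors the usual pattern for watch channels: a stop may race with an invalidation, and the channel must only be closed once.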


@gemini-code-assist (bot) left a comment

Code Review

This pull request fixes an issue where watch connections did not reflect resources from recovered clusters immediately. This is achieved by tracking active watch connections and invalidating them when a cluster is added or recovered, forcing clients to reconnect and establish new watches that include the newly available cluster. The changes introduce an invalidatableWatchMux to handle watch invalidation and modify MultiClusterCache to manage these watchers. The overall approach is sound and correctly addresses the bug. I have a couple of suggestions for improvement regarding concurrency and style guide adherence.

@XiShanYongYe-Chang (Member Author)

Hi @NickYadance, can you help review this?

@codecov-commenter commented Jan 4, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 76.27119% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.59%. Comparing base (2a29397) to head (54b97ff).

Files with missing lines                       Patch %   Lines
pkg/search/proxy/store/util.go                 63.63%    8 Missing ⚠️
pkg/search/proxy/store/multi_cluster_cache.go  83.78%    3 Missing and 3 partials ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7074      +/-   ##
==========================================
+ Coverage   46.55%   46.59%   +0.03%     
==========================================
  Files         700      700              
  Lines       48091    48149      +58     
==========================================
+ Hits        22389    22433      +44     
- Misses      24020    24030      +10     
- Partials     1682     1686       +4     
Flag       Coverage Δ
unittests  46.59% <76.27%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@RainbowMango (Member) left a comment

/assign

@NickYadance

> Hi @NickYadance, can you help review this?

Looks good to me, thx @XiShanYongYe-Chang

@XiShanYongYe-Chang (Member Author)

Thanks @NickYadance

@RainbowMango RainbowMango added this to the v1.17 milestone Jan 6, 2026
Comment on lines 65 to 68
// activeWatchers tracks all active watch connections for each GVR
// key: GVR string representation, value: list of active watch multiplexers
activeWatchersLock sync.RWMutex
activeWatchers map[string][]*invalidatableWatchMux
Member

The invalidatableWatchMux naming is confusing; the structure's name itself doesn't indicate that it is used for holding an invalidatable watch mux.

Member

In addition, why not take schema.GroupVersionResource as the map key? That would make the code more readable.

Member Author

Updated it to watchMuxWithInvalidation, wdyt?

> In addition, why not take schema.GroupVersionResource as the map key? That would make the code more readable.

Good suggestion.
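For illustration, the GVR-keyed map could look like the sketch below. `groupVersionResource` here is a local stand-in mirroring `schema.GroupVersionResource` from k8s.io/apimachinery, which is a plain comparable struct and therefore a valid map key; `watchMuxWithInvalidation` is only a placeholder for the renamed type.

```go
package main

import "fmt"

// groupVersionResource mirrors schema.GroupVersionResource: a comparable
// struct, so it can key a map directly instead of a formatted string.
type groupVersionResource struct {
	Group, Version, Resource string
}

// watchMuxWithInvalidation is a placeholder for the renamed type.
type watchMuxWithInvalidation struct{}

func main() {
	activeWatchers := map[groupVersionResource][]*watchMuxWithInvalidation{}

	gvr := groupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}
	activeWatchers[gvr] = append(activeWatchers[gvr], &watchMuxWithInvalidation{})

	// Lookups need no string formatting and cannot collide on separators.
	fmt.Println(len(activeWatchers[gvr])) // prints 1
}
```

Besides readability, a struct key avoids having to pick (and parse) a separator convention for the string form.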


// add/update cluster cache
clustersAdded := false
addedClusters := []string{}
Member

Suggested change
addedClusters := []string{}

We don't have to introduce a variable just for logging, and we already log the cluster name once a new cache is added.

Comment on lines 133 to 135
// Any cluster being added to cache (whether new or recovered) should trigger invalidation
// This is critical for cluster recovery scenarios where existing watch connections
// don't include the recovered cluster's resources
Member

As far as I was told, the existing watch connection will eventually (~5 min) receive the recovered cluster's resources, so this comment might not be entirely accurate.

// Cluster removal is already handled by cacher.Stop() -> terminateAllWatchers()
if clustersAdded {
klog.Infof("Cluster topology changed (clusters added: %v), invalidating all active watches to trigger reconnection", addedClusters)
c.invalidateAllWatches()
Member

What bothers me is that I don't know the impact of interrupting a client's connection. How significant is it for the client?
For instance, if a client has already received some data from a healthy cluster, will it receive only the subsequent updates after re-establishing the watch, or will it get the full dataset again?

Member Author

This should depend on the ResourceVersion parameter used when making the watch request. For the Reflector implementation in client-go, the default behavior is to only receive incremental changes. https://deepwiki.com/search/hpascaletargetrefworkloadkinda_e78b9389-089b-4a74-83f7-7370dc9976cf

In addition, we haven't actually changed the behavior of watch reconnections; it is still handled the same way as before. This change simply terminates the watch request earlier than the default timeout, rather than waiting for the server to do so.
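The incremental-resumption behavior described here can be illustrated with a small simulation. This is not client-go code: `event` stands in for `watch.Event`, and resource versions are simplified to integers (real resourceVersion values are opaque strings that clients must not interpret numerically); the point is only that a watch resumed from the last observed resourceVersion delivers just the subsequent changes.

```go
package main

import "fmt"

// event is a minimal stand-in for a watch event; rv is a simplified,
// integer-ordered resource version used only for this illustration.
type event struct {
	rv   int
	name string
}

// watchFrom simulates a server answering a watch request: only events
// newer than the client's last observed resource version are delivered.
// This is why a Reflector that reconnects after its watch is terminated
// early sees just the incremental changes, not the full dataset again.
func watchFrom(all []event, lastRV int) []event {
	var out []event
	for _, e := range all {
		if e.rv > lastRV {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	history := []event{{1, "pod-a"}, {2, "pod-b"}, {3, "pod-c"}}

	// The client had watched up to rv=2 when the watch was invalidated.
	// On reconnect it resumes from resourceVersion=2 and gets only rv>2.
	for _, e := range watchFrom(history, 2) {
		fmt.Println(e.name) // prints pod-c
	}
}
```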

@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Development

Successfully merging this pull request may close these issues.

[Search] Search component cannot immediately reflect resources from recovered clusters in existing watch connections