fix(cluster): Cluster reconnect sharded subscribers v4 (#2060) #2065
Open
PavelPashov wants to merge 1 commit into redis:v4 from
* fix: trigger reconnect when sharded subscriber slots refresh fails

  When all sharded subscriber connections fail and the subsequent slots cache refresh returns ClusterAllFailedError, the cluster now properly enters reconnecting state instead of becoming zombied.

  This occurs when the cluster topology changes and all nodes are replaced with new IPs: the subscriber connections fail, triggering a slots refresh via the "-node" event handler. If this refresh fails (e.g., the duplicated connection for CLUSTER SLOTS times out or closes), the cluster becomes stuck in "ready" state with no working connections, because normal pool connections use lazyConnect: true and never emit "end" events to trigger the drain->close->reconnect cycle.

  Now subscriber-triggered refreshSlotsCache() calls use a dedicated callback that detects ClusterAllFailedError and calls disconnect(true) to force reconnection, preventing the zombie state.

* test: ensure reconnect after sharded subscriber failure
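The control flow described above can be sketched roughly as follows. This is a simplified, self-contained model and not the actual ioredis source: `ClusterModel`, `makeAllFailedError`, `onSubscriberNodeRemoved`, and the injected `refresh` function are hypothetical names made up for illustration. Only `refreshSlotsCache()`, `disconnect(true)`, the `ClusterAllFailedError` name, and the "ready"/"reconnecting" states come from the description.

```typescript
// Hypothetical, simplified model of the fix; NOT the real ioredis Cluster class.
type Status = "ready" | "reconnecting";
type RefreshCallback = (err?: Error) => void;

// Stand-in for the error reported when every node fails the slots refresh.
function makeAllFailedError(): Error {
  const err = new Error("Failed to refresh slots cache.");
  err.name = "ClusterAllFailedError";
  return err;
}

class ClusterModel {
  status: Status = "ready";
  reconnectCount = 0;
  // Injected so the example can simulate a successful or failed refresh.
  private refresh: () => Error | undefined;

  constructor(refresh: () => Error | undefined) {
    this.refresh = refresh;
  }

  // Models the slots refresh: reports the outcome to an optional callback.
  refreshSlotsCache(callback?: RefreshCallback): void {
    const err = this.refresh();
    if (callback) {
      callback(err);
    }
  }

  // Models disconnect(true): tear down and enter the reconnecting state.
  disconnect(reconnect: boolean = false): void {
    if (reconnect) {
      this.status = "reconnecting";
      this.reconnectCount += 1;
    }
  }

  // Models the "-node" handler path taken when a sharded subscriber's node disappears.
  onSubscriberNodeRemoved(): void {
    this.refreshSlotsCache((err) => {
      // Dedicated callback: only an all-nodes-failed refresh forces a reconnect.
      if (err !== undefined && err.name === "ClusterAllFailedError") {
        this.disconnect(true);
      }
    });
  }
}

// A refresh that fails on every node pushes the model into "reconnecting".
const zombieCandidate = new ClusterModel(() => makeAllFailedError());
zombieCandidate.onSubscriberNodeRemoved();

// A successful refresh leaves the model in "ready".
const healthy = new ClusterModel(() => undefined);
healthy.onSubscriberNodeRemoved();
```

In this sketch, errors other than the all-nodes-failed case leave the status untouched. Whether the real handler treats other refresh errors the same way is not stated in the description.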
❌ Security scan failed: Branch fix/cluster-reconnect-on-refresh-failure-v4 does not exist in the remote repository