[NEW] Optimizing Cluster Topology Fetching with Hash Validation #1814

@stavbsamzn

The problem/use-case that the feature addresses
In large-scale deployments with numerous clients, repeatedly fetching the entire cluster topology can become costly, particularly when many clients perform this operation simultaneously. This results in increased network traffic, slower response times, and higher load on the cluster. Clients currently have no efficient way to confirm that their slot-to-node mapping is still accurate, so they fetch the full topology even when the change is minor or nothing has changed at all.

Description of the feature
With this feature, each node in the cluster maintains a hash that represents the entire cluster topology. Clients can request this hash using a new command (e.g., TOPOLOGY HASH) and compare it to their locally cached version. If the hashes match, no further action is needed. If the hashes differ, the client requests the full cluster topology via CLUSTER SLOTS or CLUSTER SHARDS.
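
To make the client-side flow concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the hash algorithm (SHA-256 over a sorted list of slot ranges), the `topology_hash` helper, and the `CachedTopology` class are all hypothetical, and the two callables stand in for the proposed hash command and for CLUSTER SLOTS / CLUSTER SHARDS rather than calling any real client library.

```python
import hashlib


def topology_hash(slot_ranges):
    """Hypothetical hash: SHA-256 over a canonical (sorted) slot-to-node mapping.

    slot_ranges is an iterable of (start_slot, end_slot, node_id) tuples.
    Sorting makes the result independent of the order the ranges are listed in.
    """
    digest = hashlib.sha256()
    for start, end, node_id in sorted(slot_ranges):
        digest.update(f"{start}-{end}:{node_id}\n".encode())
    return digest.hexdigest()


class CachedTopology:
    """Client-side cache that only refetches the topology when the hash changes."""

    def __init__(self, fetch_hash, fetch_topology):
        # fetch_hash stands in for the proposed hash command.
        # fetch_topology stands in for CLUSTER SLOTS or CLUSTER SHARDS.
        self._fetch_hash = fetch_hash
        self._fetch_topology = fetch_topology
        self.slot_ranges = None
        self.hash = None
        self.full_fetches = 0

    def refresh(self):
        """Return True if a full topology fetch was needed, False otherwise."""
        remote_hash = self._fetch_hash()
        if remote_hash == self.hash:
            return False  # cached view is still valid, nothing to do
        self.slot_ranges = self._fetch_topology()
        self.hash = remote_hash
        self.full_fetches += 1
        return True


# Simulated cluster state used in place of a real server.
cluster = {"ranges": [(0, 8191, "node-a"), (8192, 16383, "node-b")]}
client = CachedTopology(
    fetch_hash=lambda: topology_hash(cluster["ranges"]),
    fetch_topology=lambda: list(cluster["ranges"]),
)
client.refresh()  # first call: no cached hash yet, so the full topology is fetched
client.refresh()  # second call: hashes match, only the small hash is transferred
```

In this sketch the client stores the hash it received before fetching the topology. If the topology changes between the two calls, the stored hash is simply stale and the next refresh triggers another full fetch, so the client does not get stuck with an outdated view.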

Alternatives I've considered
Event-based Notification – This solution involves notifying clients of topology changes in real-time using RESP3 push messages or a dedicated Pub/Sub channel. When a relevant change (such as a node failure or slot migration) occurs, the cluster broadcasts an update. This approach can serve as an orthogonal solution to the hashing mechanism, providing real-time notifications of topology changes while the hash-based system focuses on validating the client’s view.
There are ongoing discussions regarding this solution, as seen in the following:

Additional information
While a new command for fetching just the hash is useful, we can further reduce complexity and save a round trip by integrating the check into CLUSTER SLOTS and CLUSTER SHARDS. The commands would accept the client's current hash as an optional argument (e.g., CLUSTER SHARDS <client’s hash>). If the engine’s hash differs from the client's, the command would return the regular output together with the new hash. If the hashes match, it would return Nil, so the client avoids a redundant full fetch.
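
The conditional behavior could look roughly like the following Python sketch of the server-side decision. This is not engine code: the `cluster_shards` function, the reply shape (a dict with `hash` and `shards`), and the SHA-256 based `topology_hash` helper are assumptions made only to illustrate the proposal, with Python's `None` standing in for Nil.

```python
import hashlib


def topology_hash(slot_ranges):
    """Hypothetical hash over a sorted list of (start_slot, end_slot, node_id)."""
    digest = hashlib.sha256()
    for start, end, node_id in sorted(slot_ranges):
        digest.update(f"{start}-{end}:{node_id}\n".encode())
    return digest.hexdigest()


def cluster_shards(slot_ranges, client_hash=None):
    """Sketch of CLUSTER SHARDS with an optional client hash argument.

    Returns None (standing in for Nil) when the client's hash matches the
    engine's hash, otherwise the regular output plus the current hash.
    """
    current_hash = topology_hash(slot_ranges)
    if client_hash is not None and client_hash == current_hash:
        return None
    return {"hash": current_hash, "shards": list(slot_ranges)}


shards = [(0, 8191, "node-a"), (8192, 16383, "node-b")]

# No hash supplied: behaves like the existing command, plus the hash.
first = cluster_shards(shards)

# Same topology, client sends its hash: the reply is None (Nil).
unchanged = cluster_shards(shards, first["hash"])

# A slot range moved to another node: the full output is returned again.
moved = [(0, 8191, "node-a"), (8192, 16383, "node-c")]
changed = cluster_shards(moved, first["hash"])
```

Keeping the hash argument optional means existing clients that call the command without it would see no Nil replies, which is one way the change could stay backward compatible.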
