SUNSUBSCRIBE corner case causes client out-of-sync #1066

Open
@zuiderkwast

Description

When a client is subscribed to a sharded-pubsub channel in cluster mode and the slot is moved to another shard, or deleted, the client receives a spontaneous sunsubscribe push message.

If a client has just sent a SUNSUBSCRIBE command, it cannot tell whether the sunsubscribe message is the response to that command or a spontaneous message.

In the following scenario, client 1 believes that the [sunsubscribe, ch, 0] push message is the response to its SUNSUBSCRIBE. The CLUSTERDOWN error reply is then still waiting to be read, and when it arrives the client is out of sync: the error does not appear to match any command the client has sent. (If the client has sent another command in the pipeline, the CLUSTERDOWN appears to be the reply to that command.)

Client 1                Client 2            Primary
  |                          |                 |
  |                          |  DELSLOT        |
  |                          |---------------->|
  | SUNSUBSCRIBE ch          |                 |
  |------------------------------------------->|
  |                          |      OK         |
  | [sunsubscribe, ch, 0]    |<----------------|
  |<-------------------------------------------|
  |                          |                 |
  | -CLUSTERDOWN             |                 |
  |<-------------------------------------------|
  |                          |                 |
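
To make the ambiguity concrete, here is a minimal Python sketch of how a naive client might pair incoming messages with pipelined commands. It is not taken from Valkey or from any real client library: `match_replies` and the `(kind, payload)` message tuples are hypothetical, and the assumption that a client treats a matching sunsubscribe push as the response to a pending SUNSUBSCRIBE is only one possible client design.

```python
from collections import deque

def match_replies(sent, incoming):
    """Pair incoming messages with pipelined commands the way a naive client might.

    sent:     list of command tuples, oldest first, e.g. ("SUNSUBSCRIBE", "ch")
    incoming: list of (kind, payload), kind is "push", "error" or "reply"
    Returns a list of (command_or_None, (kind, payload)) pairs.
    """
    pending = deque(sent)
    pairs = []
    for kind, payload in incoming:
        head = pending[0] if pending else None
        if kind == "push":
            # Naive assumption (hypothetical client behaviour): a sunsubscribe
            # push for a channel we just unsubscribed from is taken to be the
            # response to our pending SUNSUBSCRIBE.
            if (head is not None and head[0] == "SUNSUBSCRIBE"
                    and payload[0] == "sunsubscribe" and payload[1] in head[1:]):
                pairs.append((pending.popleft(), (kind, payload)))
            else:
                pairs.append((None, (kind, payload)))  # spontaneous push
        else:
            # Ordinary replies and errors answer the oldest pending command.
            cmd = pending.popleft() if pending else None
            pairs.append((cmd, (kind, payload)))
    return pairs

# The message sequence from the scenario, with DBSIZE pipelined after SUNSUBSCRIBE.
sent = [("SUNSUBSCRIBE", "ch"), ("DBSIZE",)]
incoming = [
    ("push", ("sunsubscribe", "ch", 0)),
    ("error", "CLUSTERDOWN Hash slot not served"),
    ("reply", 0),
]
pairs = match_replies(sent, incoming)
for cmd, msg in pairs:
    print(cmd, "<-", msg)
```

In this sketch the CLUSTERDOWN error ends up attributed to DBSIZE, and the real DBSIZE reply is left without a command to match.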

Originally posted by @zuiderkwast in #759 (comment) (but edited)

Test case

This test case passes, i.e. it illustrates what the clients actually see.

diff --git a/tests/unit/cluster/pubsubshard-slot-migration.tcl b/tests/unit/cluster/pubsubshard-slot-migration.tcl
index c5a324f09..26d6afe56 100644
--- a/tests/unit/cluster/pubsubshard-slot-migration.tcl
+++ b/tests/unit/cluster/pubsubshard-slot-migration.tcl
@@ -187,6 +187,32 @@ test "Delete a slot, verify sunsubscribe message" {
     $subscribeclient close
 }
 
+test "Slot deleted and unsubscribed simultaneously" {
+    set channelname ch5
+    set slot [$cluster cluster keyslot $channelname]
+
+    array set primary_client [$cluster masternode_for_slot $slot]
+
+    set subscribeclient [valkey_deferring_client_by_addr $primary_client(host) $primary_client(port)]
+    $subscribeclient HELLO 3
+    $subscribeclient read
+    $subscribeclient SSUBSCRIBE $channelname
+    $subscribeclient read
+
+    # Delete a slot.
+    assert_equal OK [$primary_client(link) CLUSTER DELSLOTS $slot]
+
+    # Send in a pipeline SUNSUBSCRIBE and DBSIZE
+    $subscribeclient SUNSUBSCRIBE $channelname
+    $subscribeclient DBSIZE
+
+    # We expect one reply per command, plus a spontaneous sunsubscribe push message.
+    assert_equal "sunsubscribe $channelname 0" [$subscribeclient read]
+    catch {$subscribeclient read} e
+    assert_equal "CLUSTERDOWN Hash slot not served" $e
+    assert_equal 0 [$subscribeclient read]
+}
+
 test "Reset cluster, verify sunsubscribe message" {
     set channelname ch4
     set slot [$cluster cluster keyslot $channelname]
