
[Feature] Add option to prevent replica from replicating an empty DB if the primary comes back empty #579

Open
@ag-TJNII

Description


The Problem

When running replication in environments without persistent storage, it's possible for the following to occur:

  • Node 1 is the primary
  • Node 2 is replicating node 1
  • Node 1 drops
  • Node 1 comes back without its DB, in an empty state
  • Node 2 reconnects and performs a full resync
  • Node 2 is now empty, having propagated the data loss

To reproduce

# Start a network namespace
docker run -d --name=valkey-ns registry.k8s.io/pause

# Start server 1
docker run -d --name=valkey-1 --rm -ti --net=container:valkey-ns valkey/valkey:7-alpine

# Start server 2
docker run -d --name=valkey-2 --rm -ti --net=container:valkey-ns valkey/valkey:7-alpine valkey-server --port 6380 --replicaof 127.0.0.1 6379

# Prove we're online and replicating
docker exec -ti valkey-1 valkey-cli get foo
(nil)
docker exec -ti valkey-2 valkey-cli get foo
(nil)
docker exec -ti valkey-1 valkey-cli set foo bar
OK
docker exec -ti valkey-2 valkey-cli get foo
"bar"
docker exec -ti valkey-2 valkey-cli info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=96,lag=0
master_failover_state:no-failover
master_replid:0cadcd122bc5a647f36a0e1a27f75c4bbbd94068
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:96
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:96

# Completely remove and restart 1
docker rm -f valkey-1
docker run -d --name=valkey-1 --rm -ti --net=container:valkey-ns valkey/valkey:7-alpine

# Check replication state and read the test key back out of 2
docker exec -ti valkey-2 valkey-cli info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=wait_bgsave,offset=0,lag=0
master_failover_state:no-failover
master_replid:ad8c13246bf60c259f189c7a9db2e23f86cdfe17
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:0

docker exec -ti valkey-2 valkey-cli get foo
(nil)

# Clean up
docker rm -f valkey-ns valkey-2 valkey-1

# Replica logs
docker logs valkey-2
1:C 30 May 2024 22:21:26.898 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:C 30 May 2024 22:21:26.898 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
1:C 30 May 2024 22:21:26.898 * Valkey version=7.2.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 30 May 2024 22:21:26.898 * Configuration loaded
1:S 30 May 2024 22:21:26.899 * monotonic clock: POSIX clock_gettime
                .+^+.
            .+#########+.
        .+########+########+.           Valkey 7.2.5 (00000000/0) 64 bit
    .+########+'     '+########+.
 .########+'     .+.     '+########.    Running in standalone mode
 |####+'     .+#######+.     '+####|    Port: 6380
 |###|   .+###############+.   |###|    PID: 1
 |###|   |#####*'' ''*#####|   |###|
 |###|   |####'  .-.  '####|   |###|
 |###|   |###(  (@@@)  )###|   |###|          https://valkey.io
 |###|   |####.  '-'  .####|   |###|
 |###|   |#####*.   .*#####|   |###|
 |###|   '+#####|   |#####+'   |###|
 |####+.     +##|   |#+'     .+####|
 '#######+   |##|        .+########'
    '+###|   |##|    .+########+'
        '|   |####+########+'
             +#########+'
                '+v+'

1:S 30 May 2024 22:21:26.901 * Server initialized 
1:S 30 May 2024 22:21:26.901 * Ready to accept connections tcp
1:S 30 May 2024 22:21:26.901 * Connecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:21:26.901 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:21:26.901 * Non blocking connect for SYNC fired the event.
1:S 30 May 2024 22:21:26.901 * Master replied to PING, replication can continue...
1:S 30 May 2024 22:21:26.901 * Partial resynchronization not possible (no cached master)
1:S 30 May 2024 22:21:31.410 * Full resync from master: efef6f51100556f7d72c4bfa1c5e47afac364281:14
1:S 30 May 2024 22:21:31.411 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 30 May 2024 22:21:31.411 * MASTER <-> REPLICA sync: Flushing old data
1:S 30 May 2024 22:21:31.411 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 30 May 2024 22:21:31.413 * Loading RDB produced by valkey version 7.2.5
1:S 30 May 2024 22:21:31.413 * RDB age 0 seconds  
1:S 30 May 2024 22:21:31.413 * RDB memory usage when created 0.92 Mb
1:S 30 May 2024 22:21:31.413 * Done loading RDB, keys loaded: 0, keys expired: 0.
1:S 30 May 2024 22:21:31.413 * MASTER <-> REPLICA sync: Finished with success
1:S 30 May 2024 22:21:54.104 * Connection with master lost.
1:S 30 May 2024 22:21:54.104 * Caching the disconnected master state.
1:S 30 May 2024 22:21:54.104 * Reconnecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:21:54.104 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:21:54.104 # Error condition on socket for SYNC: Connection refused
1:S 30 May 2024 22:21:55.067 * Connecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:21:55.067 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:21:55.067 # Error condition on socket for SYNC: Connection refused
1:S 30 May 2024 22:21:56.075 * Connecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:21:56.075 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:21:56.075 # Error condition on socket for SYNC: Connection refused
1:S 30 May 2024 22:21:57.082 * Connecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:21:57.083 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:21:57.083 # Error condition on socket for SYNC: Connection refused
1:S 30 May 2024 22:21:58.089 * Connecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:21:58.089 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:21:58.089 # Error condition on socket for SYNC: Connection refused
1:S 30 May 2024 22:21:59.097 * Connecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:21:59.097 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:21:59.097 # Error condition on socket for SYNC: Connection refused
1:S 30 May 2024 22:22:00.101 * Connecting to MASTER 127.0.0.1:6379
1:S 30 May 2024 22:22:00.102 * MASTER <-> REPLICA sync started
1:S 30 May 2024 22:22:00.102 * Non blocking connect for SYNC fired the event.
1:S 30 May 2024 22:22:00.102 * Master replied to PING, replication can continue...
1:S 30 May 2024 22:22:00.103 * Trying a partial resynchronization (request efef6f51100556f7d72c4bfa1c5e47afac364281:97).
1:S 30 May 2024 22:22:05.486 * Full resync from master: 82ab8fdec8d14314831607c8f0d2d0d13d457c91:0
1:S 30 May 2024 22:22:05.487 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 30 May 2024 22:22:05.487 * Discarding previously cached master state.
1:S 30 May 2024 22:22:05.487 * MASTER <-> REPLICA sync: Flushing old data
1:S 30 May 2024 22:22:05.487 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 30 May 2024 22:22:05.490 * Loading RDB produced by valkey version 7.2.5
1:S 30 May 2024 22:22:05.490 * RDB age 0 seconds
1:S 30 May 2024 22:22:05.490 * RDB memory usage when created 0.92 Mb
1:S 30 May 2024 22:22:05.490 * Done loading RDB, keys loaded: 0, keys expired: 0.
1:S 30 May 2024 22:22:05.490 * MASTER <-> REPLICA sync: Finished with success

Expected behavior

The replica should have detected that:

  • The primary that came back was not the primary it had been replicating from
  • The primary it was now replicating from had no data
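
Both of those facts are already visible from the replica side with standard commands. For example, something like this (run against the replica from the reproduction above, on port 6380) shows the replication ID switching to the new primary's ID and the offset resetting, matching the "Full resync from master: ...:0" lines in the replica log above:

# Illustrative spot check, not part of the reproduction transcript above
docker exec -ti valkey-2 valkey-cli -p 6380 info replication | grep -E '^master_repl(id|_offset):'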

Additional information

This is documented at https://valkey.io/topics/replication/ under "Safety of replication when master has persistence turned off", which is why I'm filing this as a feature request rather than a bug. I'm assuming it works this way for *gestures vaguely* reasons, but adding an option to disable replication when the primary's replication ID changes, or when its offset goes backwards, seems like it would be trivial, and the current behavior is dangerous. A rough external approximation of that option is sketched below.
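
As a rough external approximation of that option (a sketch only, nothing like this ships with valkey today; the port matches the reproduction above and the one-second poll interval is arbitrary), a sidecar script could watch the replica's view of the primary and detach it when the replication ID changes or the offset moves backwards:

#!/bin/sh
# Watchdog sketch: detach the replica if its primary's identity changes or its offset regresses.
# Assumes the replica from the reproduction above is listening on 127.0.0.1:6380.
PREV_ID=""
PREV_OFFSET=0
while true; do
  INFO=$(valkey-cli -p 6380 info replication | tr -d '\r')
  ID=$(echo "$INFO" | awk -F: '/^master_replid:/ {print $2}')
  OFFSET=$(echo "$INFO" | awk -F: '/^master_repl_offset:/ {print $2}')
  if [ -n "$ID" ] && [ -n "$PREV_ID" ] && { [ "$ID" != "$PREV_ID" ] || [ "$OFFSET" -lt "$PREV_OFFSET" ]; }; then
    # The primary's identity changed or its history went backwards: stop replicating.
    valkey-cli -p 6380 replicaof no one
    break
  fi
  if [ -n "$ID" ]; then
    PREV_ID="$ID"
    PREV_OFFSET="$OFFSET"
  fi
  sleep 1
done

Note that this is inherently racy (the full resync can flush the replica before the poll notices), which is exactly why a check built into the replica itself would be preferable.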

As an aside, that page says "For example the master can restart fast enough for Sentinel to not detect a failure, so that the failure mode described above happens." That's a nice way of saying "this implementation has a glaring race condition and flaw"... If the reliability of a replication process depends on instances not rebooting quickly, that's a pretty huge sign something is wrong, but I digress...

The only workaround I'm aware of for this is to enable ACLs for replication and configure the replication user to start disabled. This requires another process to verify that the primary actually is the primary before enabling the user; a sketch of what that looks like is below.
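
Concretely, that workaround would look something like the following (the user name and password are placeholders; a replication user typically needs only the PSYNC, REPLCONF and PING commands):

# On the primary: create a replication-only user that starts disabled.
valkey-cli -p 6379 acl setuser repl-user off '>repl-pass' +psync +replconf +ping

# On the replica (valkey.conf): authenticate replication as that user.
#   masteruser repl-user
#   masterauth repl-pass
#   replicaof 127.0.0.1 6379

# Only after some external process has verified that the primary really is the same primary,
# and still has its data, does it enable the user and allow the replica to sync:
valkey-cli -p 6379 acl setuser repl-user on

This fails safe because a primary that comes back empty also comes back with the replication user disabled (or missing entirely, if the ACL was only ever set at runtime), so the replica's sync attempts are rejected until the external check has passed.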

Please either break this behavior so it fails safe by default, or add an option to fail safe. Thanks.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Status: Idea
Milestone: No milestone