How to deal with Raft network partition handling #16472
Replies: 3 comments 1 reply
If a client isn't able to make progress on one node, it can try another node, and so on. In some cases it may not be possible to make progress at all (e.g. complicated topic / fan-out routing that routes to a mixture of quorum and classic queues). What quorum queues guarantee is that if you get a publisher confirm / accepted settlement, the message will not be lost.
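A minimal sketch of that "try another node" loop, in Python. It is not tied to any particular client library: `try_publish` is a hypothetical callable you would implement on top of your own client, and it must return True only once the broker has confirmed the message. The node names in the usage note are made up as well.

```python
import time
from typing import Callable, Iterable


class PublishFailed(Exception):
    """Raised when no node confirmed the message."""


def publish_with_failover(
    nodes: Iterable[str],
    try_publish: Callable[[str, bytes], bool],
    body: bytes,
    rounds: int = 3,
    backoff_s: float = 0.0,
) -> str:
    """Try each node in turn until one confirms the message.

    `try_publish(node, body)` is a hypothetical stand-in for real client code.
    It must return True only when the broker confirmed the publish, and
    return False or raise OSError (timeouts and connection errors are
    OSError subclasses) otherwise.
    """
    node_list = list(nodes)
    for attempt in range(rounds):
        for node in node_list:
            try:
                if try_publish(node, body):
                    return node  # confirmed: the message is safe
            except OSError:
                pass  # unconfirmed on this node, move on to the next one
        # Optional linear backoff before walking the node list again.
        time.sleep(backoff_s * (attempt + 1))
    raise PublishFailed(
        f"no confirm from any of {node_list} after {rounds} rounds"
    )
```

Called as `publish_with_failover(["rabbit1", "rabbit2", "rabbit3"], my_publish, b"payload")`, it returns the name of the node that confirmed. A message whose confirm was lost may be published twice by a loop like this, so consumers should tolerate duplicates.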
I did notice that my Python test scripts do reconnect eventually, after hanging on the wrong node for some time. So have I got it right that, when using proper clients and QQs, clients connected to a "dead" queue will eventually time out, reconnect to another node and continue working, and that it just looks a whole lot worse than it actually is when seen from the management interface? That leaves me with the problem of slightly less than perfect clients. ;-) I'll do some more testing...
I've got all my publishers on delivery mode persistent now, and I do seem to sometimes lose some data during a netsplit. Multiple publishers are writing to one exchange which is feeding 3 queues that should be identical, and I'm seeing differences in the data from those three queues even after the cluster has recovered. It's going to be tricky to nail this down to something reproducible.
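One way to narrow this down might be to diff the contents of the three queues after each test run. The sketch below assumes every message carries a unique ID (for example publisher name plus a sequence number); that is an assumption, nothing in this thread says the messages are tagged that way. The queue names in the usage note are hypothetical too.

```python
from typing import Dict, Iterable, Set


def missing_per_queue(drained: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """For each queue, return the message IDs seen elsewhere but not in it.

    `drained` maps a queue name to the message IDs read from that queue.
    """
    id_sets = {name: set(ids) for name, ids in drained.items()}
    # Union of every ID that showed up in at least one queue.
    all_ids: Set[str] = set().union(*id_sets.values())
    return {name: all_ids - ids for name, ids in id_sets.items()}
```

For example, `missing_per_queue({"q1": ["a", "b"], "q2": ["a"], "q3": ["a", "b"]})` reports `"b"` as missing from `q2` only. Comparing the missing IDs against publisher-side logs of which publishes were confirmed would show whether confirmed messages were lost, or only messages that never received a confirm.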
Community Support Policy
RabbitMQ version used
other (please specify)
Erlang version used
27.3.x
Operating system (distribution) used
Ubuntu 24.04 LTS
How is RabbitMQ deployed?
Debian package
rabbitmq-diagnostics status output
Logs from node 1 (with sensitive values edited out)
Not sure what logs to provide here...
Logs from node 2 (if applicable, with sensitive values edited out)
No response
Logs from node 3 (if applicable, with sensitive values edited out)
No response
rabbitmq.conf
Steps to deploy RabbitMQ cluster
Short version:
(Not sure what level of detail you want here)
Steps to reproduce the behavior in question
I've been testing (partial) netsplit handling. So, the short version is
iptables -I INPUT -s $ip -j DROP
advanced.config
No response
Application code
No response
Kubernetes deployment file
No response
What problem are you trying to solve?
First of all, some more context:
Our production RabbitMQ cluster is still on version 4.1.x, I'm working on upgrading it.
Our test cluster is currently on 4.3.0.
I've been testing (partial) netsplit handling because our production cluster shut down completely a while back due to a partial netsplit, i.e. two nodes couldn't see each other but both could see the third node. There is a race condition where each node that can't see another node initiates a shutdown, leaving just node 3 on its own, which then shuts down as well. This didn't recover automatically, but restarting either of the first two nodes would bring the cluster back online. (Or fixing the netsplit, of course.)
Now, with 4.2/4.3 we get Raft network partition handling. If I trigger a full netsplit it all seems to work fine: QQs get moved to appropriate nodes and the isolated node is all red. However, partial network partitions are where things get a bit messier.
As far as I can see, the Raft handling is applied per individual queue, so depending on which node a queue lives on and which node you connect to, a queue is either completely fine with 3 replicas, degraded but functional (2 replicas), or dead.
Example: I drop the link between nodes 1 and 2. My queue lives on node 1. Initially:
After a while this seems to partially settle. For a while I saw a queue active on node 2 with node 3 as follower when seen from node 2, while it was also still active on node 1 when seen from nodes 1 and 3. This did settle down after some time and queues ended up on either node 1 or node 3. The ones on node 3 are fully working with 3 replicas; the ones on node 1 are dead when seen from node 2.
Looking at the overview in the management console, node 1 considers node 2 dead, node 2 considers node 1 dead, and node 3 considers all nodes up.
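To make the per-queue picture above concrete, here is a toy model of it. This is only my simplified reading of the behaviour, not how RabbitMQ actually computes queue state: it assumes direct node-to-node links only (no routing via a third node), a fixed leader, and a plain majority rule. The function names and state labels are made up for illustration.

```python
from typing import FrozenSet, Iterable, Set


def reachable(a: int, b: int, cut: Set[FrozenSet[int]]) -> bool:
    """True if nodes a and b can talk directly (a node always reaches itself)."""
    return a == b or frozenset((a, b)) not in cut


def queue_state(
    leader: int, members: Iterable[int], viewer: int, cut: Set[FrozenSet[int]]
) -> str:
    """State of one queue as seen by a client connected to `viewer`."""
    member_list = list(members)
    if not reachable(viewer, leader, cut):
        return "unreachable"  # "dead" from this node's point of view
    # Count the replicas the leader can still replicate to, itself included.
    online = sum(reachable(leader, m, cut) for m in member_list)
    if online == len(member_list):
        return "healthy"
    if online > len(member_list) // 2:
        return "degraded"  # still has a majority, so still functional
    return "no quorum"
```

With the link between nodes 1 and 2 cut, `cut = {frozenset((1, 2))}`: a queue led by node 1 is `"degraded"` from nodes 1 and 3 and `"unreachable"` from node 2, while a queue led by node 3 is `"healthy"` from every node. That matches what I end up with once things settle.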
Once the network comes back up (or I kill the firewall rule in testing) it all seems to recover pretty much instantly, queues get shuffled around again, replicas catch up and all nodes agree on the state of things.
But during the network partition, there are multiple realities depending on which node you happen to connect to, which queue you're reading from, or which node you're publishing through. All nodes remain up and accepting connections, or not, depending on their view of the cluster and on the specific queue you're reading from or writing to.
In the old cluster (4.1 and earlier), one node would shut itself down, connections would go to the remaining two nodes which agreed on the state of the cluster and everything would keep working, data would keep flowing. (Apart from that race condition that is.)
I'm not really sure how to nail this down further as there are a lot of factors in play on a cluster with multiple queues...
I've been told this "works as designed", so now my big question is, how do I safely communicate with a cluster in this state given that there isn't one "obvious" down node and any node can be safe for one queue while broken for another queue?