How to deal with Raft network partition handling #16472

tubemeister · 2026-05-19T10:46:54Z

Community Support Policy

I have read RabbitMQ's Community Support Policy
I run RabbitMQ 4.x, the only series currently covered by community support
I promise to provide all relevant information (versions, logs from all nodes, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

other (please specify)

Erlang version used

27.3.x

Operating system (distribution) used

Ubuntu 24.04 LTS

How is RabbitMQ deployed?

Debian package

rabbitmq-diagnostics status output

Status of node rabbit@rabbit-1 ...
Runtime

OS PID: 95209
OS: Linux
Uptime (seconds): 669520
Is under maintenance?: false
RabbitMQ version: 4.3.0
RabbitMQ release series support status: see https://www.rabbitmq.com/release-information
Node name: rabbit@rabbit-1
Erlang configuration: Erlang/OTP 27 [erts-15.2.7.8] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit:ns]
Crypto library: OpenSSL 3.0.13 30 Jan 2024
Erlang processes: 655 used, 1048576 limit
Scheduler run queue: 0
Cluster heartbeat timeout (net_ticktime): 30

Plugins

Enabled plugin file: /etc/rabbitmq/enabled_plugins
Enabled plugins:

 * rabbitmq_prometheus
 * rabbitmq_mqtt
 * prometheus
 * ddskerl
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
 * amqp_client
 * cowboy
 * oauth2_client
 * jose

Data directory

Node data directory: /var/lib/rabbitmq/mnesia/rabbit@rabbit-1
Raft data directory: /var/lib/rabbitmq/mnesia/rabbit@rabbit-1/quorum/rabbit@rabbit-1

Config files

 * /etc/rabbitmq/rabbitmq.conf

Log file(s)

 * /var/log/rabbitmq/channel.log
 * /var/log/rabbitmq/connection.log
 * /var/log/rabbitmq/federation.log
 * /var/log/rabbitmq/mirroring.log
 * /var/log/rabbitmq/queue.log
 * /var/log/rabbitmq/rabbit@rabbit-1.log
 * <stdout>

Alarms

(none)

Tags

(none)

Memory

Total memory used: 0.1842 gb
Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, computed to: 3.3308 gb

reserved_unallocated: 0.0614 gb (33.35 %)
code: 0.0254 gb (13.8 %)
allocated_unused: 0.0232 gb (12.57 %)
other_system: 0.0214 gb (11.63 %)
other_proc: 0.0193 gb (10.45 %)
binary: 0.0111 gb (6.04 %)
plugins: 0.0065 gb (3.52 %)
other_ets: 0.0061 gb (3.3 %)
mgmt_db: 0.0031 gb (1.7 %)
atom: 0.0013 gb (0.69 %)
quorum_queue_procs: 0.0012 gb (0.65 %)
metrics: 0.0009 gb (0.49 %)
connection_other: 0.0007 gb (0.4 %)
metadata_store: 0.0007 gb (0.37 %)
msg_index: 0.0006 gb (0.34 %)
connection_channels: 0.0005 gb (0.27 %)
quorum_ets: 0.0004 gb (0.2 %)
connection_readers: 0.0003 gb (0.15 %)
metadata_store_ets: 0.0001 gb (0.04 %)
connection_writers: 0.0 gb (0.02 %)
quorum_queue_dlx_procs: 0.0 gb (0.0 %)
stream_queue_procs: 0.0 gb (0.0 %)
stream_queue_replica_reader_procs: 0.0 gb (0.0 %)
mnesia: 0.0 gb (0.0 %)
stream_queue_coordinator_procs: 0.0 gb (0.0 %)
queue_procs: 0.0 gb (0.0 %)

Free Disk Space

Low free disk space watermark: 0.05 gb
Free disk space: 3.9157 gb

Totals

Connection count: 9
Queue count: 8
Virtual host count: 4

Listeners

Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Interface: [::], port: 11883, protocol: mqtt, purpose: MQTT
Interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS

Logs from node 1 (with sensitive values edited out)

Not sure what logs to provide here...

Logs from node 2 (if applicable, with sensitive values edited out)

No response

Logs from node 3 (if applicable, with sensitive values edited out)

No response

rabbitmq.conf

# Ansible managed from playbooks/rabbitmq-cluster
#
# General
heartbeat = 15      # default 60
net_ticktime = 30   # default 60
default_queue_type = quorum # new in version 3.13.3

# Logging
log.file.level = debug
log.connection.file = /var/log/rabbitmq/connection.log
log.connection.level = debug
log.federation.file = /var/log/rabbitmq/federation.log
log.federation.level = debug
log.mirroring.file = /var/log/rabbitmq/mirroring.log
log.mirroring.level = debug
log.queue.file = /var/log/rabbitmq/queue.log
log.queue.level = debug
log.channel.file = /var/log/rabbitmq/channel.log
log.channel.level = debug

# Cluster config
cluster_partition_handling = pause_minority

cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@rabbit-1
cluster_formation.classic_config.nodes.2 = rabbit@rabbit-2
cluster_formation.classic_config.nodes.3 = rabbit@rabbit-3

# SSL config
listeners.ssl.default = 5671

ssl_options.cacertfile = /etc/rabbitmq/ssl/wild.rabbitmq.dmz.$domain.chain
ssl_options.certfile   = /etc/rabbitmq/ssl/wild.rabbitmq.dmz.$domain.crt
ssl_options.keyfile    = /etc/rabbitmq/ssl/wild.rabbitmq.dmz.$domain.key
ssl_options.verify     = verify_peer
ssl_options.fail_if_no_peer_cert = false # for now

# Management interface
management.tcp.port       = 15672
management.ssl.port       = 15671
management.ssl.cacertfile = /etc/rabbitmq/ssl/wild.rabbitmq.dmz.$domain.chain
management.ssl.certfile   = /etc/rabbitmq/ssl/wild.rabbitmq.dmz.$domain.crt
management.ssl.keyfile    = /etc/rabbitmq/ssl/wild.rabbitmq.dmz.$domain.key

# MQTT
mqtt.allow_anonymous = false # default true
mqtt.exchange = mqtt
mqtt.listeners.tcp.default = 11883

Steps to deploy RabbitMQ cluster

Short version:

Basic 3 node cluster
Barebones Ubuntu install (currently 24.04)
RabbitMQ and Erlang from the deb*.rabbitmq.com repositories
Erlang cookie for clustering
Haproxy loadbalancing access across all nodes
Floating ip using corosync
All deployed from an Ansible playbook

(Not sure what level of detail you want here)

Steps to reproduce the behavior in question

I've been testing (partial) netsplit handling. So, the short version is iptables -I INPUT -s $ip -j DROP.
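A minimal sketch of how that rule and its inverse could be generated for a repeatable test. The helper name and the 192.0.2.12 address (a documentation-range placeholder) are assumptions for illustration; only the `iptables -I INPUT -s $ip -j DROP` form comes from the report above, and `-D` is the standard iptables flag for deleting a matching rule. It only builds the commands and does not run them.

```python
import ipaddress
import shlex


def partition_commands(peer_ip):
    """Return (block, heal) argv lists for dropping traffic from one peer.

    Running `block` on one node with another node's IP cuts that single
    link while both nodes can still reach the third one.
    """
    # Reject anything that is not a valid IP address before building a command.
    ip = str(ipaddress.ip_address(peer_ip))
    block = ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]
    heal = ["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"]
    return block, heal


# Hypothetical peer address; substitute the real IP of the node to cut off.
block, heal = partition_commands("192.0.2.12")
print(shlex.join(block))  # iptables -I INPUT -s 192.0.2.12 -j DROP
print(shlex.join(heal))   # iptables -D INPUT -s 192.0.2.12 -j DROP
```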

advanced.config

No response

Application code

No response

Kubernetes deployment file

No response

What problem are you trying to solve?

First of all, some more context:
Our production RabbitMQ cluster is still on version 4.1.x, I'm working on upgrading it.
Our test cluster is currently on 4.3.0.

I've been testing (partial) netsplit handling because our production cluster shut down completely a while back because of a partial netsplit, i.e. two nodes couldn't see each other but both could see the third node. There is a race condition where each node which can't see another node initiates a shutdown, leaving just node 3 on its own, which then shuts down as well. This didn't recover automatically, but restarting either of the first two nodes would bring the cluster back online. (Or fixing the netsplit, of course.)

Now, with 4.2/4.3 we get Raft network partition handling. If I trigger a full netsplit it all seems to work fine, QQs get moved to appropriate nodes and the isolated node is all red. However, partial network partitions are where things get a bit messier.

As far as I can see, the Raft handling is applied per individual queue, so depending on which node a queue lives on and which node you connect to, a queue is either completely fine with 3 replicas, degraded but functional (2 replicas), or dead.

Example: I drop the link between nodes 1 and 2. My queue lives on node 1. Initially:

Seen from node 1, my queue is on node 1, degraded, but working
Seen from node 2, my queue is on node 1, dead
Seen from node 3, my queue is on node 1, degraded, but working
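The three views listed above can be reproduced with a deliberately simplified reachability model. This is a sketch under stated assumptions, not how RabbitMQ or its Raft library is implemented: it assumes a node can only use a queue if it has a direct link to the queue's leader, and that the leader needs a majority of members reachable to keep working. The function names and the "healthy" / "degraded" / "unavailable" labels are made up for illustration.

```python
NODES = ("rabbit-1", "rabbit-2", "rabbit-3")


def can_reach(a, b, broken_links):
    """Direct reachability only: a dropped link is not routed around via a third node."""
    return a == b or frozenset((a, b)) not in broken_links


def queue_view(observer, leader, broken_links, members=NODES):
    """Classify how a queue led by `leader` looks when used from `observer`."""
    # Members the leader can still replicate to, itself included.
    reachable = [m for m in members if can_reach(leader, m, broken_links)]
    has_majority = len(reachable) > len(members) // 2
    if not has_majority or not can_reach(observer, leader, broken_links):
        return "unavailable"
    return "healthy" if len(reachable) == len(members) else "degraded"


# The link between nodes 1 and 2 is dropped; the queue's leader is on node 1.
broken = {frozenset(("rabbit-1", "rabbit-2"))}
for node in NODES:
    print(node, queue_view(node, "rabbit-1", broken))
# rabbit-1 degraded
# rabbit-2 unavailable
# rabbit-3 degraded
```

In the same model a queue whose leader sits on node 3 is "healthy" from every node, because node 3 still has links to both of the others.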

After a while this seems to partially settle. For a while I saw a queue active on node 2 with 3 as follower when seen from node 2, while also still active on node 1 when seen from node 1 and 3. This did settle down after some time and queues ended up on either node 1 or node 3. The ones on node 3 are fully working with 3 replicas, the ones on node 1 are dead when seen from node 2.

Looking at the overview in the management console, node 1 considers node 2 dead, node 2 considers node 1 dead, and node 3 considers all nodes up.

Once the network comes back up (or I kill the firewall rule in testing) it all seems to recover pretty much instantly, queues get shuffled around again, replicas catch up and all nodes agree on the state of things.

But during the network partition, there are multiple realities depending on which node you happen to connect to, which queue you're reading from, or indeed which node you're connecting to to publish things. All nodes remain up and accepting connections, or not, depending on their view of the cluster and which specific queue you're reading/writing to.

In the old cluster (4.1 and earlier), one node would shut itself down, connections would go to the remaining two nodes, which agreed on the state of the cluster, and everything would keep working and data would keep flowing. (Apart from that race condition, that is.)

I'm not really sure how to nail this down further as there are a lot of factors in play on a cluster with multiple queues...

I've been told this "works as designed", so now my big question is, how do I safely communicate with a cluster in this state given that there isn't one "obvious" down node and any node can be safe for one queue while broken for another queue?

kjnilsson · 2026-05-19T11:07:04Z

Maintainer

If a client isn't able to make progress on one node they can try another node and so on. In some cases it may not be possible to make progress at all (e.g. complicated topic / fan-out routing that routes to a mixture of quorum and classic queues).
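A rough sketch of that "try another node" loop. It is library-agnostic on purpose: `attempt` is a hypothetical callable supplied by the application that connects to one host, performs the operation with its own timeout, and raises if it cannot finish. `NoProgress` and `first_working_node` are illustrative names, not part of any RabbitMQ client. The loop can only move on if each attempt is bounded in time; an attempt that blocks forever never reaches the next node.

```python
import time


class NoProgress(Exception):
    """Raised by an attempt that connected but could not finish in time."""


def first_working_node(hosts, attempt, rounds=3, backoff_s=0.0):
    """Call attempt(host) on each host in turn until one succeeds.

    Returns (host, result). Raises RuntimeError if every host failed in
    every round, which can legitimately happen during a partition.
    """
    last_error = None
    for round_no in range(rounds):
        for host in hosts:
            try:
                return host, attempt(host)
            except (OSError, NoProgress) as exc:
                # OSError covers refused connections and socket timeouts.
                last_error = exc
        # Wait a little longer after each full pass over the node list.
        time.sleep(backoff_s * (round_no + 1))
    raise RuntimeError(f"no node made progress after {rounds} rounds") from last_error
```

A caller would pass the three node names from the cluster and an `attempt` that publishes or consumes against the specific queue it cares about, since a node can be usable for one queue and not for another.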

What quorum queues guarantee is that if you get a publisher confirm / accepted settlement the message will not be lost.

tubemeister · 2026-05-19T14:42:08Z

Author

I did notice that my Python test scripts do indeed reconnect eventually, after hanging on the wrong node for some time.

So have I got it right that when using proper clients and QQs they will eventually time out when connected to a "dead" queue and then reconnect to another node and continue working, and that it just looks a whole lot worse than it actually is when seen from the management interface?

That leaves me with the problem of slightly less than perfect clients. ;-)
Telegraf seems to hang forever and not reconnect to another node, which it does do when I actually stop the node.
Similarly, things that connect via the MQTT plugin reconnect well enough when I stop the node but in this netsplit case they keep flapping while occasionally getting some data through.

I'll do some more testing...

tubemeister · 2026-05-20T12:13:14Z

Author

"What quorum queues guarantee is that if you get a publisher confirm / accepted settlement the message will not be lost."

I've got all my publishers on delivery mode persistent now, and I do seem to sometimes lose some data during a netsplit. Multiple publishers are writing to one exchange which is feeding 3 queues that should be identical, and I'm seeing differences in the data from those three queues even after the cluster has recovered.

It's going to be tricky to nail this down to something reproducible.

1 reply

lukebakken May 20, 2026
Maintainer

Persistent messages AND publisher confirmations are required. Are you using publisher confirmations correctly?
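One way to check this in a test is to record which messages were actually confirmed and compare only those against what each queue delivered. The sketch below is bookkeeping only and assumes the application assigns its own message IDs; `ConfirmTracker` and `missing_confirmed` are hypothetical names, and how confirms are delivered to the publisher depends on the client library, which is not shown here.

```python
class ConfirmTracker:
    """Tracks which application-assigned message IDs the broker has confirmed."""

    def __init__(self):
        self.pending = {}       # message id -> body, published but not yet confirmed
        self.confirmed = set()  # message ids the broker has confirmed

    def published(self, msg_id, body):
        self.pending[msg_id] = body

    def acked(self, msg_id):
        # Only a confirm moves a message out of the pending set.
        if msg_id in self.pending:
            del self.pending[msg_id]
            self.confirmed.add(msg_id)

    def unconfirmed(self):
        """Messages that were never confirmed and are candidates for republishing."""
        return dict(self.pending)


def missing_confirmed(tracker, consumed_by_queue):
    """Return {queue: confirmed ids that the queue never delivered}.

    Unconfirmed messages are ignored: they may or may not have reached
    any given queue, so they explain differences without being losses.
    """
    report = {}
    for queue, ids in consumed_by_queue.items():
        lost = tracker.confirmed - set(ids)
        if lost:
            report[queue] = lost
    return report
```

If the three queues differ only in messages that were never confirmed, that is consistent with the guarantee quoted earlier in the thread; a non-empty report for confirmed messages would be the case worth turning into a reproducible bug report.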
