Description
Observed behavior
The JetStream Developer docs say:
If the publisher receives the acknowledgement from the server it can safely discard any state it has for that publication, the message has not only been received correctly by the server, but it has also been successfully persisted.
The top-level JetStream docs say that "the formal consistency model of NATS JetStream is Linearizable", and the system as a whole is Serializable, "because messages are added to a stream in one global order".
There is an important caveat here. Unlike, say, MongoDB or Redpanda, which flush data to disk on at least a majority of nodes before acknowledging a write, NATS JetStream only syncs to disk once every two minutes by default, yet nodes acknowledge writes immediately. JetStream will happily lose data, or an entire stream, when nodes crash around the same time. This means that JetStream clusters will very likely violate claimed safety guarantees when (e.g.) a rack or datacenter power system fails.
For example, this test run caused the loss of all records written to the single JetStream stream. Every five seconds, nodes would log:
[1886557] 2025/11/19 13:51:45.506132 [WRN] Error applying entries to 'jepsen > jepsen-stream': catchup aborted, no leader for stream 'jepsen > jepsen-stream'
[1886557] 2025/11/19 13:51:45.506183 [WRN] RAFT [fjFyEjc1 - S-R5F-41G3ODY1] Draining and replaying snapshot
Every call to fetch during the five-minute recovery period returned an OK, but empty response.
In this run, the stream files existed on every server, but every file was zero bytes. Every attempt to subscribe to the stream failed for the five-minute recovery period, logging [SUB-90007] No matching streams for subject.
If node failures happen a few minutes after a stream is created, JetStream reliably loses a large window of records. For example, a simulated power failure during this run caused JetStream to lose 131,418 out of 930,005 acknowledged writes.
Node failures do not have to happen simultaneously for JetStream to lose data. For example, a rapid series of node crashes, approximately one every two seconds, was sufficient to cause total data loss in this test run. In this case, JetStream denied the entire existence of the stream when the cluster recovered, logging No matching streams for subject.
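The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not NATS code: `Node`, `simulate`, and the "sync every N writes" schedule are hypothetical stand-ins for the real time-based sync interval, and every node is assumed to lose power at the same instant.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica: writes sit in a volatile page cache until synced."""
    page_cache: list = field(default_factory=list)  # written, not yet synced
    disk: list = field(default_factory=list)        # survives a power failure

    def write(self, record):
        self.page_cache.append(record)

    def sync(self):
        # Move everything buffered so far onto durable storage.
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def power_fail(self):
        # Anything not yet synced is gone.
        self.page_cache.clear()


def simulate(writes_before_crash, sync_every):
    """Return the acknowledged records that no node retains after a
    simultaneous power failure (hypothetical model, 3 replicas)."""
    nodes = [Node() for _ in range(3)]
    acked = []
    for record in range(1, writes_before_crash + 1):
        for node in nodes:
            node.write(record)
        acked.append(record)  # acknowledged before any sync has happened
        if record % sync_every == 0:
            for node in nodes:
                node.sync()
    for node in nodes:
        node.power_fail()
    survivors = set().union(*(node.disk for node in nodes))
    return [record for record in acked if record not in survivors]


print(simulate(10, 4))   # writes since the last sync are lost
print(simulate(3, 100))  # crash before the first sync: everything is lost
print(simulate(10, 1))   # sync before every acknowledgement: nothing is lost
```

In this model, replication does not help: every replica holds the same unsynced tail, so a correlated crash removes it everywhere at once.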
The only mention of this sync behavior I can find in any of NATS' documentation is buried in the JetStream configuration options list. I suggest that NATS either a.) change the default settings, or b.) prominently document that JetStream cannot tolerate some kinds of node crashes.
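For reference, a sketch of what a stricter setting might look like in the server configuration file. This assumes the option in question is `sync_interval` in the `jetstream` block and that it accepts `always` to sync after each message; I have not tested this fragment against the runs above, and it presumably costs throughput.

```
jetstream {
  # Assumption: sync after every message instead of the two-minute default.
  sync_interval: always
}
```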
Expected behavior
NATS, like other distributed databases which employ a consensus algorithm, should not lose acknowledged operations when nodes crash.
Server and client version
NATS 2.12.1, and jnats 2.24.0
Host environment
These tests are running in LXC containers.
Steps to reproduce
This is in the Jepsen test suite, but give me a minute before I put in repro instructions--I've got one more scenario I want to build out.