JetStream loses acknowledged writes by default due to deferred fsync #7564

@aphyr

Observed behavior

The JetStream developer docs say:

"If the publisher receives the acknowledgement from the server it can safely discard any state it has for that publication, the message has not only been received correctly by the server, but it has also been successfully persisted."

The top-level JetStream docs say that "the formal consistency model of NATS JetStream is Linearizable", and the system as a whole is Serializable, "because messages are added to a stream in one global order".

There is an important caveat here. Unlike, say, MongoDB or Redpanda, which flush data to disk on at least a majority of nodes before acknowledging a write, NATS JetStream only syncs to disk once every two minutes by default, yet nodes acknowledge writes immediately. JetStream will therefore happily lose data, or an entire stream, when nodes crash around the same time. This means that JetStream clusters will very likely violate their claimed safety guarantees when (e.g.) a rack or datacenter power system fails.
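To make the exposure window concrete, here is a toy model of nodes that acknowledge writes immediately but only flush on a timer. It is a sketch for illustration only, not JetStream's implementation: `ToyNode`, `simulate`, the one-write-per-second workload, and the three-node cluster are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical node that acks writes before they are known to be on disk."""
    sync_interval: float                          # seconds between flushes; 0.0 models "sync on every write"
    durable: list = field(default_factory=list)   # records that would survive a crash
    buffered: list = field(default_factory=list)  # acknowledged, but not yet flushed
    last_sync: float = 0.0

    def write(self, now: float, record: int) -> bool:
        self.buffered.append(record)
        # Flush only when the sync interval has elapsed since the last flush.
        if now - self.last_sync >= self.sync_interval:
            self.durable.extend(self.buffered)
            self.buffered.clear()
            self.last_sync = now
        return True  # acknowledged regardless of whether a flush happened

    def crash(self) -> list:
        """Simulated power loss: everything still buffered is gone."""
        lost = list(self.buffered)
        self.buffered.clear()
        return lost


def simulate(sync_interval: float, writes: int = 300, nodes: int = 3) -> int:
    """Return how many acknowledged records no node has durably stored."""
    cluster = [ToyNode(sync_interval) for _ in range(nodes)]
    for t in range(writes):  # one write per second, replicated to every node
        for node in cluster:
            node.write(float(t), t)
    # All nodes lose power at the same moment.
    lost_per_node = [set(node.crash()) for node in cluster]
    # A record is lost cluster-wide only if every node lost it.
    return len(set.intersection(*lost_per_node))


if __name__ == "__main__":
    print("lost with 120 s interval:", simulate(120.0))
    print("lost with sync on every write:", simulate(0.0))
```

In this model, replication does not help when every node crashes at once: each node holds the same unflushed tail, so every acknowledged write since the last flush disappears everywhere.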

For example, this test run caused the loss of all records written to the single JetStream stream. Every five seconds, nodes would log:

[1886557] 2025/11/19 13:51:45.506132 [WRN] Error applying entries to 'jepsen > jepsen-stream': catchup aborted, no leader for stream 'jepsen > jepsen-stream'
[1886557] 2025/11/19 13:51:45.506183 [WRN] RAFT [fjFyEjc1 - S-R5F-41G3ODY1] Draining and replaying snapshot

Every call to fetch during the five-minute recovery period returned an OK but empty response.

In this run, the stream files existed on every server, but every file was zero bytes. Every attempt to subscribe to the stream failed for the five-minute recovery period, logging [SUB-90007] No matching streams for subject.

If node failures happen a few minutes after a stream is created, JetStream reliably loses a large window of records. For example, a simulated power failure during this run caused JetStream to lose 131,418 out of 930,005 acknowledged writes.
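For scale, the loss in that run works out as follows. This is just arithmetic on the two figures quoted above; the variable names are mine.

```python
acknowledged = 930_005  # acknowledged writes in the run
lost = 131_418          # acknowledged writes missing after recovery

loss_fraction = lost / acknowledged
print(f"{loss_fraction:.2%} of acknowledged writes were lost")
```

That is roughly one in seven acknowledged writes.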

Node failures do not have to happen simultaneously for JetStream to lose data. For example, a rapid series of node crashes, approximately one every two seconds, was sufficient to cause total data loss in this test run. In this case, JetStream denied the entire existence of the stream when the cluster recovered, logging No matching streams for subject.

The only mention of this fact I can find in any of NATS' documentation is buried in the JetStream configuration options list. I suggest that NATS either a.) change the default settings, or b.) prominently document that JetStream cannot tolerate some kinds of node crashes.
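As a sketch of what option (a) might look like for an operator today, and assuming the relevant setting is the `sync_interval` option in the server's `jetstream` configuration block with `always` as an accepted value, a server config fragment could read as follows. The `store_dir` path is a hypothetical placeholder, and I have not measured the throughput cost of this setting.

```
jetstream {
  # Hypothetical storage path, for illustration only.
  store_dir: "/var/lib/nats/jetstream"

  # Assumption: fsync on every write instead of on a two-minute timer.
  sync_interval: always
}
```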

Expected behavior

NATS, like other distributed databases which employ a consensus algorithm, should not lose acknowledged operations when nodes crash.

Server and client version

NATS 2.12.1 and jnats 2.24.0

Host environment

These tests are running in LXC containers.

Steps to reproduce

This is in the Jepsen test suite, but give me a minute before I put in repro instructions--I've got one more scenario I want to build out.

Labels: defect (suspected defect such as a bug or regression)
