JetStream loses acknowledged writes by default due to deferred fsync #7564

### Observed behavior

The [JetStream developer docs](https://docs.nats.io/using-nats/developer/develop_jetstream?q=sync#publish-to-a-stream) say:

> If the publisher receives the acknowledgement from the server it can safely discard any state it has for that publication, the message has not only been received correctly by the server, but it has also been successfully persisted.

The [top-level JetStream docs](https://docs.nats.io/nats-concepts/jetstream#persistent-and-consistent-distributed-storage) say that "the formal consistency model of NATS JetStream is [Linearizable](https://jepsen.io/consistency/models/linearizable)", and the system as a whole is Serializable, "because messages are added to a stream in one global order".

There is an important caveat here. Unlike, say, [MongoDB](https://www.mongodb.com/docs/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.writeConcernMajorityJournalDefault) or [Redpanda](https://www.redpanda.com/blog/why-fsync-is-needed-for-data-safety-in-kafka-or-non-byzantine-protocols), which flush data to disk on at least a majority of nodes before acknowledging a write, NATS JetStream only syncs to disk once every two minutes by default, yet nodes acknowledge writes immediately. JetStream will happily lose data, or an entire stream, when nodes crash around the same time. This means that JetStream clusters will very likely violate their claimed safety guarantees when (e.g.) a rack or datacenter power system fails.
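
To make the failure mode concrete, here is a toy model in Python. It is not NATS code: the `Node` class, the `simulate` function, the three-node cluster, and the one-write-per-second workload are all made up for illustration. The only thing it borrows from JetStream is the policy described above, namely acknowledge immediately and fsync on a timer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical replica that acknowledges writes before they are durable."""
    sync_interval: float                             # seconds between fsyncs; 0 = fsync every write
    page_cache: list = field(default_factory=list)   # written, not yet durable
    disk: list = field(default_factory=list)         # survives a crash
    last_sync: float = 0.0

    def write(self, now: float, record: int) -> bool:
        self.page_cache.append(record)
        if now - self.last_sync >= self.sync_interval:
            self.fsync(now)
        return True  # the ack does not depend on durability

    def fsync(self, now: float) -> None:
        self.disk.extend(self.page_cache)
        self.page_cache.clear()
        self.last_sync = now

    def crash(self) -> None:
        # Simulated power failure: anything not yet flushed is gone.
        self.page_cache.clear()


def simulate(sync_interval: float, writes: int = 300, nodes: int = 3) -> list:
    """Write one record per second, crash every node, return acked-but-lost records."""
    cluster = [Node(sync_interval) for _ in range(nodes)]
    acked = []
    for t in range(writes):
        acks = [n.write(float(t), t) for n in cluster]
        if sum(acks) > nodes // 2:       # a majority acknowledged the write
            acked.append(t)
    for n in cluster:                    # all nodes lose power at about the same time
        n.crash()
    survivors = set().union(*(n.disk for n in cluster))
    return [r for r in acked if r not in survivors]


print("lost with 2-minute sync:", len(simulate(120.0)))
print("lost with fsync per write:", len(simulate(0.0)))
```

In this model, replication does not help: every replica holds the tail of the log only in memory, so a correlated crash removes it from all of them, even though a majority acknowledged each write.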

For example, [this test run](https://github.com/user-attachments/files/23630617/20251119T074535.861-0600.zip) caused the loss of all records written to the single JetStream stream. Every five seconds, nodes would log:

```
[1886557] 2025/11/19 13:51:45.506132 [WRN] Error applying entries to 'jepsen > jepsen-stream': catchup aborted, no leader for stream 'jepsen > jepsen-stream'
[1886557] 2025/11/19 13:51:45.506183 [WRN] RAFT [fjFyEjc1 - S-R5F-41G3ODY1] Draining and replaying snapshot
```

Every call to `fetch` during the five-minute recovery period returned an OK but empty response.

In [this run](https://github.com/user-attachments/files/23631053/20251119T075152.133-0600.zip), the stream files existed on every server, but every file was zero bytes. Every attempt to subscribe to the stream failed for the five-minute recovery period, logging `[SUB-90007] No matching streams for subject`.

If node failures happen a few minutes after a stream is created, JetStream reliably loses a large window of records. For example, [a simulated power failure during this run](https://s3.amazonaws.com/jepsen.io/analyses/nats-2.12.1/20251119T081248-later-power-failure-data-loss.zip) caused JetStream to lose 131,418 out of 930,005 acknowledged writes.

<img width="900" height="400" alt="Image" src="https://github.com/user-attachments/assets/58342081-8881-4954-8ee4-bffa20886624" />
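
For scale, the loss in that run works out as follows. This is plain arithmetic on the two numbers quoted above; the variable names are just for illustration.

```python
# Figures quoted above for the simulated power-failure run.
acked = 930_005   # acknowledged writes
lost = 131_418    # acknowledged writes missing after recovery

loss_fraction = lost / acked
print(f"{loss_fraction:.1%} of acknowledged writes were lost")
```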

Node failures do not have to happen simultaneously for JetStream to lose data. For example, a rapid series of node crashes, approximately one every two seconds, was sufficient to cause total data loss in [this test run](https://github.com/user-attachments/files/23631363/20251119T085347.396-0600.zip). In this case, JetStream denied the entire existence of the stream when the cluster recovered, logging `No matching streams for subject`. 

<img width="900" height="400" alt="Image" src="https://github.com/user-attachments/assets/6d4060e1-708a-4249-8ab7-3ead8ace5c67" />

The only mention of this behavior I can find in any of NATS' documentation is the `sync_interval` entry buried in [the JetStream configuration options list](https://docs.nats.io/running-a-nats-service/configuration#jetstream-server-settings). I suggest that NATS either a.) change the default settings, or b.) prominently document that JetStream cannot tolerate some kinds of node crashes.

### Expected behavior

NATS, like other distributed databases that employ a consensus algorithm, should not lose acknowledged operations when nodes crash.

### Server and client version

NATS 2.12.1 and jnats 2.24.0

### Host environment

These tests are running in LXC containers.

### Steps to reproduce

This is in the Jepsen test suite, but give me a minute before I put in repro instructions--I've got one more scenario I want to build out.
