**Describe the bug**
Disk mutations (message publishing, queue purge/delete) are not atomic with the replication actions sent to followers. During the second `full_sync` (which holds `@lock`), another thread can write to or delete a file on disk while the corresponding `append`/`delete_file` call blocks waiting for `@lock`. This can cause:
- Message duplication — a message written to disk is included in the synced file and then also sent as an `append` action after the follower is marked synced.
- SEGFAULT — a queue purge/delete closes and unmaps an `MFile` that `full_sync` is reading via `file.to_slice`.
**Timeline (duplication)**
1. Thread A (follower sync) — starts second `full_sync`, acquires `@lock`
2. Thread A — `files_with_hash` computes hash for `file_1` (current state)
3. Thread B (publisher) — writes `msg_x` to `file_1` on disk
4. Thread B — calls `replicator.append(file_1, msg_x)` → `each_follower` → blocks waiting for `@lock`
5. Thread A — follower requests `file_1`, receives it (now including `msg_x`)
6. Thread A — marks follower as synced, releases `@lock`
7. Thread B — acquires `@lock`, follower is now synced, sends `append(file_1, msg_x)`
8. Result — `file_1` on the follower contains `msg_x` twice
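The interleaving above can be reproduced deterministically in a small model. This is a Python sketch, not LavinMQ code: the lock, the on-disk file, and the follower are simulated, and events force the exact ordering from the timeline.

```python
import threading

disk_file = ["msg_old"]           # simulated on-disk file contents
follower = []                     # what the follower has received
lock = threading.Lock()           # plays the role of @lock
sync_started = threading.Event()
wrote = threading.Event()
synced = False

def full_sync():                  # Thread A
    global synced
    with lock:
        sync_started.set()
        wrote.wait()                  # B writes msg_x while we hold the lock
        follower.extend(disk_file)    # file transfer now includes msg_x
        synced = True                 # follower marked synced, lock released

def publish():                    # Thread B
    sync_started.wait()               # ensure A already holds the lock
    disk_file.append("msg_x")         # disk write happens OUTSIDE the lock
    wrote.set()
    with lock:                        # replicator.append blocks here until A is done
        if synced:
            follower.append("msg_x")  # pending append goes through: duplicate

a = threading.Thread(target=full_sync)
b = threading.Thread(target=publish)
a.start(); b.start(); a.join(); b.join()
print(follower.count("msg_x"))        # prints 2: the message arrived twice
```

The key property is that the disk write in `publish` is visible to `full_sync` even though the corresponding replication action is still queued behind the lock.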
**Root cause**
The disk write (step 3) and the replication action (step 4) are not atomic with respect to the second `full_sync` holding `@lock`. The message is written to disk before `each_follower` is called, so the second `full_sync` can observe the new data in the file and transfer it to the follower. When `@lock` is released, the pending `append` action goes through to the now-synced follower, duplicating the data.
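One way to close the window (a sketch of the atomicity requirement only, using illustrative Python names rather than LavinMQ's actual API) is to hold the same lock across both the disk write and the replication action, so a concurrent `full_sync` observes either both or neither:

```python
import threading

lock = threading.Lock()     # plays the role of @lock
disk_file = []
synced_followers = []       # follower queues that completed full_sync

def publish(msg):
    # Hold the lock across BOTH steps, so full_sync (which also takes
    # this lock) cannot interleave between the write and the append.
    with lock:
        disk_file.append(msg)               # step 3: disk write
        for q in synced_followers:          # step 4: replication action
            q.append(("append", msg))

def full_sync(follower_q):
    with lock:
        follower_q.extend(disk_file)        # transfer current file contents
        synced_followers.append(follower_q) # mark synced under the same lock

f = []
publish("m1")        # before sync: lands only on disk
full_sync(f)         # file transfer picks up m1
publish("m2")        # after sync: delivered exactly once, as an append
print(f)             # prints ['m1', ('append', 'm2')]
```

With this ordering a message published before the sync reaches the follower only via the file transfer, and one published after reaches it only as an `append` action, never both.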
**Notes**
- The same class of issue exists for `delete_file` — a file could be deleted from disk (e.g. queue purge/delete) before `each_follower` runs. The second `full_sync` may then try to read an `MFile` that has been closed/unmapped, causing a SEGFAULT. Or the follower receives a delete for a file it never got.
- This affects 2.7.0, where follower sync runs on a parallel execution context (`@mt`), making the race window more likely. In 2.6.x (single-threaded fibers), the window is narrower but still theoretically possible at yield points within `full_sync`.
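The `MFile` use-after-unmap in the first note can be modeled similarly. In Crystal, reading a slice of an unmapped `MFile` is undefined behavior and typically a SEGFAULT; Python's `mmap` turns the same ordering bug into a `ValueError`, which makes it observable in a sketch (names here are illustrative, not LavinMQ's):

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"queue messages")
m = mmap.mmap(fd, 0)        # the mapping full_sync is reading from

m.close()                   # queue purge/delete unmaps the file concurrently
try:
    _ = m[0]                # analogous to file.to_slice after unmap
except ValueError as err:
    print("use-after-unmap:", err)
finally:
    os.close(fd)
    os.unlink(path)
```

In C or Crystal the equivalent access has no such guard: the page is gone and the read faults, which is why the fix must keep the unmap ordered after any in-progress sync read.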