Skip to content

fix(threads): write messages.jsonl atomically to prevent corruption (#8019)#8071

Open
greatjourney589 wants to merge 1 commit into
janhq:mainfrom
greatjourney589:fix/threads-atomic-message-write
Open

fix(threads): write messages.jsonl atomically to prevent corruption (#8019)#8071
greatjourney589 wants to merge 1 commit into
janhq:mainfrom
greatjourney589:fix/threads-atomic-message-write

Conversation

@greatjourney589
Copy link
Copy Markdown

Summary

modify_message and delete_message rewrote a thread's messages.jsonl via write_messages_to_file, which opened the target with File::create - POSIX O_TRUNC | O_WRONLY. The file was truncated before the new contents were written, so if the process was interrupted mid-write (force-close, SIGKILL, power loss, OOM), messages.jsonl was left empty or partially written and the thread failed to load on the next launch.

Per-thread async locks already serialize writes, but locks don't help against process-level interruption.

Changes

  • src-tauri/src/core/threads/helpers.rs - write_messages_to_file now writes to a sibling messages.jsonl.tmp, flushes + best-effort sync_all, then atomically renames it over the target. On any failure the tmp file is cleaned up, so an interrupted write leaves the previous messages.jsonl fully intact.
    • fs::rename is atomic on POSIX (same filesystem) and maps to MoveFileExW with MOVEFILE_REPLACE_EXISTING on Windows, so the destination is never observably missing or partial.
  • src-tauri/src/core/threads/tests.rs - 3 new regression tests.

Call sites in commands.rs (modify_message, delete_message) are unchanged - the atomicity is entirely inside the helper.

Test plan

  • cargo test --no-default-features --features test-tauri --lib core::threads:: - all 35 thread tests pass, including the 3 new ones:
    • test_write_messages_leaves_no_tmp_file_on_success
    • test_write_messages_overwrites_stale_tmp_from_prior_crash
    • test_write_messages_preserves_existing_on_write_failure
  • Existing test_modify_and_delete_message, test_write_messages_overwrites_existing, test_concurrent_message_operations still pass (no regression on the happy path).

Before / after

Backend-only fix - no UI. Demonstrated via tests rather than screenshots.

Before: File::create(path) truncates the messages file, then writes message-by-message. A crash in between leaves a corrupt messages.jsonl and the new regression tests fail against main.

After: Writes land in messages.jsonl.tmp, then a single atomic rename replaces the target. A mid-write crash leaves the previous file intact; a successful write leaves no .tmp residue. A stale .tmp from a prior crashed run is harmlessly overwritten by the next write.

Related

Fixes #8019.

@tokamak-pm
Copy link
Copy Markdown

tokamak-pm Bot commented Apr 24, 2026

Review — PR #8071: fix(threads): write messages.jsonl atomically to prevent corruption

Summary

Replaces the direct File::create (which truncates immediately) in write_messages_to_file with an atomic write-then-rename pattern: data is written to a .tmp sibling, flushed, sync_all'd (best-effort), and then fs::rename'd over the target. This prevents data loss when the process is killed mid-write (issue #8019).

Key Findings

Correctness — Good

  • The write-tmp-then-rename pattern is the standard approach for crash-safe file updates on POSIX. fs::rename is atomic on the same filesystem, and since the .tmp file lives in the same directory this is guaranteed.
  • Cleanup of the .tmp file on write failure is correctly handled.
  • file.flush() before sync_all() is correct; making sync_all best-effort with let _ = is a reasonable pragmatic choice for filesystems that don't support it.

Tests — Thorough

  • Three new test cases cover: no stray .tmp on success, overwriting a stale .tmp from a prior crash, and preserving the original file when the write path fails. These directly test the regression scenario from bug: prevent thread message-store #8019.

Minor Observations

  • The .tmp filename (messages.jsonl.tmp) is deterministic. If two writers somehow bypassed the per-thread lock (e.g., a bug elsewhere), they'd collide on the same temp file. The existing Arc<Mutex<()>> makes this unlikely, but a random suffix (e.g., via tempfile::NamedTempFile) would be slightly more defensive. Not a blocker.
  • On Windows, fs::rename can fail if the target is open by another process. The existing per-thread lock should prevent this within the app, but external readers (antivirus, backup tools) could theoretically cause transient failures. Again, not a blocker — this is a known platform limitation.

Recommendation

can merge — Well-implemented fix with good test coverage for a real data-loss scenario.

@tokamak-pm
Copy link
Copy Markdown

tokamak-pm Bot commented Apr 26, 2026

Code Review

Summary: This PR fixes a data corruption risk in write_messages_to_file by switching from direct truncate-and-write (File::create) to an atomic write-tmp-then-rename pattern. If the process is killed mid-write, the previous messages.jsonl remains intact instead of being left empty or partially written.

Findings:

  • Correctness: The implementation is sound. It writes to a .tmp sibling file, flushes, attempts sync_all (best-effort), then uses fs::rename which is atomic on POSIX (same filesystem). On Windows, MoveFileExW with MOVEFILE_REPLACE_EXISTING provides equivalent behavior.
  • Error handling: Good -- on write failure, the .tmp file is cleaned up and the error propagated. On rename failure, the .tmp is also cleaned up. The original file is never touched until the rename succeeds.
  • Edge case: The sync_all failure is intentionally ignored (some tmpfs setups don't support fsync). This is acceptable as it is best-effort durability.
  • Tests: Three well-designed regression tests covering: (1) no stray .tmp after success, (2) stale .tmp from a prior crash gets overwritten, (3) original file preserved on write failure. Good coverage.
  • Minor note: If two concurrent callers (somehow bypassing the per-thread lock) wrote simultaneously, they would race on the same .tmp path. However, the existing async lock prevents this, so it is not a practical concern.
  • No issues found with the approach. This is a well-known safe pattern for crash-safe file updates.

Recommendation: Can merge. This is a solid fix for a real data corruption risk, with good test coverage.

@qnixsynapse
Copy link
Copy Markdown
Contributor

Have you tested this on Windows?

@greatjourney589
Copy link
Copy Markdown
Author

Have you tested this on Windows?

Yes, I did. Could you approve workflows and then merge this PR? @qnixsynapse

@tokamak-pm
Copy link
Copy Markdown

tokamak-pm Bot commented May 1, 2026

Review -- PR #8071: fix(threads): write messages.jsonl atomically to prevent corruption

Summary: Replaces direct File::create (truncate-then-write) in write_messages_to_file with an atomic write-to-tmp-then-rename pattern to prevent data loss when the process is killed mid-write. Fixes #8019.

Observations:

  • (+) The write-tmp-rename approach is the standard pattern for crash-safe file updates. fs::rename is atomic on POSIX when source and target are on the same filesystem, and the .tmp file is always in the same directory, so this is guaranteed.
  • (+) Error handling is thorough: the .tmp file is cleaned up on both write failure and rename failure.
  • (+) flush() before sync_all() is correct. Making sync_all best-effort (let _ =) is a pragmatic choice for filesystems that do not support fsync.
  • (+) Three well-designed regression tests: no stray .tmp on success, stale .tmp from prior crash overwritten, and original file preserved on write failure.
  • (note) The .tmp filename is deterministic (messages.jsonl.tmp). If two writers ever bypassed the per-thread Arc<Mutex<()>> lock, they would collide. The existing lock makes this unlikely, but using tempfile::NamedTempFile in the same directory would be marginally safer. Not a blocker.
  • (note) A maintainer has asked about Windows testing. The author confirms it has been tested on Windows.

Recommendation: can merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

bug: prevent thread message-store

2 participants