Background
POST /db/:col/txn (added in this branch) is in-memory atomic under stripe locks: concurrent readers never observe a partial txn, and validation runs before any write. But it is not crash-atomic.
Current sequence inside Collection.applyTxn:
1. Acquire all stripe locks (sorted, deadlock-free)
2. Validate every op against current state + intra-batch dup check
3. For each op:
   a. write its individual WAL record (doc_insert / doc_update / doc_delete)
   b. update in-memory indexes
4. Release locks
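If the process crashes partway through step 3 (between 3.a and 3.b, or between two ops: say, after the first op's WAL record is written but before the second's), WAL replay on restart applies only the records that reached the log, so the partial batch becomes the recovered state. The reader-visible invariant (no torn batch) does not survive a crash.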
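The failure mode can be reproduced in a toy model. The sketch below is not the Zig implementation: it assumes the WAL is a plain list of (op, doc_id, body) tuples and that replay applies every record in order, which is how the current behaviour is described above. The `replay` function and the tuple layout are hypothetical.

```python
# Toy model of the current behaviour: one WAL record per op, no txn markers.
# The record layout (op, doc_id, body) is an assumption made for illustration.
def replay(wal):
    """Rebuild the doc store by applying every WAL record in log order."""
    docs = {}
    for op, doc_id, body in wal:
        if op in ("doc_insert", "doc_update"):
            docs[doc_id] = body
        elif op == "doc_delete":
            docs.pop(doc_id, None)
    return docs

# A two-op txn whose process died after only the first WAL write was durable.
batch = [("doc_insert", "a", {"n": 1}), ("doc_insert", "b", {"n": 2})]
wal_after_crash = batch[:1]

# Replay has no way to tell that "a" belonged to an unfinished batch.
recovered = replay(wal_after_crash)
```

After the simulated crash, `recovered` contains doc "a" but not doc "b", which is exactly the torn batch that readers were never supposed to see.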
TigerBeetle gets crash-atomicity by interleaving consensus with checkpointing; we don't have consensus, but we do have a single-node WAL we can wrap with markers.
Proposed solution
Add two new WAL op types:
OpType.txn_begin — payload is the txn id (process-monotonic u64, picked when applyTxn starts)
OpType.txn_commit — payload is the same txn id
applyTxn becomes:
1. write WAL: {.txn_begin, txn_id}
2. for each op: write WAL: {.doc_*, ..., txn_id} (extend record format with optional txn_id)
3. write WAL: {.txn_commit, txn_id}
4. update in-memory indexes
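The proposed write order could look roughly like the following model. It is a sketch under stated assumptions, not the planned Zig code: `Wal`, `apply_txn`, the dict-based record layout, and the module-level counter are all hypothetical, and stripe locking and validation are left out to keep the ordering visible.

```python
import itertools

class Wal:
    """Hypothetical in-memory stand-in for the on-disk WAL."""
    def __init__(self):
        self.records = []

    def append(self, op, txn_id=0, doc_id=None, body=None):
        self.records.append(
            {"op": op, "txn_id": txn_id, "doc_id": doc_id, "body": body}
        )

# Process-monotonic id source; 0 stays reserved for non-txn legacy writes.
_next_txn_id = itertools.count(1)

def apply_txn(wal, index, ops):
    """Write begin marker, tagged doc records, commit marker, then touch memory."""
    txn_id = next(_next_txn_id)
    wal.append("txn_begin", txn_id)
    for op, doc_id, body in ops:
        wal.append(op, txn_id, doc_id, body)
    wal.append("txn_commit", txn_id)
    # In-memory indexes change only after the commit marker has been written.
    for op, doc_id, body in ops:
        if op == "doc_delete":
            index.pop(doc_id, None)
        else:
            index[doc_id] = body
    return txn_id

wal, index = Wal(), {}
tid = apply_txn(wal, index, [("doc_insert", "a", {"n": 1}),
                             ("doc_insert", "b", {"n": 2})])
```

In this model the WAL ends up holding txn_begin, two doc_insert records, and txn_commit, all carrying the same non-zero txn id.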
WAL replay (existing WAL.replay in src/storage/wal.zig):
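1. collect all records into a list
2. for each txn_id, check whether the list contains a matching txn_commit marker
3. if a txn_begin has no matching txn_commit: skip ALL doc records carrying that txn_id
4. apply the remaining records in log order (records with txn_id 0 are applied as before)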
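The replay filter described above can be sketched as a two-pass function. This is an illustrative model only: the dict-based record layout and the `replay` name are assumptions, and the real change would live in WAL.replay in src/storage/wal.zig.

```python
def replay(records):
    """Two-pass replay: find committed txn ids, then apply eligible doc records."""
    # Pass 1: every txn id that has a commit marker somewhere in the log.
    committed = {r["txn_id"] for r in records if r["op"] == "txn_commit"}

    # Pass 2: apply doc records in log order, skipping uncommitted txns.
    docs = {}
    for r in records:
        op, txn_id = r["op"], r["txn_id"]
        if op in ("txn_begin", "txn_commit"):
            continue  # markers carry no document state
        if txn_id != 0 and txn_id not in committed:
            continue  # begin without commit: drop every op of that txn
        if op == "doc_delete":
            docs.pop(r["doc_id"], None)
        else:  # doc_insert / doc_update
            docs[r["doc_id"]] = r["body"]
    return docs

records = [
    {"op": "doc_insert", "txn_id": 0, "doc_id": "x", "body": {"n": 0}},  # legacy
    {"op": "txn_begin", "txn_id": 1},
    {"op": "doc_insert", "txn_id": 1, "doc_id": "a", "body": {"n": 1}},
    {"op": "txn_commit", "txn_id": 1},
    {"op": "txn_begin", "txn_id": 2},                                    # never commits
    {"op": "doc_insert", "txn_id": 2, "doc_id": "b", "body": {"n": 2}},
    {"op": "doc_insert", "txn_id": 2, "doc_id": "c", "body": {"n": 3}},
]
recovered = replay(records)
```

One caveat of matching purely on txn id: the sketch assumes an id never appears twice in the same WAL. Because the proposed id is process-monotonic, a restarted process would probably need to seed its counter above the highest id seen during replay, otherwise a later committed txn could revive an earlier uncommitted one that reused its id.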
Acceptance criteria
Collection.applyTxn writes a txn_begin before the doc records and a txn_commit after them
WAL record format carries an optional txn_id (0 = non-txn legacy write)
WAL replay skips uncommitted txn ops on restart
New test: insert 2 docs in a txn, kill -9 the process after the doc WAL records are written but before the txn_commit record, restart, verify neither doc is reachable
No perf regression on single-op insert path (txn_id=0 fast-path)
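Before wiring up a real kill -9 test, the crash criterion can be approximated in a model by truncating the WAL at every record boundary and checking that replay yields all of the txn's docs or none of them. The sketch below assumes a crash always leaves a clean prefix of whole records (a torn final record would need separate handling); `replay`, `check_all_or_nothing`, and the record layout are hypothetical.

```python
def replay(records):
    """Apply inserts that are legacy (txn_id 0) or belong to a committed txn."""
    committed = {r["txn_id"] for r in records if r["op"] == "txn_commit"}
    docs = {}
    for r in records:
        if r["op"] == "doc_insert" and (r["txn_id"] == 0 or r["txn_id"] in committed):
            docs[r["doc_id"]] = r["body"]
    return docs

# WAL contents for one txn that inserts two docs.
full_wal = [
    {"op": "txn_begin", "txn_id": 1},
    {"op": "doc_insert", "txn_id": 1, "doc_id": "a", "body": {"n": 1}},
    {"op": "doc_insert", "txn_id": 1, "doc_id": "b", "body": {"n": 2}},
    {"op": "txn_commit", "txn_id": 1},
]

def check_all_or_nothing(wal):
    """Simulate a crash after every possible number of durable records."""
    outcomes = []
    for cut in range(len(wal) + 1):
        recovered = sorted(replay(wal[:cut]))
        assert recovered in ([], ["a", "b"]), f"torn batch at cut {cut}"
        outcomes.append(recovered)
    return outcomes

outcomes = check_all_or_nothing(full_wal)
```

Only the untruncated log recovers both docs; every shorter prefix recovers neither, which is the behaviour the kill -9 test is meant to confirm against the real process.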
Out of scope
Cross-collection txn (would need cross-stripe-set lock ordering across collections)
Distributed txn (no consensus protocol on this branch)
References
src/collection.zig, applyTxn: current in-memory atomic impl
src/storage/wal.zig, WAL.write / WAL.replay: where the markers and the replay filter live
parity/divergence.py test #6: already verifies in-memory atomic batch
src/vsr/journal.zig (durable prepare records)