Add snapshot support for point-in-time database copy by psrvere · Pull Request #4346 · tursodatabase/turso

psrvere · 2025-12-24T13:25:04Z

This PR addresses feedback from tursodatabase/agentfs#119 suggesting that snapshot functionality should be implemented in turso.git using direct file copying rather than SQL-based row-by-row copying. Comment: https://github.com/tursodatabase/agentfs/pull/119#issuecomment-3681336678

Implementation

Runs a TRUNCATE checkpoint on the WAL
Takes a read lock on the WAL before copying the database file
Repeats the two steps above in a loop until read_lock[0] is acquired (explanation below)
Does not support MVCC mode for now

Other Additions

The CLI's .clone command now uses this snapshot API (the older implementation was based on a logical copy)
The new implementation does not support snapshots of in-memory databases, which is a BREAKING change for this command.
Adds a snapshot API to sdk-kit
Adds a snapshot API to the Rust SDK

Race Condition Handling

There is a race window between the TRUNCATE checkpoint and the copy of the database file: another checkpoint (for example, one triggered by a concurrent snapshot) can run in that window, and we may end up copying a corrupted db file. To prevent this, the implementation runs a retry loop that does two things: 1) runs a TRUNCATE checkpoint, and 2) takes read_lock[0]. Since the checkpoint flow needs this lock to proceed, holding it prevents any other checkpoint.
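The retry loop described above can be sketched roughly as follows. This is only an illustration against a mock: `MockWal`, `Slot0Guard`, `snapshot_with_retry` and the `between_steps` hook are hypothetical names, not turso APIs, and the rule that slot 0 is only granted when the WAL is empty is an assumption borrowed from SQLite's WAL design.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical stand-in for the shared WAL state. This is NOT the turso type.
pub struct MockWal {
    /// Frames appended since the last full back-fill (0 = WAL fully checkpointed).
    frames: AtomicU64,
    /// True while someone holds read-mark slot 0.
    slot0_held: AtomicBool,
}

/// RAII guard that releases slot 0 when dropped.
pub struct Slot0Guard<'a> {
    wal: &'a MockWal,
}

impl Drop for Slot0Guard<'_> {
    fn drop(&mut self) {
        self.wal.slot0_held.store(false, Ordering::Release);
    }
}

impl MockWal {
    pub fn new(frames: u64) -> Self {
        Self {
            frames: AtomicU64::new(frames),
            slot0_held: AtomicBool::new(false),
        }
    }

    /// Simulates a concurrent writer appending one frame.
    pub fn append_frame(&self) {
        self.frames.fetch_add(1, Ordering::AcqRel);
    }

    /// Simulated TRUNCATE checkpoint.
    /// Assumption: it cannot proceed while slot 0 is held by someone else.
    pub fn truncate_checkpoint(&self) -> bool {
        if self.slot0_held.load(Ordering::Acquire) {
            return false;
        }
        self.frames.store(0, Ordering::Release);
        true
    }

    /// Assumption: slot 0 is only granted when the WAL is empty and the slot is free.
    pub fn try_read_lock_slot0(&self) -> Option<Slot0Guard<'_>> {
        if self.frames.load(Ordering::Acquire) != 0 {
            return None;
        }
        self.slot0_held
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| Slot0Guard { wal: self })
    }
}

/// Checkpoint, then try for slot 0; retry if a writer slipped in between.
/// `between_steps` exists only so a test can simulate that interleaving.
/// Returns the attempt number on which the copy ran.
pub fn snapshot_with_retry(
    wal: &MockWal,
    max_attempts: usize,
    mut between_steps: impl FnMut(usize),
    mut copy_db_file: impl FnMut(),
) -> Result<usize, &'static str> {
    for attempt in 1..=max_attempts {
        if !wal.truncate_checkpoint() {
            continue;
        }
        between_steps(attempt);
        if let Some(_guard) = wal.try_read_lock_slot0() {
            // Slot 0 is held for the whole copy and released when `_guard` drops.
            copy_db_file();
            return Ok(attempt);
        }
    }
    Err("could not acquire read_lock[0]")
}
```

In this model, a writer that appends a frame right after the checkpoint makes the slot 0 attempt fail, so the loop checkpoints again instead of copying a file that may change underneath it.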

Other Approaches I Considered

1. Acquire any read lock before Checkpointing

On a first read of the wal.rs code, it looks as though checkpointing requires acquiring the read locks in slots 1..N:

/// Database checkpointers takes the following locks, in order:
/// The exclusive CHECKPOINTER lock.
/// The exclusive WRITER lock (FULL, RESTART and TRUNCATE only).
/// Exclusive lock on read-mark slots 1-N. These are immediately released after being taken.
/// Exclusive lock on read-mark 0.
/// Exclusive lock on read-mark slots 1-N again. These are immediately released after being taken (RESTART and TRUNCATE only).
/// All of the above use blocking locks.
impl CheckpointLocks {
    fn new(ptr: Arc<RwLock<WalFileShared>>, mode: CheckpointMode) -> Result<Self> { ... }

More specifically, the checkpointer attempts to acquire these read locks in the determine_max_safe_checkpoint_frame function, which does not actually enforce acquiring them: if a slot is busy, it simply lowers the safe checkpoint boundary.

Hence, holding a non-zero read lock does not prevent checkpoints from running.
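A simplified model of that behaviour, as I understand it from the description above, might look like the sketch below. `ReadSlot` and `max_safe_frame` are hypothetical names for illustration; the real determine_max_safe_checkpoint_frame works on the actual WAL structures and may differ in detail.

```rust
/// Hypothetical model of one read-mark slot (slots 1..N). Not the turso type.
#[derive(Clone, Copy)]
pub struct ReadSlot {
    /// Highest WAL frame the reader in this slot may still need.
    pub mark: u64,
    /// True if a reader holds the slot, so the checkpointer cannot lock it.
    pub busy: bool,
}

/// Returns the highest frame that can be back-filled into the database file.
/// A busy slot does not abort the checkpoint; it only lowers the boundary.
pub fn max_safe_frame(wal_max_frame: u64, slots: &[ReadSlot]) -> u64 {
    slots
        .iter()
        .filter(|slot| slot.busy)
        .fold(wal_max_frame, |safe, slot| safe.min(slot.mark))
}
```

With a busy slot at mark 40 and a WAL of 100 frames, this model still back-fills frames up to 40, so the database file can change while a reader in a non-zero slot is copying it.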

2. FULL vs TRUNCATE Checkpoint

The flow is the same for FULL, RESTART and TRUNCATE checkpoints in terms of acquiring locks and back-filling the entire WAL, so this didn't help.

Assisted by: Cursor + Claude

turso-bot

Please review @jussisaurio

sivukhin

I have some suggestions and concerns regarding the current snapshot logic.
Also, it would be great to use the new snapshotting logic in the .clone CLI command.

sivukhin · 2025-12-26T08:47:56Z

-                    res.release_guard();
+                    // Release checkpoint guard if lock is not to be kept
+                    if !keep_lock {
+                        res.release_guard();


I don't think checkpoint guarantees that the guard will always be set.

There are cases (for example, an empty WAL, so nothing to checkpoint) where checkpoint will return without holding any lock.

So this fix does not prevent the race condition in all situations, IIUC.

Yeah you are right. I didn't want to touch the checkpoint function in the first place.

sivukhin · 2025-12-26T08:50:15Z

+        let pager = self.pager.load();
+        let result = (|| -> Result<()> {
+            // Checkpoint and keep the lock
+            let _res = pager.blocking_checkpoint_keep_lock(


Instead of patching checkpoint, can we do the snapshot like this:

Execute truncate checkpoint

Start read transaction (either explicitly with BEGIN or maybe by just issuing read_tx on WAL)

Do the copy of DB file

End read transaction
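A rough sketch of the four steps above, written against a made-up interface. `SnapshotSource`, its methods, `snapshot_via_read_tx` and `RecordingSource` are all hypothetical names for illustration and do not correspond to turso APIs.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical interface over a connection. These are not turso method names.
pub trait SnapshotSource {
    fn truncate_checkpoint(&mut self) -> io::Result<()>;
    fn begin_read_tx(&mut self) -> io::Result<()>;
    fn end_read_tx(&mut self);
}

/// Checkpoint, open a read transaction, copy the file, close the transaction.
/// Returns the number of bytes copied.
pub fn snapshot_via_read_tx<S: SnapshotSource>(
    src: &mut S,
    db_path: &Path,
    dest_path: &Path,
) -> io::Result<u64> {
    src.truncate_checkpoint()?;
    // Nothing is held between these two calls in this sketch.
    src.begin_read_tx()?;
    let copied = fs::copy(db_path, dest_path);
    // End the read transaction even if the copy failed.
    src.end_read_tx();
    copied
}

/// Test double that records the order of calls.
#[derive(Default)]
pub struct RecordingSource {
    pub calls: Vec<&'static str>,
}

impl SnapshotSource for RecordingSource {
    fn truncate_checkpoint(&mut self) -> io::Result<()> {
        self.calls.push("truncate_checkpoint");
        Ok(())
    }
    fn begin_read_tx(&mut self) -> io::Result<()> {
        self.calls.push("begin_read_tx");
        Ok(())
    }
    fn end_read_tx(&mut self) {
        self.calls.push("end_read_tx");
    }
}
```

The sketch only fixes the order of operations; whether the read transaction actually blocks a concurrent checkpoint depends on which read-mark slot it ends up holding.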

I like this solution. I will implement this.

Although it does leave a tiny race window between truncating the file and taking the read lock, in which a writer can write while the snapshot is being taken. Maybe we can live with this for now. I will leave a detailed comment in case we see a bug later on.

tiny race window between truncating file and taking read lock where a writer can write and snapshot is taken

Ah, yeah. The problem here is actually not writes specifically, but a checkpoint that can happen in the middle.

An alternative suggestion from me is to start the read transaction before the checkpoint, but issue a FULL checkpoint instead of TRUNCATE. This makes it possible for the checkpoint to succeed, but we also hold the WAL and prevent any further checkpoint from happening (even if writes happen).
Also, we should be careful with the deferred nature of transactions; we may need to execute something after the BEGIN statement in order to properly initiate the read transaction.

As we are writing a database, it's better to be safe than sorry :) So even a tiny race window is actually bad.

@sivukhin - apologies for the late reply. I was out sick with a viral infection last week.

I thought about your suggestion, but it also doesn't work. I have finally found another approach that works. I have detailed everything in the PR description. Also, the PR is ready for review.

sivukhin

Also, @psrvere, do you plan to add support for the snapshot method in sdk-kit/Rust SDK later?

psrvere · 2025-12-26T19:18:46Z

Thanks for the feedback @sivukhin.
I will update the .clone CLI command to use the new snapshot logic and update the SDK too in this PR. I will ping again once the PR is ready for review.

This addresses feedback from tursodatabase/agentfs#119 suggesting that snapshot functionality should be implemented in turso.git using direct file copying rather than SQL-based row-by-row copying.
Comment: https://github.com/tursodatabase/agentfs/pull/119#issuecomment-3681336678

- extract core logic from the Pager::checkpoint function into a new Pager::checkpoint_internal function and add a keep_lock flag for the Finalize phase
- create wrapper functions Pager::checkpoint, Pager::checkpoint_with_lock and Pager::blocking_checkpoint_keep_lock
- add a Connection::snapshot API that checkpoints while keeping the lock and copies the database file

The lock is held to avoid a race condition after checkpointing finishes and before the file is copied, when concurrent writers can write to the db.

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

The previous implementation attempted to hold checkpoint locks during the file copy by adding a keep_lock param to the checkpoint function. However, checkpoint has multiple early-exit paths, such as an empty WAL, where no lock is acquired. It also clutters the checkpoint function.

This implementation executes a TRUNCATE checkpoint and then acquires a read lock on the WAL before copying the database file. This ensures that new data is written to the WAL and that no new checkpoint can be taken before this read lock is released.

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

In this window, another checkpoint can run and we may end up with a corrupted db file. To prevent this, we earlier tried to take a read lock just after checkpointing, but that had two problems: 1) it still left a smaller window for another checkpoint to start, and 2) it doesn't guarantee slot 0 for the read lock taken, and hence doesn't guarantee that other checkpoints are prevented.

In this commit, we run a TRUNCATE checkpoint and try to take the slot 0 read lock in a loop to handle this race condition.

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

- fixed cli tests for the new implementation
- commented out ApplyWritter's unused functions, which were used by the older snapshot implementation in the cli
- removed unused imports
- added missing doc strings to the .timer and .clone commands

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

psrvere · 2026-01-07T09:21:52Z

Maintainers - Please review this PR.

psrvere · 2026-01-13T14:19:34Z

@sivukhin - please look at it when you get time.

avinassh · 2026-02-14T14:28:26Z

@psrvere could you rebase on main and resolve the conflicts?

penberg · 2026-03-17T00:21:14Z

This pull request has been marked as stale due to inactivity. It will be closed in 7 days if no further activity occurs.

penberg · 2026-03-25T00:21:42Z

This pull request has been closed due to inactivity. Please feel free to reopen it if you have further updates.

github-actions Bot added core Storage labels Dec 24, 2025

psrvere force-pushed the snapshot-file-copy branch from 65a4d56 to 27a9bcc Compare December 24, 2025 13:32

psrvere marked this pull request as ready for review December 24, 2025 14:53

psrvere requested review from jussisaurio, penberg and pereman2 as code owners December 24, 2025 14:53

turso-bot Bot reviewed Dec 24, 2025

psrvere force-pushed the snapshot-file-copy branch 2 times, most recently from faa0e3e to 510cb7b Compare December 24, 2025 15:00

psrvere mentioned this pull request Dec 24, 2025

Feature: Add Snapshot functionality to Rust SDK and CLI tursodatabase/agentfs#119

Draft

penberg changed the title ~~Feature: Add Connection::snapshot() for point-in-time database copy~~ Add connection snapshot support for point-in-time database copy Dec 25, 2025

sivukhin requested changes Dec 26, 2025

sivukhin reviewed Dec 26, 2025

psrvere force-pushed the snapshot-file-copy branch from 525d2ec to 7e71044 Compare December 26, 2025 19:56

psrvere changed the title ~~Add connection snapshot support for point-in-time database copy~~ Feature: Add Snapshot for point-in-time database copy Dec 26, 2025

github-actions Bot added the cli label Dec 26, 2025

penberg changed the title ~~Feature: Add Snapshot for point-in-time database copy~~ Add snapshot support for point-in-time database copy Jan 3, 2026

psrvere added 5 commits January 5, 2026 21:14

Add tests for Connection::snapshot API

0b4ee48

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

Use snapshot function in the CLI's .clone command

df004b2

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

psrvere force-pushed the snapshot-file-copy branch from 943bdbd to e8d3b55 Compare January 5, 2026 15:44

psrvere added 2 commits January 5, 2026 21:28

Added snapshot API to sdk-kit

ae8bf19

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

psrvere force-pushed the snapshot-file-copy branch from 7ecd179 to ba1f438 Compare January 5, 2026 17:07

Added snapshot API to rust sdk

6d4ca8c

Signed-off-by: Prateek Singh Rathore <prateek.singh.rathore@gmail.com>

github-actions Bot added the rust-bindings label Jan 5, 2026

psrvere requested a review from sivukhin January 5, 2026 17:17

penberg added the Stale label Mar 17, 2026

penberg closed this Mar 25, 2026

4 participants
