@corylanou
Collaborator

Summary

  • Adds a -f flag to the restore command that continuously polls for and applies new LTX files after the initial restore, similar to tail -f
  • Enables maintaining read-only database replicas that stay in sync with the source via incremental LTX file application
  • Falls back to higher compaction levels when level 0 files have been compacted away, handles database truncation (VACUUM), and exits cleanly on Ctrl+C

Closes #1065

Implementation Details

db.go: Added Follow and FollowInterval fields to RestoreOptions with DefaultFollowInterval (1s).

replica.go: Added validation (follow mode is incompatible with -txid and -timestamp) and four new methods:

  • follow() - Main poll loop that opens the restored DB, reads page size, and ticks at the configured interval
  • applyNewLTXFiles() - Polls level 0 for new files, handles gaps via fillFollowGap()
  • applyLTXFile() - Applies a single LTX file's pages to the database (follows Hydrator.ApplyLTX pattern)
  • fillFollowGap() - Searches compaction levels 1-8 to bridge gaps when L0 files are compacted

cmd/litestream/restore.go: Added -f flag, signal handling for clean Ctrl+C shutdown, usage docs and example.
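One common way to wire a boolean flag to Ctrl+C handling in Go is `signal.NotifyContext` from the standard library. The sketch below assumes that approach; the PR does not say which mechanism restore.go uses, and `parseFollow` is a hypothetical helper covering only the -f and -o flags.

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"os"
	"os/signal"
)

// parseFollow parses a hypothetical subset of the restore flags.
func parseFollow(args []string) (bool, string, error) {
	fs := flag.NewFlagSet("restore", flag.ContinueOnError)
	follow := fs.Bool("f", false, "keep applying new LTX files after the initial restore")
	output := fs.String("o", "", "output path for the restored database")
	if err := fs.Parse(args); err != nil {
		return false, "", err
	}
	return *follow, *output, nil
}

func main() {
	follow, output, err := parseFollow(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "restore:", err)
		return
	}

	// Cancel ctx on Ctrl+C so a long-running follow loop can return cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	fmt.Printf("follow=%v output=%q\n", follow, output)
	if follow {
		<-ctx.Done() // placeholder for the real follow loop
		fmt.Println("interrupted, shutting down")
	}
}
```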

Usage

# Continuously restore (follow) a database from a replica
litestream restore -f -o /tmp/read-replica.db s3://mybucket/db

Important: The restored database should only be opened in read-only mode by consumers.

Test plan

  • TestReplica_Restore_Follow_IncompatibleFlags - Verifies error when Follow + TXID or Follow + Timestamp
  • TestReplica_Restore_Follow - Integration test: creates source DB, syncs to file replica, restores with follow, inserts new data, verifies follow applies it, cancels cleanly
  • TestReplica_Restore_Follow_ContextCancellation - Verifies immediate cancel returns nil (clean shutdown)
  • All existing tests pass with race detector
  • Pre-commit hooks pass (go-imports, go-vet, staticcheck)

🤖 Generated with Claude Code

corylanou and others added 2 commits February 11, 2026 11:27
Add a -f flag to the restore command that continuously polls for and
applies new LTX files after the initial restore completes, similar to
tail -f. This enables maintaining read-only database replicas that stay
in sync with the source.

The follow mode:
- Polls level 0 for new LTX files at a configurable interval (default 1s)
- Falls back to higher compaction levels when L0 files have been compacted
- Handles database truncation (e.g., after VACUUM)
- Exits cleanly on Ctrl+C via context cancellation
- Is incompatible with -txid and -timestamp flags

Closes #1065

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@corylanou
Collaborator Author

Design note on approach vs alternatives:

We intentionally kept this change as a targeted hardening of the new follow path instead of rewriting restore selection logic wholesale.

Why this was needed:

  • One-shot restore already existed, but follow mode introduces a moving target.
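  • Compaction can produce files whose TXID range straddles the follower cursor (e.g. a file covering 100-200 when the cursor is at 150). Seeking only from cursor+1 in higher levels skips that file, because its MinTXID (100) is below 151, so the valid data in 151-200 can be missed.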
  • We also needed follow polling to treat iterator/list failures as real errors (not "no updates").
  • Incremental apply now closes the decoder so trailer/file checksum validation is enforced before treating the update as successful.
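The overlap problem can be illustrated with a small sketch. The `fileInfo` type and both helper functions are hypothetical and exist only to contrast a "start at cursor+1" lookup with an overlap-aware one; they are not the lookup code in replica.go.

```go
package main

import "fmt"

// fileInfo is a hypothetical, trimmed-down description of one LTX file.
type fileInfo struct {
	Level            int
	MinTXID, MaxTXID uint64
}

// seekFromNext models the naive lookup: only files that start at or after cursor+1.
func seekFromNext(files []fileInfo, cursor uint64) []fileInfo {
	var out []fileInfo
	for _, f := range files {
		if f.MinTXID >= cursor+1 {
			out = append(out, f)
		}
	}
	return out
}

// covering returns files that start at or before cursor+1 and end past the cursor,
// i.e. files that overlap already-applied state but still carry new transactions.
func covering(files []fileInfo, cursor uint64) []fileInfo {
	var out []fileInfo
	for _, f := range files {
		if f.MinTXID <= cursor+1 && f.MaxTXID > cursor {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	level1 := []fileInfo{
		{Level: 1, MinTXID: 1, MaxTXID: 99},
		{Level: 1, MinTXID: 100, MaxTXID: 200},
	}
	cursor := uint64(150)

	// The naive seek finds nothing; the overlap-aware lookup finds the 100-200 file.
	fmt.Println(len(seekFromNext(level1, cursor)), len(covering(level1, cursor))) // prints "0 1"
}
```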

What this patch does:

  • Keeps existing L0-first behavior.
  • Adds explicit higher-level gap-fill behavior that can see overlapping compacted files.
  • Falls back to higher levels when L0 is empty.
  • Surfaces iterator errors/close failures.
  • Verifies checksum via dec.Close() in apply path.

Possible alternatives if we want a different long-term shape:

  1. Unify selection into a single "next covering file" algorithm across levels. This can remove most special-case gap-fill logic:
    • pick file where MinTXID <= cursor+1 and MaxTXID > cursor,
    • choose the candidate that advances furthest,
    • repeat until no progress.
  2. Persist follower state as TXID+checksum (not TXID only) so restart/reconciliation can detect divergence earlier and resume more deterministically.
  3. Add an explicit abstraction for follow planning (like CalcRestorePlan but incremental), so rules are easier to reason about and test independently.
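The first alternative above ("next covering file") could be sketched as follows. This is only an illustration of the three listed rules; `candidate`, `nextCovering`, and `plan` are hypothetical names and do not exist in the codebase.

```go
package main

import "fmt"

// candidate is a hypothetical description of one LTX file at any level.
type candidate struct {
	Level            int
	MinTXID, MaxTXID uint64
}

// nextCovering picks, across all levels, the file that covers cursor+1
// and advances the cursor furthest.
func nextCovering(files []candidate, cursor uint64) (candidate, bool) {
	var best candidate
	found := false
	for _, f := range files {
		if f.MinTXID <= cursor+1 && f.MaxTXID > cursor {
			if !found || f.MaxTXID > best.MaxTXID {
				best, found = f, true
			}
		}
	}
	return best, found
}

// plan repeats the selection until no file makes progress. Each step strictly
// increases the cursor, so the loop terminates.
func plan(files []candidate, cursor uint64) ([]candidate, uint64) {
	var steps []candidate
	for {
		f, ok := nextCovering(files, cursor)
		if !ok {
			return steps, cursor
		}
		steps = append(steps, f)
		cursor = f.MaxTXID
	}
}

func main() {
	files := []candidate{
		{Level: 0, MinTXID: 151, MaxTXID: 151},
		{Level: 0, MinTXID: 152, MaxTXID: 152},
		{Level: 1, MinTXID: 100, MaxTXID: 200},
		{Level: 0, MinTXID: 201, MaxTXID: 201},
	}
	steps, cursor := plan(files, 150)
	for _, s := range steps {
		fmt.Printf("apply L%d %d-%d\n", s.Level, s.MinTXID, s.MaxTXID)
	}
	fmt.Println("cursor:", cursor)
}
```

With the cursor at 150, the sketch prefers the compacted 100-200 file over the single-transaction L0 files, then picks up 201 from L0. Whether preferring the furthest-advancing file is the right trade-off (larger files, fewer applies) is a design question the PR leaves open.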

Development

Successfully merging this pull request may close these issues.

feat(restore) Continuous restore from file replica to restored database file?
