Runaway replication for a database with no changes and failures to restore with v0.5.9 #1175

@moll

Description

Hey,

Thanks for maintaining Litestream!

I'm seeing very weird behavior when replicating a ~200MiB database to an SFTP server with Litestream v0.5.9. I started from scratch, so there was no Litestream cache/LTX on either the client or the server. From what I can tell, it constantly transmits something, with the level-1 LTX file steadily increasing in size, even though there are no changes to the source database. I also tried v0.5.8, but since that has bugs around detecting changes when the WAL file's size doesn't change, it doesn't replicate anything at all, so that's not much of a test. :)

About every second, I'm seeing the 1/0000000000000001-0000000000000001.ltx file increase by ~6KiB on the SFTP server:

$ tree -sacD
[       4096 Mar  2 04:25]  ./
└── [       4096 Mar  2 03:52]  ltx/
    ├── [       4096 Mar  2 03:52]  0/
    │   └── [  122302671 Mar  2 03:53]  0000000000000001-0000000000000001.ltx
    ├── [       4096 Mar  2 03:52]  9/
    │   └── [   91329968 Mar  2 04:25]  0000000000000001-0000000000000001.ltx
    └── [       4096 Mar  2 03:52]  1/
        └── [   91001738 Mar  2 04:25]  0000000000000001-0000000000000001.ltx

The source database hasn't had any changes for 2h now (at least its mtime and the WAL's are both ~2h old).
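For reference, this is roughly how the "no changes" claim can be verified (paths match the config below; `stat -c` is GNU coreutils):

```shell
# Print the modification times of the database and its WAL; both should be
# ~2h in the past if no writes have occurred since then.
stat -c '%n %y' /srv/foo.sqlite3 /srv/foo.sqlite3-wal
```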

Trying to restore the database on the SFTP server machine with v0.5.9 results in failure:

$ litestream restore -txid 0000000000000001 -o foo.sqlite3 file:///srv/foo-backup
time=2026-03-02T04:30:14.358Z level=ERROR msg="failed to run" error="decode database: decode page 43743: close reader 0: cannot close, expected page"

Every invocation of restore fails with a different decode page number. It leaves behind a foo.sqlite3.tmp file, currently 172MiB in size.

All this could be related to #1165 or #1171, or perhaps not. I gather their problem was just runaway storage use, whereas I'm seeing both that and the restore failure.

Any suggestion where to look next? I've yet to try -trace, as I only learned of its existence from this GitHub issue template. I've also seen plenty of hangs on shutdown (#399), but that's probably a separate issue. The logs below are from a clean start with the caches/replica cleared.

Thanks!

Environment

Litestream version:
v0.5.9

Operating system & version:
Ubuntu 24.04

Installation method:
Binary from GitHub

Storage backend:
SFTP

Configuration

litestream.yml
sync-interval: 1s
shutdown-sync-timeout: 5s
truncate-page-n: 0

snapshot:
  interval: 168h
  retention: 168h

socket:
  enabled: true
  path: "${XDG_RUNTIME_DIR}/litestream.sock"

dbs:
  - path: /srv/foo.sqlite3
    replica:
      url: sftp://foo@bar/foo-backup
      concurrent-writes: false
      key-path: xxx

Logs

Log output
 litestream.service - Litestream.
time=2026-03-02T03:51:57.877Z level=INFO msg=litestream version=0.5.9 level=info
time=2026-03-02T03:51:57.878Z level=INFO msg="starting compaction monitor" level=2 interval=5m0s
time=2026-03-02T03:51:57.878Z level=INFO msg="starting compaction monitor" level=1 interval=30s
time=2026-03-02T03:51:57.878Z level=INFO msg="starting compaction monitor" level=3 interval=1h0m0s
time=2026-03-02T03:51:57.878Z level=INFO msg="starting compaction monitor" level=9 interval=168h0m0s
time=2026-03-02T03:51:57.878Z level=INFO msg="control socket listening" path=/run/litestream/litestream.sock
time=2026-03-02T03:51:57.878Z level=INFO msg="initialized db" path=/srv/foo.sqlite3
time=2026-03-02T03:51:57.878Z level=INFO msg="replicating to" type=sftp sync-interval=1s host=xxx user=xxx path=/foo-backup
time=2026-03-02T03:51:57.878Z level=INFO msg="starting L0 retention monitor" interval=15s retention=5m0s

Labels

bug (Something isn't working), duplicate (Duplicate of another issue)
