Read-replica VFS issues.

## Environment

**Litestream version:**

```text
0.5.0
```

**Operating system & version:**
Debian GNU/Linux forky

**Installation method:**
binary download, built from source

**Storage backend:**
file

## Bug Description

This may be way too early, but I was trying to "port" the (just released) [read-replica VFS](https://github.com/benbjohnson/litestream/blob/v0.5.0/vfs.go) to my [SQLite driver](https://github.com/ncruces/go-sqlite3/tree/litestream/litestream), and I've bumped into a few issues, which may cause you future pain, so I decided to open this to discuss them here.

If it _is_ too early, feel free to ignore me and close the issue, I genuinely just wanted to help.
Hopefully this reads "helpful" and not "demanding".

### VFS Open

`Open` is trying to handle all file types using the `ReplicaClient`. This may seem that it mostly works but, depending on settings, SQLite will eventually try to use the VFS to open temporary files.

IIUC, for Litestream, it should be sufficient to open [`SQLITE_OPEN_MAIN_DB`](https://www.sqlite.org/c3ref/c_open_autoproxy.html) files and refuse everything else.

However, unless the user is instructed to configure SQLite to use memory for all temporary files, SQLite will eventually try to at least `SQLITE_OPEN_TEMP_JOURNAL` to handle large sorts (such as an `ORDER BY` of millions of records). Small tests won't surface this, though.

Also, `Open` can probably or `SQLITE_OPEN_READONLY` to the flags it returns.

### Cancel polling replication

This is probably an oversight, `wg`, `ctx`, and `cancel` are all there to handle this, but unused.

### Handle sub-page reads

Since you're not returning [`SQLITE_IOCAP_SUBPAGE_READ`](https://www.sqlite.org/c3ref/c_iocap_atomic.html), SQLite 3.48.0 and above will mostly avoid sub-page reads (otherwise, overflow BLOBs would use them). However, SQLite **will** use sub-page to read the database header in the first page, and your code here is buggy.

SQLite will read 100 bytes at offset 0 to load the entire header and, when in rollback mode, **will read 16 bytes at offset 24 to check for changes**.

When reading the first page of the database you're [setting](https://github.com/benbjohnson/litestream/blob/v0.5.0/vfs.go#L196) the [file format version numbers](https://www.sqlite.org/fileformat.html#file_format_version_numbers) to 1 which forces rollback mode, and may be one way of handling a readonly VFS that gets updates from other writers.

However, when SQLite reads the 16 bytes at offset 24, **you're returning the wrong data**. This line doesn't copy the correct bytes:
https://github.com/benbjohnson/litestream/blob/4e7ca3d82c2c018ac20a4af385af3a3fd9e52183/vfs.go#L193

You'd need to update it to `n = copy(p, data[off%4096:])` to "fix" that, but if you try, you'll start missing updates, because the [file change counter](https://www.sqlite.org/fileformat.html#file_change_counter) is not updated on every transaction in WAL mode (but is expected to in rollback mode).

### Correctly handling updates

My suggestion for a simple fix for the above issue is this:
```go
// Update the first page to pretend like we are in journal mode,
// and there were changes to the database.
if pgno == 1 {
	data[18], data[19] = 0x01, 0x01
	rand.Read(data[24:28])
}

n = copy(p, data[off%4096:])
f.logger.Info("data read", "n", n, "data", len(data))
```

However, this means SQLite will drop it's entire page cache at the start of every transaction, which is sub-optimal if there are no changes. You should probably _only_ update when there are actual changes.

### Correctly handling locking, "concurrent" writes

There's another issue where I probably don't really understand Litestream and LTX well enough to help much, but how do you plan to implement locking?

IIUC, with the above fix things will _probably_ work with small tests (which update a single page), but will start failing at anything more complicated.

In theory the correct way to do things is "snapshot" the database when `Lock(LockShared)` is called, and ensure all reads are satisfied from that "snapshot" until `Unlock(LockNone)` is called. Between those, maybe you don't even need to "poll" the replica?

I tried to only `pollReplicaClient()` on `Lock(LockShared)` (and remove the background goroutine) and it seems to work fine. Then I can also update the change counter if there are any `itr.Next()`, remove mutexes, etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Read-replica VFS issues. #772

Environment

Bug Description

VFS Open

Cancel polling replication

Handle sub-page reads

Correctly handling updates

Correctly handling locking, "concurrent" writes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Read-replica VFS issues. #772

Description

Environment

Bug Description

VFS Open

Cancel polling replication

Handle sub-page reads

Correctly handling updates

Correctly handling locking, "concurrent" writes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions