-
Notifications
You must be signed in to change notification settings - Fork 314
Description
Environment
Litestream version:
0.5.0
Operating system & version:
Debian GNU/Linux forky
Installation method:
binary download, built from source
Storage backend:
file
Bug Description
This may be way too early, but I was trying to "port" the (just released) read-replica VFS to my SQLite driver, and I've bumped into a few issues, which may cause you future pain, so I decided to open this to discuss them here.
If it is too early, feel free to ignore me and close the issue, I genuinely just wanted to help.
Hopefully this reads "helpful" and not "demanding".
VFS Open
Open
is trying to handle all file types using the ReplicaClient
. This may seem that it mostly works but, depending on settings, SQLite will eventually try to use the VFS to open temporary files.
IIUC, for Litestream, it should be sufficient to open SQLITE_OPEN_MAIN_DB
files and refuse everything else.
However, unless the user is instructed to configure SQLite to use memory for all temporary files, SQLite will eventually try to at least SQLITE_OPEN_TEMP_JOURNAL
to handle large sorts (such as an ORDER BY
of millions of records). Small tests won't surface this, though.
Also, Open
can probably or SQLITE_OPEN_READONLY
to the flags it returns.
Cancel polling replication
This is probably an oversight, wg
, ctx
, and cancel
are all there to handle this, but unused.
Handle sub-page reads
Since you're not returning SQLITE_IOCAP_SUBPAGE_READ
, SQLite 3.48.0 and above will mostly avoid sub-page reads (otherwise, overflow BLOBs would use them). However, SQLite will use sub-page to read the database header in the first page, and your code here is buggy.
SQLite will read 100 bytes at offset 0 to load the entire header and, when in rollback mode, will read 16 bytes at offset 24 to check for changes.
When reading the first page of the database you're setting the file format version numbers to 1 which forces rollback mode, and may be one way of handling a readonly VFS that gets updates from other writers.
However, when SQLite reads the 16 bytes at offset 24, you're returning the wrong data. This line doesn't copy the correct bytes:
Line 193 in 4e7ca3d
n = copy(p, data) |
You'd need to update it to n = copy(p, data[off%4096:])
to "fix" that, but if you try, you'll start missing updates, because the file change counter is not updated on every transaction in WAL mode (but is expected to in rollback mode).
Correctly handling updates
My suggestion for a simple fix for the above issue is this:
// Update the first page to pretend like we are in journal mode,
// and there were changes to the database.
if pgno == 1 {
data[18], data[19] = 0x01, 0x01
rand.Read(data[24:28])
}
n = copy(p, data[off%4096:])
f.logger.Info("data read", "n", n, "data", len(data))
However, this means SQLite will drop it's entire page cache at the start of every transaction, which is sub-optimal if there are no changes. You should probably only update when there are actual changes.
Correctly handling locking, "concurrent" writes
There's another issue where I probably don't really understand Litestream and LTX well enough to help much, but how do you plan to implement locking?
IIUC, with the above fix things will probably work with small tests (which update a single page), but will start failing at anything more complicated.
In theory the correct way to do things is "snapshot" the database when Lock(LockShared)
is called, and ensure all reads are satisfied from that "snapshot" until Unlock(LockNone)
is called. Between those, maybe you don't even need to "poll" the replica?
I tried to only pollReplicaClient()
on Lock(LockShared)
(and remove the background goroutine) and it seems to work fine. Then I can also update the change counter if there are any itr.Next()
, remove mutexes, etc.