Locking and requirements for applications #98
-
Thanks for such a neat project! I recently added Litestream to an existing Drone (single-instance Go app) setup in the home lab and wanted to ask about some things I learned. For background, Drone (the "app") runs in a container with a local SQLite db, originally backed by a Kubernetes persistent disk, but now able to just use a Litestream init container and sidecar to restore from S3 and replicate to S3.

Since Litestream was added, the app can perform all its duties (serve the auth'd UI, run some pipelines), but many actions fail at random unless manually retried. The app quirks (e.g. a user is logged out, a job didn't start) match app debug logs showing operations that fail because "database is locked". Disabling `litestream replicate` alleviates these, as does rolling back to the persistent disk.

So my question is about locking behaviors and the expectations on applications (discussed a bit in #58). Ideally, apps should be retrying operations, but I suspect many single-process apps don't expect other locks. I found that Litestream takes out a write lock at one point (with a note in the code) and suspected it could be a factor, but I don't know the background. Is it fundamental that Litestream write-lock? Basically, the use case I'm curious about is apps that maybe don't do SQLite retries well: can Litestream be as stealthy as possible, grab the lock infrequently, not use a write lock, etc.? Or I'd trade off "realtime" for "just sync every few minutes is fine", if it meant the application was less aware its SQLite was being touched.

PS: I've explored some other avenues, like recompiling Drone with a recent mattn/go-sqlite3 to match Litestream's in case there are missing fixes, but no difference. I also manually checked that SQLite is in WAL mode and that the locking mode is normal, not exclusive. For now I've rolled back to using a persistent disk; the quirks from lock contention were too much.
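For completeness, the "manually checked" step above just amounts to reading two PRAGMAs. A minimal sketch against a scratch database, using Python's stdlib sqlite3 since the PRAGMAs are driver-agnostic (the scratch path stands in for the app's actual db file):

```python
import os
import sqlite3
import tempfile

# Scratch database standing in for the app's SQLite file.
path = os.path.join(tempfile.mkdtemp(), "check.db")
conn = sqlite3.connect(path)

# Litestream requires WAL mode; setting it returns the active mode.
print(conn.execute("PRAGMA journal_mode=WAL").fetchone()[0])  # wal

# Locking mode should be "normal" (the default), not "exclusive",
# since exclusive mode would prevent any other process from reading.
print(conn.execute("PRAGMA locking_mode").fetchone()[0])  # normal
```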
-
Hi @dghubble, thanks for the feedback. The locking mechanism turns out to be a historical artifact of how Litestream worked originally and I should be able to remove it. I wrote up an issue for it (#99) but the tl;dr is that Litestream relied on SQLite to validate transaction boundaries in the WAL originally but that validation has been moved into Litestream so the lock isn't necessary anymore.
I added the issue to the v0.3.3 release that I'm hoping to get out by the end of the week. That should make for a much more pleasant experience with applications that don't use a busy timeout.
For `mattn/go-sqlite3`, you can specify a `_busy_timeout=1000` in the connection string and that should fix it…
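To illustrate what the busy timeout changes, here's a minimal sketch using Python's stdlib sqlite3 (the underlying mechanism is the same as `_busy_timeout` in a mattn/go-sqlite3 DSN): without a timeout, a second writer fails immediately with "database is locked"; with one, SQLite itself retries for up to the timeout before surfacing the error.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Connection 1 opens a write transaction, standing in for any
# other writer (such as Litestream's lock) holding the database.
w1 = sqlite3.connect(path, isolation_level=None)
w1.execute("PRAGMA journal_mode=WAL")
w1.execute("CREATE TABLE t (x)")
w1.execute("BEGIN IMMEDIATE")  # acquires the write lock

# Connection 2 with a zero busy timeout fails immediately.
w2 = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    w2.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError as e:
    print(e)  # database is locked

# With a nonzero busy timeout, SQLite retries internally for up to
# that long, so brief lock contention never reaches the application.
w1.execute("COMMIT")
w2.execute("BEGIN IMMEDIATE")  # succeeds once the lock is released
w2.execute("COMMIT")
```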