Commit 080de68

russellromney and claude committed
feat: Explicit mode selection, no silent fallback
Extension registers both VFS names:
- "turbolite": always registered, local compressed VFS
- "turbolite-s3": registered only when TURBOLITE_BUCKET is set, fails hard if config is invalid

Python API: turbolite.connect(path, mode="s3", bucket=...) validates config before loading. ValueError if mode="s3" with no bucket.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent aee4a08 commit 080de68
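
The commit message names two VFS identifiers, and the manual path in the README selects one of them through a SQLite URI filename. A minimal sketch of building such a URI follows. The helper `turbolite_uri` is hypothetical and not part of turbolite. The percent-encoding step is an addition: the diff itself interpolates the path directly.

```python
from urllib.parse import quote


def turbolite_uri(path: str, vfs: str = "turbolite") -> str:
    """Build a SQLite URI filename that selects a VFS by name (hypothetical helper)."""
    if vfs not in ("turbolite", "turbolite-s3"):
        raise ValueError(f"unknown VFS name: {vfs!r}")
    # Percent-encode characters such as '?' and '#' that would otherwise be
    # parsed as URI syntax; '/' is left unescaped by quote() by default.
    return f"file:{quote(path)}?vfs={vfs}"


print(turbolite_uri("my.db"))
print(turbolite_uri("data/reports 2024?.db", "turbolite-s3"))
```

The resulting string would be passed to `sqlite3.connect(uri, uri=True)` after the extension has been loaded. Opening with `vfs=turbolite-s3` only works if that VFS was registered, which per the commit message requires `TURBOLITE_BUCKET` to be set at load time.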

3 files changed

Lines changed: 112 additions & 61 deletions

README.md

Lines changed: 15 additions & 9 deletions
@@ -40,13 +40,10 @@ pip install turbolite
 ```

 ```python
-import sqlite3
 import turbolite

-# convenience wrapper for manually loading the extension
+# Local compressed database
 conn = turbolite.connect("my.db")
-
-# execute queries
 conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
 conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")
 conn.commit()
@@ -56,15 +53,22 @@ print(alice[1])
 >>> "alice"
 ```

-You can also manually load the extension:
+```python
+# S3 tiered database — serve cold queries from S3
+conn = turbolite.connect("my.db", mode="s3",
+                         bucket="my-bucket",
+                         endpoint="https://t3.storage.dev")
+```

 ```python
-# Load the extension (registers the "turbolite" VFS process-wide)
+# Or manually load the extension for full control
+import sqlite3
 conn = sqlite3.connect(":memory:")
 turbolite.load(conn)
 conn.close()

-conn = sqlite3.connect("file:my.db?vfs=turbolite", uri=True)
+conn = sqlite3.connect("file:my.db?vfs=turbolite", uri=True) # local
+conn = sqlite3.connect("file:my.db?vfs=turbolite-s3", uri=True) # S3 (needs TURBOLITE_BUCKET)
 ```

 See (installation details below) for Node, Go, Rust, and using the `.so` loadable extension directly
@@ -95,7 +99,7 @@ SQLite: "read page 4,271"
 Manifest: "page 4,271 is in group 16, sub-chunk 3, bytes 81,920-106,496"
 |
 v
-turbolite check local storage: "page not in cache"
+turbolite: check local storage -> cache miss
 |
 v
 S3: GET for ~256KB compressed sub-chunk → decompress → return page
@@ -151,7 +155,9 @@ let config = TieredConfig {

 **Encrypt after compress, decrypt before decompress.** Compression operates on plaintext (compressing ciphertext is useless). On the S3 path: `plaintext → zstd compress → GCM encrypt → S3 PUT`. On read: `S3 GET → GCM decrypt → zstd decompress → plaintext`.

-**Security model:** S3 data uses authenticated encryption (GCM) with unique nonces per frame - the strongest path for long-lived data. Local files use CTR with deterministic nonces (page number / byte offset), providing confidentiality against single-snapshot disk-at-rest attackers. CTR's deterministic nonces mean that an attacker who can capture multiple snapshots of the same file (e.g., via filesystem snapshots or NVMe wear leveling) could potentially recover XOR of plaintexts at reused offsets. This matches the trade-off made by SQLite's own SEE extension in OFB mode. For most deployments, the local cache is ephemeral and recreatable from S3, so this is acceptable.
+**Security model:** S3 data uses GCM with unique nonces per frame (authenticated, tamper-detecting). Local files use CTR with deterministic nonces (page number / byte offset), providing confidentiality against disk-at-rest attackers. CTR's deterministic nonces mean multi-snapshot attackers could recover XOR of plaintexts at reused offsets, matching SQLite's own SEE extension tradeoff. The local cache is ephemeral and recreatable from S3.
+
+**Key rotation:** `rotate_encryption_key(config, new_key)` re-encrypts, adds, or removes encryption on all S3 data without decompressing. `Some` to `Some` rotates keys, `Some` to `None` removes encryption, `None` to `Some` adds it. Crash-safe: old objects are never overwritten, the manifest upload is the atomic commit point, and a verification step confirms new data is readable before committing. Orphans from partial runs are cleaned by `gc()`.

 ## Strengths and Limitations
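
The two README claims in this hunk (compressing ciphertext is useless, and deterministic CTR nonces leak the XOR of plaintexts across snapshots) can be checked with a small standard-library sketch. Everything here is a stand-in and an assumption: `zlib` replaces zstd, and `toy_keystream` is a SHA-256 counter construction that replaces AES-CTR. It is not turbolite's cipher and must not be used to protect data.

```python
import hashlib
import zlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream: SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same nonce decrypts.
    return xor_bytes(data, toy_keystream(key, nonce, len(data)))


key = b"k" * 32
page = b"alice@example.com," * 200  # repetitive plaintext, 3600 bytes

# Order described for the S3 path: compress first, then encrypt.
compress_then_encrypt = toy_encrypt(key, b"nonce-1", zlib.compress(page))

# Reversed order: the ciphertext looks random, so compression gains nothing.
encrypt_then_compress = zlib.compress(toy_encrypt(key, b"nonce-1", page))

print(len(page), len(compress_then_encrypt), len(encrypt_then_compress))

# Deterministic nonce (here: the page number). Two snapshots of the same page
# share a keystream, so XOR of the ciphertexts equals XOR of the plaintexts.
old = b"balance=100;owner=alice"
new = b"balance=999;owner=alice"
nonce = (4271).to_bytes(8, "big")
leak = xor_bytes(toy_encrypt(key, nonce, old), toy_encrypt(key, nonce, new))
print(leak == xor_bytes(old, new))
```

The leak contains no key material, yet it shows exactly which bytes changed between snapshots and their XOR difference. That is the multi-snapshot exposure the security model paragraph accepts for the local cache.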

Lines changed: 65 additions & 28 deletions
@@ -1,20 +1,16 @@
 """
-turbolite — compressed SQLite for Python.
+turbolite — compressed SQLite with S3 tiered storage for Python.

-Usage::
+Local mode (default)::

-    import sqlite3
     import turbolite
+    conn = turbolite.connect("my.db")

-    conn = sqlite3.connect(":memory:")
-    turbolite.load(conn)
-
-    # Open a compressed database via URI
-    conn2 = sqlite3.connect("file:my.db?vfs=turbolite", uri=True)
-
-Or use the convenience wrapper::
+S3 tiered mode::

-    conn = turbolite.connect("my.db")
+    conn = turbolite.connect("my.db", mode="s3",
+                             bucket="my-bucket",
+                             endpoint="https://t3.storage.dev")
 """

 from __future__ import annotations
@@ -24,7 +20,7 @@
 import sqlite3
 import sys

-__version__ = "0.1.0"
+__version__ = "0.2.19"


 def _find_ext() -> str:
@@ -41,7 +37,6 @@ def _find_ext() -> str:

     path = os.path.join(pkg_dir, name)
     if os.path.isfile(path):
-        # Return without extension — sqlite3.load_extension appends it
         return os.path.splitext(path)[0]

     raise FileNotFoundError(
@@ -57,10 +52,9 @@ def load(conn: sqlite3.Connection) -> None:
     """
     Load the turbolite extension into a sqlite3 connection.

-    After loading, the "turbolite" VFS is registered process-wide.
-    Open compressed databases with::
-
-        sqlite3.connect("file:path.db?vfs=turbolite", uri=True)
+    After loading, the "turbolite" VFS (local compressed) is always
+    registered. If TURBOLITE_BUCKET is set in the environment,
+    "turbolite-s3" (tiered S3) is also registered.

     Args:
         conn: Any open sqlite3.Connection (can be :memory:).
@@ -75,28 +69,71 @@ def load(conn: sqlite3.Connection) -> None:
 def connect(
     path: str,
     *,
-    vfs: str = "turbolite",
+    mode: str = "local",
+    bucket: str | None = None,
+    prefix: str | None = None,
+    endpoint: str | None = None,
+    region: str | None = None,
+    cache_dir: str | None = None,
+    compression_level: int | None = None,
+    prefetch_threads: int | None = None,
+    read_only: bool = False,
 ) -> sqlite3.Connection:
     """
-    Convenience: load the extension and open a compressed database.
-
-    Equivalent to::
-
-        conn = sqlite3.connect(":memory:")
-        turbolite.load(conn)
-        conn.close()
-        return sqlite3.connect(f"file:{path}?vfs={vfs}", uri=True)
+    Open a turbolite database.

     Args:
         path: Path to the database file.
-        vfs: VFS name (default "turbolite").
+        mode: "local" for compressed VFS, "s3" for S3 tiered VFS.
+        bucket: S3 bucket (required for mode="s3", or set TURBOLITE_BUCKET).
+        prefix: S3 key prefix (default "turbolite").
+        endpoint: S3 endpoint URL (Tigris, MinIO). Falls back to AWS_ENDPOINT_URL.
+        region: AWS region. Falls back to AWS_REGION.
+        cache_dir: Local cache directory (default /tmp/turbolite).
+        compression_level: Zstd level 1-22 (default 3).
+        prefetch_threads: Prefetch worker threads (default num_cpus + 1).
+        read_only: Open in read-only mode.

     Returns:
-        An open sqlite3.Connection using the compressed VFS.
+        An open sqlite3.Connection.
+
+    Raises:
+        ValueError: If mode="s3" but no bucket is configured.
+        RuntimeError: If the tiered VFS fails to initialize.
     """
+    if mode not in ("local", "s3"):
+        raise ValueError(f"mode must be 'local' or 's3', got {mode!r}")
+
+    if mode == "s3":
+        # Set env vars for the extension before loading.
+        # Fail fast if bucket is missing.
+        effective_bucket = bucket or os.environ.get("TURBOLITE_BUCKET")
+        if not effective_bucket:
+            raise ValueError(
+                "mode='s3' requires a bucket. Pass bucket= or set TURBOLITE_BUCKET."
+            )
+        os.environ["TURBOLITE_BUCKET"] = effective_bucket
+        if prefix is not None:
+            os.environ["TURBOLITE_PREFIX"] = prefix
+        if endpoint is not None:
+            os.environ["TURBOLITE_ENDPOINT_URL"] = endpoint
+        if region is not None:
+            os.environ["TURBOLITE_REGION"] = region
+        if cache_dir is not None:
+            os.environ["TURBOLITE_CACHE_DIR"] = cache_dir
+        if read_only:
+            os.environ["TURBOLITE_READ_ONLY"] = "true"
+
+    if compression_level is not None:
+        os.environ["TURBOLITE_COMPRESSION_LEVEL"] = str(compression_level)
+    if prefetch_threads is not None:
+        os.environ["TURBOLITE_PREFETCH_THREADS"] = str(prefetch_threads)
+
     global _loaded
     if not _loaded:
         bootstrap = sqlite3.connect(":memory:")
         load(bootstrap)
         bootstrap.close()
+
+    vfs = "turbolite-s3" if mode == "s3" else "turbolite"
     return sqlite3.connect(f"file:{path}?vfs={vfs}", uri=True)
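
The validation order in `connect()` (reject unknown modes, then require a bucket for `mode="s3"`, then pick the VFS name) can be exercised without the extension installed. The sketch below is a hypothetical helper, `resolve_vfs`, that mirrors those checks. Unlike the code in the diff, it takes the environment as a plain dict and returns the updates instead of writing to `os.environ`.

```python
from __future__ import annotations


def resolve_vfs(
    mode: str, bucket: str | None, env: dict[str, str]
) -> tuple[str, dict[str, str]]:
    """Return (vfs_name, env_updates) or raise ValueError (hypothetical helper)."""
    if mode not in ("local", "s3"):
        raise ValueError(f"mode must be 'local' or 's3', got {mode!r}")
    if mode == "local":
        return "turbolite", {}
    # An explicit argument wins; otherwise fall back to the environment.
    effective_bucket = bucket or env.get("TURBOLITE_BUCKET")
    if not effective_bucket:
        raise ValueError(
            "mode='s3' requires a bucket. Pass bucket= or set TURBOLITE_BUCKET."
        )
    return "turbolite-s3", {"TURBOLITE_BUCKET": effective_bucket}


print(resolve_vfs("local", None, {}))
print(resolve_vfs("s3", "my-bucket", {}))
print(resolve_vfs("s3", None, {"TURBOLITE_BUCKET": "from-env"}))
try:
    resolve_vfs("s3", None, {})
except ValueError as exc:
    print(f"rejected: {exc}")
```

One design point visible in the diff: the real function writes its options into `os.environ`, which is process-wide. `read_only=True` sets `TURBOLITE_READ_ONLY` but a later call with `read_only=False` does not clear it, so callers that mix configurations in one process may want to check the environment themselves.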

src/ext.rs

Lines changed: 32 additions & 24 deletions
@@ -3,20 +3,18 @@
 //! Exports `turbolite_ext_register_vfs()` which is called from the C entry
 //! point in `ext_entry.c` after `SQLITE_EXTENSION_INIT2` stores the API table.
 //!
-//! The C shim provides symbol shims for `sqlite3_vfs_register` etc. that route
-//! through the extension API table, so `sqlite_vfs::register()` works correctly
-//! inside a loadable extension.
+//! ## VFS registration
 //!
-//! ## VFS selection
+//! Always registers **"turbolite"** — local compressed VFS (zstd).
 //!
-//! If `TURBOLITE_BUCKET` is set, registers a **tiered S3 VFS** (requires the
-//! `tiered` feature). Otherwise, registers a **local compressed VFS**.
+//! If `TURBOLITE_BUCKET` is set, also registers **"turbolite-s3"** — tiered
+//! S3 VFS. Fails hard if bucket is set but configuration is invalid.
 //!
 //! ### Environment variables (tiered mode)
 //!
 //! | Variable | Required | Default | Description |
 //! |---|---|---|---|
-//! | `TURBOLITE_BUCKET` | yes | — | S3 bucket name |
+//! | `TURBOLITE_BUCKET` | yes | — | S3 bucket name (triggers S3 VFS registration) |
 //! | `TURBOLITE_PREFIX` | no | `"turbolite"` | S3 key prefix |
 //! | `TURBOLITE_CACHE_DIR` | no | `"/tmp/turbolite"` | Local cache directory |
 //! | `TURBOLITE_ENDPOINT_URL` | no | — | Custom S3 endpoint (Tigris, MinIO) |
@@ -30,29 +28,38 @@

 use std::sync::atomic::{AtomicBool, Ordering};

-static VFS_REGISTERED: AtomicBool = AtomicBool::new(false);
+static LOCAL_VFS_REGISTERED: AtomicBool = AtomicBool::new(false);
+static TIERED_VFS_REGISTERED: AtomicBool = AtomicBool::new(false);

 /// Called from C entry point (`sqlite3_turbolite_init` in ext_entry.c).
 /// Returns 0 on success, 1 on error. Idempotent: second call is a no-op.
+///
+/// Always registers "turbolite" (local compressed VFS).
+/// If TURBOLITE_BUCKET is set, also registers "turbolite-s3" (tiered VFS).
+/// Panics if TURBOLITE_BUCKET is set but tiered VFS creation fails.
 #[no_mangle]
 pub extern "C" fn turbolite_ext_register_vfs() -> std::os::raw::c_int {
-    if VFS_REGISTERED.swap(true, Ordering::SeqCst) {
-        return 0;
+    // Register local VFS (always)
+    if !LOCAL_VFS_REGISTERED.swap(true, Ordering::SeqCst) {
+        if let Err(e) = register_local() {
+            LOCAL_VFS_REGISTERED.store(false, Ordering::SeqCst);
+            eprintln!("turbolite: failed to register local VFS: {e}");
+            return 1;
+        }
     }

-    let result = if std::env::var("TURBOLITE_BUCKET").is_ok() {
-        register_tiered()
-    } else {
-        register_local()
-    };
-
-    match result {
-        Ok(()) => 0,
-        Err(_) => {
-            VFS_REGISTERED.store(false, Ordering::SeqCst);
-            1
+    // Register tiered VFS if TURBOLITE_BUCKET is set
+    if std::env::var("TURBOLITE_BUCKET").is_ok()
+        && !TIERED_VFS_REGISTERED.swap(true, Ordering::SeqCst)
+    {
+        if let Err(e) = register_tiered() {
+            TIERED_VFS_REGISTERED.store(false, Ordering::SeqCst);
+            eprintln!("turbolite: TURBOLITE_BUCKET is set but tiered VFS failed: {e}");
+            return 1;
         }
     }
+
+    0
 }

 fn register_local() -> Result<(), std::io::Error> {
@@ -86,7 +93,7 @@ fn register_tiered() -> Result<(), std::io::Error> {
     let prefetch_threads = std::env::var("TURBOLITE_PREFETCH_THREADS")
         .ok()
         .and_then(|s| s.parse().ok())
-        .unwrap_or(0); // 0 = use default (num_cpus + 1)
+        .unwrap_or(0);
     let compression_level = std::env::var("TURBOLITE_COMPRESSION_LEVEL")
         .ok()
         .and_then(|s| s.parse().ok())
@@ -110,13 +117,14 @@ fn register_tiered() -> Result<(), std::io::Error> {
     }

     let vfs = TieredVfs::new(config)?;
-    crate::tiered::register("turbolite", vfs)
+    crate::tiered::register("turbolite-s3", vfs)
 }

 #[cfg(not(feature = "tiered"))]
 fn register_tiered() -> Result<(), std::io::Error> {
     Err(std::io::Error::new(
         std::io::ErrorKind::Unsupported,
-        "TURBOLITE_BUCKET is set but this extension was built without the 'tiered' feature",
+        "TURBOLITE_BUCKET is set but this extension was built without the 'tiered' feature. \
+         Rebuild with: make ext (includes tiered by default)",
     ))
 }
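
The control flow of the new `turbolite_ext_register_vfs()` (two independent once-flags, each rolled back on failure) is easier to follow as a small model. The sketch below re-expresses it in Python purely for illustration. `Flag` is a lock-based stand-in for `AtomicBool`, `make_registrar` and the callbacks are hypothetical, and `OSError` stands in for `std::io::Error`.

```python
import threading


class Flag:
    """Lock-based stand-in for Rust's AtomicBool (swap and store only)."""

    def __init__(self) -> None:
        self._value = False
        self._lock = threading.Lock()

    def swap(self, new: bool) -> bool:
        with self._lock:
            old, self._value = self._value, new
            return old

    def store(self, new: bool) -> None:
        with self._lock:
            self._value = new


def make_registrar(register_local, register_tiered):
    local_flag, tiered_flag = Flag(), Flag()

    def register_vfs(env: dict) -> int:
        # Local VFS: always attempted, at most once per process.
        if not local_flag.swap(True):
            try:
                register_local()
            except OSError:
                local_flag.store(False)  # allow a retry on the next call
                return 1
        # Tiered VFS: attempted only when the bucket variable is present.
        if "TURBOLITE_BUCKET" in env and not tiered_flag.swap(True):
            try:
                register_tiered()
            except OSError:
                tiered_flag.store(False)
                return 1
        return 0

    return register_vfs


calls = []


def failing_tiered():
    raise OSError("invalid config")


register = make_registrar(lambda: calls.append("turbolite"), failing_tiered)
print(register({}))
print(register({"TURBOLITE_BUCKET": "my-bucket"}))
print(calls)
```

In this model a tiered failure returns 1 while the local VFS stays registered, which matches the Rust body shown in the diff. The added doc comment says the function panics in that case, whereas the code as shown logs with `eprintln!` and returns 1.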
