
Commit fd62647

Merge pull request #40 from justrach/issue-32-33-39-cow-replication-marketplace
Implement multi-tenancy, time travel, CDC, branching, replication, and marketplace MVPs
2 parents 4d7a9ab + 7cd3a35 commit fd62647

34 files changed

Lines changed: 3952 additions & 148 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -2,3 +2,6 @@ zig-out/
 .zig-cache/
 python/dist/
 *.egg-info/
+codedb.snapshot
+__pycache__/
+*.pyc

FINDINGS.md

Lines changed: 101 additions & 1 deletion
@@ -225,9 +225,100 @@ All crypto functions are available via:
- **C ABI**: `turbodb_sha256(data, len, out)` (10 exported symbols in libturbodb)
- **Python**: `from turbodb import crypto; crypto.sha256_hex(b"data")`

---

## 6. Calvin Replication — Cluster Test Results

### What is Calvin?

Calvin is a deterministic replication protocol (Yale, 2012) that eliminates Two-Phase Commit (2PC). Instead of letting each node negotiate transaction ordering:

1. A **sequencer** (leader) assigns a global total order to all transactions
2. The ordered batch is **broadcast** to all replicas
3. Every node **executes deterministically** in the same order
4. All nodes converge to **identical state** — no voting, no 2PC, no distributed locks
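
Those four steps amount to very little mechanism. Below is a minimal, illustrative Python sketch of the flow (the real sequencer and batch format live in TurboDB's Zig code; `Txn`, `Sequencer`, and `apply_batch` are hypothetical names used only here):

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    collection: str
    key: str
    doc: dict

@dataclass
class Sequencer:
    epoch: int = 0
    pending: list = field(default_factory=list)

    def submit(self, txn: Txn) -> None:
        # Step 1: the leader assigns the global order simply by append position.
        self.pending.append(txn)

    def drain_batch(self) -> dict:
        # Step 2: the ordered batch is what gets broadcast to every replica.
        batch = {"epoch": self.epoch, "seq_start": 0, "txns": list(self.pending)}
        self.pending.clear()
        self.epoch += 1
        return batch

def apply_batch(db: dict, batch: dict) -> None:
    # Steps 3 and 4: every node replays the same batch in the same order,
    # so all nodes converge to identical state without 2PC or locks.
    for txn in batch["txns"]:
        db.setdefault(txn.collection, {})[txn.key] = txn.doc

leader_db, replica_db = {}, {}
seq = Sequencer()
seq.submit(Txn("users", "alice", {"name": "Alice", "age": 30}))
batch = seq.drain_batch()
apply_batch(leader_db, batch)
apply_batch(replica_db, batch)   # same input, same order -> identical state
assert leader_db == replica_db
```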

### Test: In-Process E2E (2 databases, 1 process)

```
zig build test-calvin
```

| Step | What happened |
|------|---------------|
| Open 2 databases | Separate mmap + WAL at `/tmp/calvin_test_leader` and `/tmp/calvin_test_replica` |
| Submit 5 txns | 3 users + 2 orders → sequencer |
| Drain batch | epoch=0, seq_start=0, 5 transactions |
| Serialize | 419 bytes (simulated network payload) |
| Leader executes | Applied 5 inserts to leader DB |
| Replica deserializes + executes | Same 5 inserts applied to replica DB |
| **Verify consistency** | **5/5 documents byte-identical** |

```
PASS users/alice → leader={"name":"Alice","age":30} replica={"name":"Alice","age":30}
PASS users/bob → leader={"name":"Bob","age":25} replica={"name":"Bob","age":25}
PASS users/charlie → leader={"name":"Charlie","age":35} replica={"name":"Charlie","age":35}
PASS orders/ord-001 → leader={"user":"alice","total":99.99} replica={"user":"alice","total":99.99}
PASS orders/ord-002 → leader={"user":"bob","total":42.50} replica={"user":"bob","total":42.50}
```

### Test: 3-Node Docker Cluster (Colima)

```bash
bash bench/test_calvin_cluster.sh
```

| Component | Details |
|-----------|---------|
| **node-0** | Leader/sequencer, port 27017, Calvin node_id=0 |
| **node-1** | Replica, port 27018, Calvin node_id=1 |
| **node-2** | Replica, port 27019, Calvin node_id=2 |
| **Image** | `turbodb-node:latest` — 3.1MB static binary on Debian slim |
| **Binary** | Cross-compiled: `zig build -Dtarget=aarch64-linux -Doptimize=ReleaseFast` |

Results:
- All 3 nodes healthy (wire protocol accepting connections)
- Calvin replication active on all nodes (leader=true/false logged correctly)
- In-container E2E test: **5/5 PASS, CONSISTENT**
- Wire protocol reachable from host on all 3 ports

### Calvin vs 2PC — Why this matters

| | Calvin (TurboDB) | 2PC (Traditional) |
|---|---|---|
| Network round-trips | 1 broadcast | 2 (prepare + commit) |
| Locks between nodes | None | Held during entire 2PC |
| Coordinator failure | Sequencer handoff | All nodes stuck waiting |
| Throughput | Batch-amortized | Per-transaction overhead |
| Latency | Batch window (5ms default) | 2 RTTs minimum |
| Complexity | Sequencer + deterministic exec | Coordinator + participant + recovery log |

### Running the cluster yourself

```bash
# 1. Cross-compile for Linux
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-linux

# 2. Build Docker image
docker build -f bench/docker/turbodb-cluster.Dockerfile -t turbodb-node .

# 3. Start 3-node cluster
docker compose -f bench/docker/calvin-cluster.yml up -d

# 4. Run E2E test
docker compose -f bench/docker/calvin-cluster.yml run --rm tester

# 5. Tear down
docker compose -f bench/docker/calvin-cluster.yml down -v

# Or all-in-one:
bash bench/test_calvin_cluster.sh
```

---

## 7. Reproducing These Results

```bash
# Full regression benchmark (21 subsystems)
@@ -244,6 +335,15 @@ python3 bench/triple_bench.py --turbodb-port 27030
# Just TurboDB vs MongoDB
python3 bench/bench.py

# Calvin replication E2E test (in-process, 2 databases)
zig build test-calvin

# Calvin 3-node Docker cluster test
bash bench/test_calvin_cluster.sh

# All unit tests (27 subsystems, ~200 tests)
zig build test-all
```

### Environment

ROADMAP.md

Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
# TurboDB Roadmap — Engine-Native Cloud Features

The key advantage: **we own the engine**. Neon had to hack Postgres apart to separate compute and storage. Supabase runs vanilla Postgres and can only orchestrate around it. TurboDB is built from scratch — the storage format, wire protocol, query engine, replication, and compression are all ours.

Here's what that unlocks.

---

## 1. Instant Branching (Git for Data)

Copy-on-write forks at the storage layer. A branch is just a pointer + page diff — near-zero cost.

```
Production DB (customer-a)
  |
  +-- branch: staging (CoW fork, near-zero cost)
  +-- branch: feature-x (CoW fork)
  +-- branch: debug-jan5 (point-in-time snapshot)
```

Neon has this, but it took them ages to bolt it onto Postgres. TurboDB can build it natively — the mmap storage layer already knows which pages changed. Developers would kill for this.

**Implementation**: Extend mmap.zig to support CoW page tables. A fork shares all existing pages; writes go to a private overlay. Merge = replay the overlay onto the parent.
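
A toy sketch of that overlay idea, assuming nothing about the actual mmap.zig layout (the `Branch` class and its page dict are invented for illustration):

```python
class Branch:
    """Toy copy-on-write branch: shared base pages plus a private overlay."""

    def __init__(self, base_pages: dict, parent: "Branch | None" = None):
        self.base = base_pages        # shared pages, never mutated via this branch
        self.overlay = {}             # only the pages written on this branch
        self.parent = parent

    def fork(self) -> "Branch":
        # A fork is just a new overlay over the current view: near-zero cost.
        return Branch(self.read_view(), parent=self)

    def write(self, page_no: int, data: bytes) -> None:
        self.overlay[page_no] = data  # copy-on-write: only changed pages are stored

    def read(self, page_no: int) -> bytes:
        return self.overlay.get(page_no, self.base.get(page_no, b""))

    def read_view(self) -> dict:
        return {**self.base, **self.overlay}

    def merge_into_parent(self) -> None:
        # Merge = replay the overlay onto the parent.
        assert self.parent is not None
        for page_no, data in self.overlay.items():
            self.parent.write(page_no, data)
```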
24+
25+
**Status**: Not started. Depends on: mmap page-level tracking.
26+
27+
---
28+
29+
## 2. Built-in Edge Replication
30+
31+
We control the replication protocol (Calvin). Instead of Postgres's heavy streaming replication:
32+
33+
```
34+
Primary (US-East)
35+
| lightweight sync
36+
+-- Read replica (EU) — 50ms away
37+
+-- Read replica (Asia) — 100ms away
38+
+-- Embedded replica (user's app, local)
39+
```
40+
41+
That last one is the Turso/libSQL play — an **embedded read replica** that ships inside the customer's app. Reads are local (0ms), writes go to primary. Because we own the wire format, we can make this tiny and efficient.
42+
43+
**Implementation**: Calvin already batches + serializes transactions (419 bytes for 5 txns). Ship the batch stream to edge replicas. For embedded mode, compile TurboDB as a library (libturbodb.dylib — already exists) and add a sync client.
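
A hypothetical sketch of the embedded sync client from the application side (none of these names or the newline-delimited JSON framing exist in TurboDB; they stand in for the Calvin batch stream):

```python
import json
import socket

def tail_batches(primary_host: str, port: int, local_db: dict) -> None:
    """Toy embedded-replica loop: pull Calvin batches from the primary, apply locally.

    Reads stay local; writes still go to the primary. The framing here is
    newline-delimited JSON purely for illustration.
    """
    with socket.create_connection((primary_host, port)) as conn:
        buf = b""
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                batch = json.loads(line)
                # Apply in sequencer order, exactly as the primary did.
                for txn in batch["txns"]:
                    local_db.setdefault(txn["collection"], {})[txn["key"]] = txn["doc"]
```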

**Status**: Calvin replication working (5/5 consistency verified in Docker cluster). Edge sync protocol not started.

---

## 3. Query-Aware Scale-to-Zero

Generic platforms can only watch "is there a TCP connection?" We go deeper:

```
DEEP SLEEP: no connections for 30min -> snapshot to S3, free all RAM
LIGHT SLEEP: connected but idle queries -> shrink buffer pool to 4MB
WARM: active queries -> full resources
HOT: heavy load -> auto-expand buffers, spawn read replicas
```

We can instrument **inside the query engine** to know the difference between "connected but idle" and "actually working." No external platform can do this.

**Implementation**: Add query activity tracking to wire.zig (last_query_time, queries_per_second). The cloud control plane reads these metrics and adjusts resource allocation. Deep sleep = flush WAL, snapshot pages to object storage, terminate process. Wake = restore from snapshot.
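
The control-plane decision could then be a small function over those metrics. A sketch with made-up thresholds (the metric names follow the suggestion above; nothing here is an existing API):

```python
import time

def sleep_state(last_query_time, queries_per_second, has_connections, now=None):
    """Map engine-level activity metrics to a resource state (toy thresholds)."""
    now = time.time() if now is None else now
    idle_for = now - last_query_time
    if not has_connections and idle_for > 30 * 60:
        return "DEEP_SLEEP"    # snapshot to object storage, free all RAM
    if queries_per_second == 0:
        return "LIGHT_SLEEP"   # connected but idle: shrink the buffer pool
    if queries_per_second < 1000:
        return "WARM"          # active queries: full resources
    return "HOT"               # heavy load: expand buffers, add replicas
```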

**Status**: Wire server tracks connections. Query-level metrics not yet exposed.

---

## 4. Native Multi-Tenant Mode

Instead of one-process-per-customer, TurboDB itself handles isolation:

```
Single turbodb process
  +-- tenant: customer-a (isolated keyspace, own auth)
  +-- tenant: customer-b (isolated keyspace, own auth)
  +-- tenant: customer-c (isolated keyspace, own auth)
```

One process, shared buffer pool, but **hard isolation at the engine level** — not row-level security bolted on top. Separate keyspaces, separate auth, separate resource quotas. This is what CockroachDB and Vitess do, but native from day one.

Economics:
- 1000 tenants in one process vs 1000 separate processes
- Shared memory, shared CPU, isolated data
- 10-100x better resource utilization

**Implementation**: Extend Database to support namespaced collections (tenant_id/collection_name). Auth module already has per-key permissions — add tenant_id to KeyEntry. Add per-tenant resource quotas (max collections, max storage, max ops/sec).
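
A sketch of the namespacing idea as a Python-side wrapper (hypothetical; the real change would live in the Zig Database and auth structs):

```python
from dataclasses import dataclass

@dataclass
class TenantQuota:
    max_collections: int = 100
    max_storage_bytes: int = 10 * 1024**3
    max_ops_per_sec: int = 5_000

class TenantHandle:
    """Scopes every operation to the proposed tenant_id/collection key format."""

    def __init__(self, engine, tenant_id: str, quota: TenantQuota):
        self.engine = engine          # the shared single-process engine
        self.tenant_id = tenant_id
        self.quota = quota

    def _ns(self, collection: str) -> str:
        # Hard isolation by keyspace: every collection name is tenant-prefixed.
        return f"{self.tenant_id}/{collection}"

    def insert(self, collection: str, key: str, doc: dict) -> None:
        self.engine.insert(self._ns(collection), key, doc)

    def get(self, collection: str, key: str) -> dict:
        return self.engine.get(self._ns(collection), key)
```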

**Status**: Auth module exists (src/auth.zig). Multi-tenant namespacing not started.

---

## 5. Time Travel Queries

If TurboDB keeps versioned pages or a WAL, every query can look at any point in time:

```
GET /db/orders/ord-001?as_of=2026-03-25T14:00:00Z
```

Built into the engine, not a plugin. Every customer gets automatic point-in-time queries. Debugging production issues becomes trivial — "what did this row look like yesterday?"

**Implementation**: MVCC version chains (src/mvcc.zig) already track multiple versions per document with epoch numbers. Map wall-clock timestamps to epochs. Add `as_of` parameter to GET/scan operations that reads the version chain at the specified epoch instead of HEAD.
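
The timestamp-to-epoch mapping can be as small as a sorted commit log plus a binary search. A sketch, assuming epochs commit in timestamp order (`EpochClock` and the version-chain shape are illustrative, not the mvcc.zig structures):

```python
import bisect

class EpochClock:
    """Maps wall-clock timestamps to MVCC epochs (toy version)."""

    def __init__(self):
        self.timestamps = []   # commit time of each epoch, ascending
        self.epochs = []

    def record_commit(self, epoch: int, wall_time: float) -> None:
        self.timestamps.append(wall_time)
        self.epochs.append(epoch)

    def epoch_as_of(self, wall_time: float) -> int:
        # Latest epoch committed at or before the requested time.
        i = bisect.bisect_right(self.timestamps, wall_time)
        if i == 0:
            raise KeyError("no committed epoch at or before that time")
        return self.epochs[i - 1]

def get_as_of(version_chain, epoch: int) -> dict:
    # version_chain: (epoch, doc) pairs, newest first, like an MVCC chain.
    for version_epoch, doc in version_chain:
        if version_epoch <= epoch:
            return doc
    raise KeyError("document did not exist at that epoch")
```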

**Status**: MVCC working (41.8M read txn/s). Timestamp-to-epoch mapping not yet built.

---

## 6. Programmable Webhooks / Engine-Level CDC

Not Postgres-style triggers (slow, run in-process). Native change data capture:

```
ON INSERT INTO orders -> HTTP POST to customer's webhook
ON CHANGE IN users WHERE role = 'admin' -> push event to stream
```

Because we own the WAL, we can stream changes out natively — like built-in CDC. Customers get real-time events without polling. This is what Supabase Realtime does, but they had to build a whole Elixir service to tail the Postgres WAL. We'd have it **inside the engine**.

**Implementation**: WAL already has an EntryIterator. Add a WAL tailer that runs in a background thread, filters entries against registered subscriptions, and fires HTTP webhooks or pushes to a WebSocket stream. Use the auth module's HMAC-SHA256 for webhook signatures.
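
A rough sketch of the tailer loop and signed webhook dispatch (hypothetical names throughout; Python's stdlib `hmac` stands in for the engine's HMAC-SHA256, and `wal_entries` stands in for the WAL EntryIterator):

```python
import hashlib
import hmac
import json
import urllib.request

def sign(secret: bytes, body: bytes) -> str:
    # Stand-in for the engine's HMAC-SHA256 webhook signature.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def tail_wal(wal_entries, subscriptions):
    """Filter WAL entries against subscriptions and fire signed webhooks.

    `subscriptions` is a list of dicts like
    {"collection": "orders", "op": "insert", "url": ..., "secret": b"..."}.
    """
    for entry in wal_entries:
        for sub in subscriptions:
            if entry["collection"] == sub["collection"] and entry["op"] == sub["op"]:
                body = json.dumps(entry).encode()
                req = urllib.request.Request(
                    sub["url"],
                    data=body,
                    headers={
                        "Content-Type": "application/json",
                        "X-TurboDB-Signature": sign(sub["secret"], body),
                    },
                )
                urllib.request.urlopen(req)   # fire-and-forget for the sketch
```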

**Status**: WAL infrastructure exists (src/storage/wal.zig). Subscription API and webhook dispatch not started. HMAC-SHA256 available in src/crypto.zig.

---

## 7. Per-Query Cost Metering

We own the query executor, so we can track everything:

```json
{
  "tenant": "customer-a",
  "query": "scan users limit=100",
  "rows_scanned": 14000,
  "bytes_read": 2100000,
  "cpu_us": 12,
  "cost_usd": 0.000003
}
```

True usage-based pricing at the query level. Not "you used X GB of storage and Y hours of compute" — actual **per-query billing** like BigQuery. No one in the embedded/OLTP database space does this well.

**Implementation**: Wrap collection operations with instrumentation counters (rows scanned, bytes read, time elapsed). Emit per-query metrics to a billing log. Cloud control plane aggregates and bills via Stripe metering API.
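
The instrumentation wrapper itself can be tiny. A sketch whose field names mirror the JSON above (the cost constants are placeholders, not real prices, and rows are assumed to arrive as bytes):

```python
import json
import time

def metered_scan(tenant, scan_fn, *, billing_log,
                 cost_per_row=1e-10, cost_per_byte=1e-12):
    """Wrap a collection scan with per-query counters and emit a billing record."""
    start = time.perf_counter_ns()
    rows_scanned = 0
    bytes_read = 0
    for row in scan_fn():
        rows_scanned += 1
        bytes_read += len(row)      # rows assumed to be raw bytes in this sketch
        yield row
    record = {
        "tenant": tenant,
        "rows_scanned": rows_scanned,
        "bytes_read": bytes_read,
        "cpu_us": (time.perf_counter_ns() - start) // 1_000,
        "cost_usd": rows_scanned * cost_per_row + bytes_read * cost_per_byte,
    }
    # One line per query in the billing log; the control plane aggregates these.
    billing_log.write(json.dumps(record) + "\n")
```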

**Status**: Basic request metrics exist in server.zig (req_count, err_count). Per-query instrumentation not started.

---

## 8. Snapshot Sharing / Dataset Marketplace

Since we control the snapshot format:

```
Customer publishes: "US Census 2025 dataset"
  -> stored as TurboDB snapshot on S3
  -> another customer: "fork this dataset"
  -> instant CoW clone into their account
```

A dataset marketplace where provisioning is instant because it's just forking a snapshot. Zero data copying.

**Implementation**: Depends on CoW branching (feature 1). A "published snapshot" is a read-only CoW base. Forking = creating a new overlay on top of it. Storage backend needs S3/R2 support for snapshot persistence.

**Status**: Not started. Depends on: CoW branching, object storage backend.

---

## Priority Matrix

| Feature | Impact | Effort | Dependencies |
|---------|--------|--------|-------------|
| **Multi-tenancy** | Critical for cloud economics | Medium | Auth module (done) |
| **Time travel queries** | High differentiator | Easy | MVCC (done) |
| **CDC / webhooks** | Table stakes for cloud DB | Medium | WAL (done), crypto (done) |
| **Per-query metering** | Required for billing | Easy | Server metrics (partial) |
| **Scale-to-zero** | Cost efficiency | Medium | Query metrics |
| **CoW branching** | Killer feature | Hard | mmap page tracking |
| **Edge replication** | Competitive moat | Hard | Calvin (done) |
| **Snapshot marketplace** | Long-term play | Hard | CoW branching, S3 |

**Recommended order**: Multi-tenancy → Time travel → CDC/webhooks → Per-query metering → Scale-to-zero → CoW branching → Edge replication → Marketplace

---

## The Moat

```
+-- TurboDB Cloud --------------------------------------------------
|
|  dashboard (control plane)
|    +-- Provision instances
|    +-- Branch / fork / time-travel
|    +-- Per-query billing dashboard
|    +-- Dataset marketplace
|
|  turbodb (the engine WE own)
|    +-- Native multi-tenancy
|    +-- Built-in CDC / webhooks
|    +-- Embedded read replicas
|    +-- Query-aware sleep states
|    +-- CoW branching at storage layer
|
|  Runs on:  one Hetzner box to start
|  Scales to: cluster of boxes with Calvin + placement layer
+--------------------------------------------------------------------
```

The moat is **vertical integration**. Supabase can never do half of this because they don't own Postgres. Neon can do some of it, but they're constrained by Postgres's architecture. PlanetScale owns Vitess, but it's MySQL-flavored and complex.

TurboDB is a clean-slate database built from scratch. That's rare. That's the advantage.
