# TurboDB Roadmap — Engine-Native Cloud Features

The key advantage: **we own the engine**. Neon had to hack Postgres apart to separate compute and storage. Supabase runs vanilla Postgres and can only orchestrate around it. TurboDB is built from scratch — the storage format, wire protocol, query engine, replication, and compression are all ours.

Here's what that unlocks.

---

## 1. Instant Branching (Git for Data)

Copy-on-write forks at the storage layer. A branch is just a pointer + page diff — near-zero cost.

```
Production DB (customer-a)
  |
  +-- branch: staging (CoW fork, near-zero cost)
  +-- branch: feature-x (CoW fork)
  +-- branch: debug-jan5 (point-in-time snapshot)
```

Neon has this, but it took them ages to bolt it onto Postgres. TurboDB can build it natively — the mmap storage layer already knows which pages changed. Developers would kill for this.

**Implementation**: Extend mmap.zig to support CoW page tables. A fork shares all existing pages; writes go to a private overlay. Merge = replay the overlay onto the parent.
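
A minimal sketch of the overlay idea in Zig, assuming a hypothetical `Branch` type and a fixed 4 KiB page size (neither is the real mmap.zig API):

```zig
const std = @import("std");

// Hypothetical sketch: a branch is a private page overlay on top of a shared,
// read-only parent. Page size and type names are assumptions, not mmap.zig.
const page_size = 4096;
const Page = [page_size]u8;

const Branch = struct {
    parent: []const Page, // shared base pages, never written
    overlay: std.AutoHashMap(usize, Page), // pages written on this branch

    fn init(allocator: std.mem.Allocator, parent: []const Page) Branch {
        return .{ .parent = parent, .overlay = std.AutoHashMap(usize, Page).init(allocator) };
    }

    fn deinit(self: *Branch) void {
        self.overlay.deinit();
    }

    // Reads check the overlay first, then fall back to the shared parent.
    fn readPage(self: *const Branch, index: usize) Page {
        if (self.overlay.get(index)) |page| return page;
        return self.parent[index];
    }

    // Writes never touch the parent; they land in the private overlay.
    // A merge would replay this overlay onto the parent, as described above.
    fn writePage(self: *Branch, index: usize, page: Page) !void {
        try self.overlay.put(index, page);
    }
};

test "branch writes never leak into the parent" {
    const parent = [_]Page{[_]u8{0} ** page_size} ** 4;
    var branch = Branch.init(std.testing.allocator, &parent);
    defer branch.deinit();

    try branch.writePage(2, [_]u8{0xAB} ** page_size);

    const from_branch = branch.readPage(2);
    try std.testing.expect(from_branch[0] == 0xAB);
    try std.testing.expect(parent[2][0] == 0); // parent is untouched
}
```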

**Status**: Not started. Depends on: mmap page-level tracking.

---

## 2. Built-in Edge Replication

We control the replication protocol (Calvin). Instead of Postgres's heavy streaming replication:

```
Primary (US-East)
  | lightweight sync
  +-- Read replica (EU) — 50ms away
  +-- Read replica (Asia) — 100ms away
  +-- Embedded replica (user's app, local)
```

That last one is the Turso/libSQL play — an **embedded read replica** that ships inside the customer's app. Reads are local (0ms); writes go to the primary. Because we own the wire format, we can make this tiny and efficient.

**Implementation**: Calvin already batches + serializes transactions (419 bytes for 5 txns). Ship the batch stream to edge replicas. For embedded mode, compile TurboDB as a library (libturbodb.dylib — already exists) and add a sync client.
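
A rough sketch of what the embedded sync client could look like, assuming a hypothetical `TxnBatch` / `EmbeddedReplica` shape and a simple sequence-number scheme (not the actual Calvin wire format):

```zig
const std = @import("std");

// Hypothetical sketch of an embedded-replica sync client: pull Calvin batches
// from the primary and apply them strictly in sequence order. The type and
// field names are assumptions, not the real replication protocol.
const TxnBatch = struct {
    sequence: u64, // monotonically increasing batch number from the primary
    payload: []const u8, // serialized transactions, in commit order
};

const EmbeddedReplica = struct {
    last_applied: u64 = 0,

    // Apply a batch only if it is the next one in sequence; on a gap the
    // caller would re-sync from last_applied.
    fn apply(self: *EmbeddedReplica, batch: TxnBatch) !void {
        if (batch.sequence != self.last_applied + 1) return error.GapInStream;
        // A real client would decode and replay each transaction here.
        std.debug.print("applying batch {d} ({d} bytes)\n", .{ batch.sequence, batch.payload.len });
        self.last_applied = batch.sequence;
    }
};

pub fn main() !void {
    var replica = EmbeddedReplica{};
    try replica.apply(.{ .sequence = 1, .payload = "txn-data" });
    try replica.apply(.{ .sequence = 2, .payload = "more-txn-data" });
    // Receiving sequence 4 next would return error.GapInStream and trigger a re-sync.
}
```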

**Status**: Calvin replication working (5/5 consistency verified in Docker cluster). Edge sync protocol not started.

---

## 3. Query-Aware Scale-to-Zero

Generic platforms can only watch "is there a TCP connection?" We go deeper:

```
DEEP SLEEP: no connections for 30min -> snapshot to S3, free all RAM
LIGHT SLEEP: connections open but no queries -> shrink buffer pool to 4MB
WARM: active queries -> full resources
HOT: heavy load -> auto-expand buffers, spawn read replicas
```

We can instrument **inside the query engine** to know the difference between "connected but idle" and "actually working." No external platform can do this.

**Implementation**: Add query activity tracking to wire.zig (last_query_time, queries_per_second). The cloud control plane reads these metrics and adjusts resource allocation. Deep sleep = flush WAL, snapshot pages to object storage, terminate process. Wake = restore from snapshot.
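
A minimal sketch of the activity tracking this implies; the state names mirror the table above, but the thresholds and field names are assumptions, not what wire.zig exposes today:

```zig
const std = @import("std");

// Hypothetical sketch of query-aware activity tracking. The states mirror the
// table above; thresholds and field names are assumptions, not wire.zig's API.
const ActivityState = enum { deep_sleep, light_sleep, warm, hot };

const ActivityTracker = struct {
    open_connections: u32 = 0,
    last_query_ms: i64 = 0,
    queries_per_second: f64 = 0,

    fn recordQuery(self: *ActivityTracker, now_ms: i64) void {
        self.last_query_ms = now_ms;
    }

    fn classify(self: *const ActivityTracker, now_ms: i64) ActivityState {
        const idle_ms = now_ms - self.last_query_ms;
        if (self.open_connections == 0 and idle_ms > 30 * std.time.ms_per_min) return .deep_sleep;
        if (self.queries_per_second > 1000) return .hot;
        if (idle_ms > 10 * std.time.ms_per_s) return .light_sleep;
        return .warm;
    }
};

test "idle connections shrink, no connections sleep" {
    var tracker = ActivityTracker{ .open_connections = 1 };
    tracker.recordQuery(0);
    try std.testing.expect(tracker.classify(5_000) == .warm);
    try std.testing.expect(tracker.classify(60_000) == .light_sleep);

    tracker.open_connections = 0;
    try std.testing.expect(tracker.classify(31 * std.time.ms_per_min) == .deep_sleep);
}
```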

**Status**: Wire server tracks connections. Query-level metrics not yet exposed.

---

## 4. Native Multi-Tenant Mode

Instead of one-process-per-customer, TurboDB itself handles isolation:

```
Single turbodb process
  +-- tenant: customer-a (isolated keyspace, own auth)
  +-- tenant: customer-b (isolated keyspace, own auth)
  +-- tenant: customer-c (isolated keyspace, own auth)
```

One process, shared buffer pool, but **hard isolation at the engine level** — not row-level security bolted on top. Separate keyspaces, separate auth, separate resource quotas. This is what CockroachDB and Vitess do, but TurboDB gets it native from day one.

Economics:
- 1000 tenants in one process vs 1000 separate processes
- Shared memory, shared CPU, isolated data
- 10-100x better resource utilization

**Implementation**: Extend Database to support namespaced collections (tenant_id/collection_name). Auth module already has per-key permissions — add tenant_id to KeyEntry. Add per-tenant resource quotas (max collections, max storage, max ops/sec).
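
A small sketch of the namespacing and quota idea, with hypothetical `Tenant` / `TenantQuota` types and made-up limits (not the real Database or auth API):

```zig
const std = @import("std");

// Hypothetical sketch of tenant-namespaced collections plus a quota check.
// Type names and limits are illustrative, not the real Database or auth API.
const TenantQuota = struct {
    max_collections: u32,
    max_storage_bytes: u64,
};

const Tenant = struct {
    id: []const u8,
    quota: TenantQuota,
    collection_count: u32 = 0,
};

// Physical collection name is "<tenant_id>/<collection>", so tenants never collide.
fn namespacedName(allocator: std.mem.Allocator, tenant: *const Tenant, collection: []const u8) ![]u8 {
    return std.fmt.allocPrint(allocator, "{s}/{s}", .{ tenant.id, collection });
}

fn createCollection(tenant: *Tenant) !void {
    if (tenant.collection_count >= tenant.quota.max_collections) return error.QuotaExceeded;
    tenant.collection_count += 1;
}

test "namespacing and quotas" {
    var tenant = Tenant{
        .id = "customer-a",
        .quota = .{ .max_collections = 1, .max_storage_bytes = 1 << 30 },
    };

    const name = try namespacedName(std.testing.allocator, &tenant, "orders");
    defer std.testing.allocator.free(name);
    try std.testing.expect(std.mem.eql(u8, name, "customer-a/orders"));

    try createCollection(&tenant);
    try std.testing.expectError(error.QuotaExceeded, createCollection(&tenant));
}
```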

**Status**: Auth module exists (src/auth.zig). Multi-tenant namespacing not started.

---

## 5. Time Travel Queries

Because TurboDB already keeps versioned documents (MVCC) and a WAL, every query can look at any point in time:

```
GET /db/orders/ord-001?as_of=2026-03-25T14:00:00Z
```

Built into the engine, not a plugin. Every customer gets automatic point-in-time queries. Debugging production issues becomes trivial — "what did this row look like yesterday?"

**Implementation**: MVCC version chains (src/mvcc.zig) already track multiple versions per document with epoch numbers. Map wall-clock timestamps to epochs. Add `as_of` parameter to GET/scan operations that reads the version chain at the specified epoch instead of HEAD.
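
A sketch of how `as_of` resolution could work: map the timestamp to an epoch, then walk the version chain. `EpochCheckpoint` and `Version` are assumed shapes, not the actual mvcc.zig types:

```zig
const std = @import("std");

// Hypothetical sketch of as_of resolution: map a wall-clock timestamp to an
// MVCC epoch, then walk the version chain to the newest version visible at
// that epoch. Names are assumed shapes, not the real mvcc.zig types.
const EpochCheckpoint = struct { epoch: u64, wall_clock_ms: i64 };

const Version = struct {
    epoch: u64,
    value: []const u8,
    prev: ?*const Version, // next-older version in the chain
};

// Newest epoch whose checkpoint time is <= the requested timestamp.
// Checkpoints are assumed sorted by time, oldest first.
fn epochAsOf(checkpoints: []const EpochCheckpoint, as_of_ms: i64) ?u64 {
    var result: ?u64 = null;
    for (checkpoints) |cp| {
        if (cp.wall_clock_ms <= as_of_ms) result = cp.epoch;
    }
    return result;
}

// Walk from HEAD toward older versions until one is visible at the epoch.
fn readAsOf(head: *const Version, epoch: u64) ?[]const u8 {
    var current: ?*const Version = head;
    while (current) |v| : (current = v.prev) {
        if (v.epoch <= epoch) return v.value;
    }
    return null; // the document did not exist yet at that epoch
}

test "as_of picks the version visible at that time" {
    const v1 = Version{ .epoch = 10, .value = "pending", .prev = null };
    const v2 = Version{ .epoch = 20, .value = "shipped", .prev = &v1 };

    const checkpoints = [_]EpochCheckpoint{
        .{ .epoch = 10, .wall_clock_ms = 1_000 },
        .{ .epoch = 20, .wall_clock_ms = 2_000 },
    };

    const epoch = epochAsOf(&checkpoints, 1_500) orelse unreachable;
    try std.testing.expect(std.mem.eql(u8, readAsOf(&v2, epoch).?, "pending"));
}
```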

**Status**: MVCC working (41.8M read txn/s). Timestamp-to-epoch mapping not yet built.

---

## 6. Programmable Webhooks / Engine-Level CDC

Not Postgres-style triggers (slow, run in-process). Native change data capture:

```
ON INSERT INTO orders -> HTTP POST to customer's webhook
ON CHANGE IN users WHERE role = 'admin' -> push event to stream
```

Because we own the WAL, we can stream changes out natively — built-in CDC. Customers get real-time events without polling. This is what Supabase Realtime does, but they had to build a whole Elixir service to tail the Postgres WAL. We'd have it **inside the engine**.

**Implementation**: WAL already has an EntryIterator. Add a WAL tailer that runs in a background thread, filters entries against registered subscriptions, and fires HTTP webhooks or pushes to a WebSocket stream. Use the auth module's HMAC-SHA256 for webhook signatures.
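
A sketch of the subscription matching the tailer would do; the entry and subscription shapes are assumptions, and the webhook POST (plus HMAC signing) is stubbed out with a print:

```zig
const std = @import("std");

// Hypothetical sketch of subscription matching for a WAL tailer. Entry and
// subscription shapes are assumptions, not the real wal.zig types; the HTTP
// POST (with an HMAC-SHA256 signature) is stubbed out with a print.
const ChangeKind = enum { insert, update, delete };

const WalEntry = struct {
    collection: []const u8,
    kind: ChangeKind,
    document: []const u8,
};

const Subscription = struct {
    collection: []const u8,
    kinds: []const ChangeKind,
    webhook_url: []const u8,

    fn matches(self: *const Subscription, entry: *const WalEntry) bool {
        if (!std.mem.eql(u8, self.collection, entry.collection)) return false;
        for (self.kinds) |k| {
            if (k == entry.kind) return true;
        }
        return false;
    }
};

// One pass of the background tailer: filter entries against subscriptions
// and hand matches to the dispatcher.
fn dispatchMatches(entries: []const WalEntry, subs: []const Subscription) void {
    for (entries) |*entry| {
        for (subs) |*sub| {
            if (sub.matches(entry)) {
                std.debug.print("POST {s}: {s}\n", .{ sub.webhook_url, entry.document });
            }
        }
    }
}

pub fn main() void {
    const subs = [_]Subscription{.{
        .collection = "orders",
        .kinds = &[_]ChangeKind{.insert},
        .webhook_url = "https://example.invalid/hooks/orders",
    }};
    const entries = [_]WalEntry{
        .{ .collection = "orders", .kind = .insert, .document = "{\"id\":\"ord-001\"}" },
        .{ .collection = "users", .kind = .update, .document = "{\"id\":\"u-1\"}" },
    };
    dispatchMatches(&entries, &subs);
}
```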

**Status**: WAL infrastructure exists (src/storage/wal.zig). Subscription API and webhook dispatch not started. HMAC-SHA256 available in src/crypto.zig.

---

## 7. Per-Query Cost Metering

We own the query executor, so we can track everything:

```json
{
  "tenant": "customer-a",
  "query": "scan users limit=100",
  "rows_scanned": 14000,
  "bytes_read": 2100000,
  "cpu_us": 12,
  "cost_usd": 0.000003
}
```

True usage-based pricing at the query level. Not "you used X GB of storage and Y hours of compute" — actual **per-query billing** like BigQuery. No one in the embedded/OLTP database space does this well.

**Implementation**: Wrap collection operations with instrumentation counters (rows scanned, bytes read, time elapsed). Emit per-query metrics to a billing log. Cloud control plane aggregates and bills via Stripe metering API.
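
A minimal sketch of the per-query counters; the field names mirror the JSON above, and the price constants are placeholders, not a real pricing model:

```zig
const std = @import("std");

// Hypothetical sketch of per-query instrumentation. Counter names mirror the
// JSON above; the price constants are placeholders, not a real pricing model.
const QueryMeter = struct {
    rows_scanned: u64 = 0,
    bytes_read: u64 = 0,

    fn recordScan(self: *QueryMeter, rows: u64, bytes: u64) void {
        self.rows_scanned += rows;
        self.bytes_read += bytes;
    }

    // Toy cost model: a price per scanned row plus a price per byte read.
    fn costUsd(self: *const QueryMeter) f64 {
        const per_row: f64 = 0.000000001;
        const per_byte: f64 = 0.0000000005;
        return @as(f64, @floatFromInt(self.rows_scanned)) * per_row +
            @as(f64, @floatFromInt(self.bytes_read)) * per_byte;
    }
};

pub fn main() void {
    var meter = QueryMeter{};
    meter.recordScan(14_000, 2_100_000); // e.g. one "scan users limit=100" request
    std.debug.print("rows={d} bytes={d} cost=${d:.6}\n", .{
        meter.rows_scanned, meter.bytes_read, meter.costUsd(),
    });
}
```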

**Status**: Basic request metrics exist in server.zig (req_count, err_count). Per-query instrumentation not started.

---

## 8. Snapshot Sharing / Dataset Marketplace

Since we control the snapshot format:

```
Customer publishes: "US Census 2025 dataset"
  -> stored as TurboDB snapshot on S3
  -> another customer: "fork this dataset"
  -> instant CoW clone into their account
```

A dataset marketplace where provisioning is instant because it's just forking a snapshot. Zero data copying.

**Implementation**: Depends on CoW branching (feature 1). A "published snapshot" is a read-only CoW base. Forking = creating a new overlay on top of it. Storage backend needs S3/R2 support for snapshot persistence.
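
A tiny sketch of the zero-copy fork bookkeeping, with hypothetical `PublishedSnapshot` / `Fork` records; a real fork would also carry the CoW overlay from feature 1:

```zig
const std = @import("std");

// Hypothetical sketch of zero-copy dataset forking: a published snapshot is an
// immutable object-storage manifest, and a fork just records a pointer to it.
// Names and fields are illustrative, not the real snapshot format.
const PublishedSnapshot = struct {
    dataset_id: []const u8,
    object_key: []const u8, // e.g. an S3/R2 key for the snapshot pages
    page_count: u64,
};

const Fork = struct {
    owner: []const u8,
    base: *const PublishedSnapshot, // shared read-only base, never copied
    // A real fork would also carry a CoW overlay (see feature 1).
};

fn forkSnapshot(owner: []const u8, base: *const PublishedSnapshot) Fork {
    // No page data moves: provisioning is just writing this tiny record.
    return .{ .owner = owner, .base = base };
}

pub fn main() void {
    const census = PublishedSnapshot{
        .dataset_id = "us-census-2025",
        .object_key = "snapshots/us-census-2025/base",
        .page_count = 1_000_000,
    };
    const fork = forkSnapshot("customer-b", &census);
    std.debug.print("forked {s} for {s}, copied 0 of {d} pages\n", .{
        fork.base.dataset_id, fork.owner, census.page_count,
    });
}
```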

**Status**: Not started. Depends on: CoW branching, object storage backend.

---

## Priority Matrix

| Feature | Impact | Effort | Dependencies |
|---------|--------|--------|-------------|
| **Multi-tenancy** | Critical for cloud economics | Medium | Auth module (done) |
| **Time travel queries** | High differentiator | Easy | MVCC (done) |
| **CDC / webhooks** | Table stakes for cloud DB | Medium | WAL (done), crypto (done) |
| **Per-query metering** | Required for billing | Easy | Server metrics (partial) |
| **Scale-to-zero** | Cost efficiency | Medium | Query metrics |
| **CoW branching** | Killer feature | Hard | mmap page tracking |
| **Edge replication** | Competitive moat | Hard | Calvin (done) |
| **Snapshot marketplace** | Long-term play | Hard | CoW branching, S3 |

**Recommended order**: Multi-tenancy → Time travel → CDC/webhooks → Per-query metering → Scale-to-zero → CoW branching → Edge replication → Marketplace

---

## The Moat

```
+-- TurboDB Cloud ---------------------------------------------+
|                                                               |
|  dashboard (control plane)                                    |
|    +-- Provision instances                                    |
|    +-- Branch / fork / time-travel                            |
|    +-- Per-query billing dashboard                            |
|    +-- Dataset marketplace                                    |
|                                                               |
|  turbodb (the engine WE own)                                  |
|    +-- Native multi-tenancy                                   |
|    +-- Built-in CDC / webhooks                                |
|    +-- Embedded read replicas                                 |
|    +-- Query-aware sleep states                               |
|    +-- CoW branching at storage layer                         |
|                                                               |
|  Runs on: one Hetzner box to start                            |
|  Scales to: cluster of boxes with Calvin + placement layer    |
+---------------------------------------------------------------+
```

The moat is **vertical integration**. Supabase can never do half of this because they don't own Postgres. Neon can do some of it but they're constrained by Postgres's architecture. PlanetScale owns Vitess but it's MySQL-flavored and complex.

TurboDB is a clean-slate database built from scratch. That's rare. That's the advantage.