Skip to content

Storage Model

AEndrix edited this page May 14, 2026 · 3 revisions

Storage Model

graft stores everything in one SQLite file per profile: ~/.graft/profiles/<name>/graft.db. WAL mode. Optionally encrypted with SQLCipher (set GRAFT_DB_KEY).

Pragmas

PRAGMA journal_mode  = WAL;       -- readers don't block writers
PRAGMA synchronous   = NORMAL;    -- safe + fast on WAL
PRAGMA foreign_keys  = ON;        -- cascades from node deletes
PRAGMA mmap_size     = 268435456; -- 256 MiB mmap window

Tables

nodes

┌──────────────┬─────────────────┬─────────────────────────────────────────────┐
│    Column    │      Type       │                    Notes                    │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ id           │ BLOB(16) PK     │ UUIDv7 — sortable by time, dense.           │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ content_hash │ BLOB(32) UNIQUE │ BLAKE3(title || body || sorted keywords).   │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ title        │ TEXT NOT NULL   │ Short headline.                             │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ body         │ TEXT NOT NULL   │ Full content.                               │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ author       │ TEXT NULL       │ Provenance.                                 │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ created_at   │ INT64           │ Unix epoch seconds.                         │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ expires_at   │ INT64           │ 0 = never.                                  │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ last_access  │ INT64           │ Updated on STRONG hit.                      │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ access_count │ INT64           │ Bumped on STRONG hit.                       │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ state        │ INT0=ACTIVE, 1=STALE, 2=SUPERSEDED.            │
├──────────────┼─────────────────┼─────────────────────────────────────────────┤
│ origin       │ INT0=LOCAL, 1=REMOTE, 2=PUSHED (profile sync). │
└──────────────┴─────────────────┴─────────────────────────────────────────────┘

Indexes: idx_nodes_hash (content dedup), idx_nodes_state (filter SUPERSEDED out of retrieval), idx_nodes_expires_at (consolidate), idx_nodes_origin (sync).

node_fts (VIRTUAL, FTS5)

Mirror of (title, body). Triggers on nodes keep it in sync. Tokenizer: unicode61 remove_diacritics 2. Used for BM25 ranking.

keywords

┌──────────────┬────────────────────────────┐
│    Column    │            Type            │
├──────────────┼────────────────────────────┤
│ id           │ INT PK                     │
├──────────────┼────────────────────────────┤
│ textTEXT UNIQUE COLLATE NOCASE │
├──────────────┼────────────────────────────┤
│ canonical_id │ INT NULL FK (keywords.id)  │
├──────────────┼────────────────────────────┤
│ embedding    │ BLOB NULL                  │
└──────────────┴────────────────────────────┘

canonical_id and embedding are reserved for future dedup / synonym work. Unused today.

node_keywords

Linking table. PK (node_id, keyword_id). Cascades on node delete.

edges

┌────────────┬─────────────┬─────────────────────────────────────────────────────┐
│   Column   │    Type     │                        Notes                        │
├────────────┼─────────────┼─────────────────────────────────────────────────────┤
│ src        │ BLOB(16) FK │ Source node.                                        │
├────────────┼─────────────┼─────────────────────────────────────────────────────┤
│ dst        │ BLOB(16) FK │ Destination node.                                   │
├────────────┼─────────────┼─────────────────────────────────────────────────────┤
│ kind       │ INT0=KEYWORD, 1=SEMANTIC, 2=CONTRADICTS, 3=SUPERSEDES. │
├────────────┼─────────────┼─────────────────────────────────────────────────────┤
│ keyword_id │ INT NULL FK │ Set only for KEYWORD kind.                          │
├────────────┼─────────────┼─────────────────────────────────────────────────────┤
│ weight     │ REAL        │ Typically [0, 1], normalized.                       │
└────────────┴─────────────┴─────────────────────────────────────────────────────┘

Unique: (src, dst, kind, COALESCE(keyword_id, -1)). Indexes on src and dst.

similarity_samples

(ts INT64, kind INT, cosine REAL). Kind 0 = insert_topk, kind 1 = query_top1. Powers graft stats percentile output.

node_vec (sqlite-vec VIRTUAL TABLE)

CREATE VIRTUAL TABLE node_vec USING vec0(
  id BLOB PRIMARY KEY,
  embedding FLOAT[1024]
);

Cosine top-k via vec_distance_cosine(). Falls back to a regular (id BLOB, embedding BLOB) table if sqlite-vec isn't loaded — same correctness, just slower top-k.

Migrations

Tracked via PRAGMA user_version. Each migration is idempotent.

- v2: summary→title, detail→body; added author, expires_at; rebuilt FTS5 + triggers.
- v3: added origin column to nodes + index. Pure ALTER TABLE ADD COLUMN — online, no table rebuild.

A newer daemon upgrading an older DB applies migrations in sequence before accepting connections.

Idempotency

Inserts hash (title || '\0' || body || '\0' || sorted_keywords) with BLAKE3, store in content_hash UNIQUE. A duplicate insert returns the existing id with duplicate: true and pays zero embedding cost (the
check happens before encoding).

Supersession

insert --supersedes <old_id> atomically:
1. Inserts the new node.
2. Sets old.state = SUPERSEDED.
3. Creates a SUPERSEDES edge from new → old.

Retrieval filters state != ACTIVE by default. The viewer shows superseded nodes in muted gray with the SUPERSEDES edge in red, so the history stays visible.

Encryption at rest

Build with SQLCipher and set GRAFT_DB_KEY env var. The daemon applies PRAGMA key = '<key>' and PRAGMA cipher_compatibility = 4 at open. Without SQLCipher in the build, the env var is a no-op (logged at
startup).

Backup

Two options:

1. Live backup: sqlite3 graft.db ".backup '/path/to/backup.db'" — online, safe with WAL.
2. Cold copy: stop the daemon (pkill graftd), copy graft.db + graft.db-wal + graft.db-shm together, restart.

Clone this wiki locally