Skip to content

Storage Model

AEndrix edited this page May 15, 2026 · 3 revisions

graft stores everything in one SQLite file per profile: ~/.graft/profiles/<name>/graft.db. WAL mode. Optionally encrypted with SQLCipher (set GRAFT_DB_KEY).

Pragmas

PRAGMA journal_mode  = WAL;       -- readers don't block writers
PRAGMA synchronous   = NORMAL;    -- safe + fast on WAL
PRAGMA foreign_keys  = ON;        -- cascades from node deletes
PRAGMA mmap_size     = 268435456; -- 256 MiB mmap window

Tables

nodes

Column Type Notes
id BLOB(16) PK UUIDv7 — sortable by time, dense.
content_hash BLOB(32) UNIQUE BLAKE3(title || body || sorted keywords).
title TEXT NOT NULL Short headline.
body TEXT NOT NULL Full content.
author TEXT NULL Provenance.
created_at INT64 Unix epoch seconds.
expires_at INT64 0 = never.
last_access INT64 Updated on STRONG hit.
access_count INT64 Bumped on STRONG hit.
state INT 0=ACTIVE, 1=STALE, 2=SUPERSEDED.
origin INT 0=LOCAL, 1=REMOTE, 2=PUSHED (profile sync).

Indexes: idx_nodes_hash (content dedup), idx_nodes_state (filter SUPERSEDED out of retrieval), idx_nodes_expires_at (consolidate), idx_nodes_origin (sync).

node_fts (VIRTUAL, FTS5)

Mirror of (title, body). Triggers on nodes keep it in sync. Tokenizer: unicode61 remove_diacritics 2. Used for BM25 ranking.

keywords

Column Type
id INT PK
text TEXT UNIQUE COLLATE NOCASE
canonical_id INT NULL FK (keywords.id)
embedding BLOB NULL

canonical_id and embedding are reserved for future dedup / synonym work. Unused today.

node_keywords

Linking table. PK (node_id, keyword_id). Cascades on node delete.

edges

Column Type Notes
src BLOB(16) FK Source node.
dst BLOB(16) FK Destination node.
kind INT 0=KEYWORD, 1=SEMANTIC, 2=CONTRADICTS, 3=SUPERSEDES.
keyword_id INT NULL FK Set only for KEYWORD kind.
weight REAL Typically [0, 1], normalized.

Unique: (src, dst, kind, COALESCE(keyword_id, -1)). Indexes on src and dst.

similarity_samples

(ts INT64, kind INT, cosine REAL). Kind 0 = insert_topk, kind 1 = query_top1. Powers graft stats percentile output.

node_vec (sqlite-vec VIRTUAL TABLE)

CREATE VIRTUAL TABLE node_vec USING vec0(
  id BLOB PRIMARY KEY,
  embedding FLOAT[1024]
);

Cosine top-k via vec_distance_cosine(). Falls back to a regular (id BLOB, embedding BLOB) table if sqlite-vec isn't loaded — same correctness, just slower top-k.

Migrations

Tracked via PRAGMA user_version. Each migration is idempotent.

  • v2: summary→title, detail→body; added author, expires_at; rebuilt FTS5 + triggers.
  • v3: added origin column to nodes + index. Pure ALTER TABLE ADD COLUMN — online, no table rebuild.

A newer daemon upgrading an older DB applies migrations in sequence before accepting connections.

Idempotency

Inserts hash (title || '\0' || body || '\0' || sorted_keywords) with BLAKE3, store in content_hash UNIQUE. A duplicate insert returns the existing id with duplicate: true and pays zero embedding cost (the check happens before encoding).

Supersession

insert --supersedes <old_id> atomically:

  1. Inserts the new node.
  2. Sets old.state = SUPERSEDED.
  3. Creates a SUPERSEDES edge from new → old.

Retrieval filters state != ACTIVE by default. The viewer shows superseded nodes in muted gray with the SUPERSEDES edge in red, so the history stays visible.

Encryption at rest

Build with SQLCipher and set GRAFT_DB_KEY env var. The daemon applies PRAGMA key = '<key>' and PRAGMA cipher_compatibility = 4 at open. Without SQLCipher in the build, the env var is a no-op (logged at startup).

Backup

Two options:

  1. Live backup: sqlite3 graft.db ".backup '/path/to/backup.db'" — online, safe with WAL.
  2. Cold copy: stop the daemon (pkill graftd), copy graft.db + graft.db-wal + graft.db-shm together, restart.

Clone this wiki locally