Operations

AEndrix edited this page May 15, 2026 · 3 revisions

All operations are dispatched through mg_dispatch() from a MessagePack frame. There are 11 core ops, and each has the same signature:

mg_err_t mg_op_<name>(mg_ctx_t *ctx, mpack_node_t args, mpack_writer_t *result);

| Op | Source | Surface |
|---|---|---|
| insert | src/insert/insert.c | CLI, REST POST /v1/insert, MCP graft_insert |
| query | src/retrieve/query.c | CLI, REST GET /v1/match, MCP graft_query |
| retrieve | src/retrieve/retrieve.c | CLI, REST GET /v1/search, MCP graft_retrieve |
| explore | src/explore/explore.c | CLI, REST GET /v1/explore, MCP graft_explore |
| classify | src/insert/classify.c | CLI, REST GET /v1/classify, MCP graft_classify |
| get | src/retrieve/get.c | CLI, REST GET /v1/nodes/{id}, MCP graft_get |
| delete | src/retrieve/delete.c | CLI, REST DELETE /v1/nodes/{id} (gated), MCP graft_delete |
| stats | src/stats/stats.c | CLI, MCP graft_stats |
| consolidate | src/stats/consolidate.c | CLI |
| view | src/retrieve/view.c | REST GET /v1/view |
| remote_sync | src/storage/remote_sync_op.c | CLI profile remote sync |
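
Because every op shares one signature, the handlers can sit in a single name-to-function table. The following is a minimal sketch of that idea, not the actual mg_dispatch() implementation: `demo_ctx_t`, `demo_args_t`, and `demo_result_t` are hypothetical stand-ins for `mg_ctx_t`, `mpack_node_t`, and `mpack_writer_t`, and the MessagePack decoding step is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for mg_ctx_t, mpack_node_t and mpack_writer_t. */
typedef struct { int calls; } demo_ctx_t;
typedef struct { const char *text; } demo_args_t;
typedef struct { const char *op; } demo_result_t;

typedef enum { DEMO_OK = 0, DEMO_ERR_UNKNOWN_OP = 1 } demo_err_t;

/* One shared signature lets all handlers live in the same table. */
typedef demo_err_t (*demo_op_fn)(demo_ctx_t *ctx, demo_args_t args,
                                 demo_result_t *result);

static demo_err_t demo_op_insert(demo_ctx_t *ctx, demo_args_t args,
                                 demo_result_t *result) {
    (void)args;               /* a real handler would read its arguments */
    ctx->calls++;
    result->op = "insert";
    return DEMO_OK;
}

static demo_err_t demo_op_stats(demo_ctx_t *ctx, demo_args_t args,
                                demo_result_t *result) {
    (void)args;
    ctx->calls++;
    result->op = "stats";
    return DEMO_OK;
}

static const struct { const char *name; demo_op_fn fn; } demo_ops[] = {
    { "insert", demo_op_insert },
    { "stats",  demo_op_stats  },
};

/* Look the op up by name and forward the call unchanged. */
demo_err_t demo_dispatch(demo_ctx_t *ctx, const char *name,
                         demo_args_t args, demo_result_t *result) {
    for (size_t i = 0; i < sizeof demo_ops / sizeof demo_ops[0]; i++) {
        if (strcmp(demo_ops[i].name, name) == 0)
            return demo_ops[i].fn(ctx, args, result);
    }
    return DEMO_ERR_UNKNOWN_OP;
}
```

Adding an op in this sketch means writing one handler and adding one table row; the dispatcher itself does not change.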

insert — pipeline

text args → embed(title) → normalize(keywords) →
  topk_by_keyword(KEYWORD edges) →
  topk(SEMANTIC edges) + MMR diversity →
  ONE atomic transaction:
    INSERT INTO nodes
    INSERT INTO node_vec
    INSERT INTO keywords / node_keywords
    INSERT INTO edges

Idempotent: a duplicate content_hash returns the existing id with duplicate: true, paying zero embedding cost.
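
The idempotency rule can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the store is a fixed array rather than SQLite, FNV-1a stands in for whatever hash actually produces content_hash, and `embed_calls` is a hypothetical counter that only shows the duplicate path returns before any embedding work.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NODES 16

typedef struct {
    uint64_t hashes[DEMO_MAX_NODES]; /* content_hash of each stored node */
    size_t   count;
    int      embed_calls;            /* how often the costly embedder ran */
} demo_store_t;

typedef struct { long id; int duplicate; } demo_insert_result_t;

/* FNV-1a, used only as a stand-in for the real content hash. */
static uint64_t demo_content_hash(const char *s) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

demo_insert_result_t demo_insert(demo_store_t *st, const char *content) {
    uint64_t h = demo_content_hash(content);
    /* Check the hash first: a hit returns before any embedding happens. */
    for (size_t i = 0; i < st->count; i++) {
        if (st->hashes[i] == h)
            return (demo_insert_result_t){ (long)i + 1, 1 };
    }
    if (st->count == DEMO_MAX_NODES)
        return (demo_insert_result_t){ -1, 0 };   /* store is full */
    st->embed_calls++;                /* embedding would run here */
    st->hashes[st->count] = h;
    st->count++;
    return (demo_insert_result_t){ (long)st->count, 0 };
}
```

Inserting the same content twice yields the same id both times, with `duplicate` set on the second call and `embed_calls` unchanged.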

Supersession: --supersedes <id> atomically inserts the new node, marks the old node as state=SUPERSEDED (2), and creates a SUPERSEDES edge. The old node is filtered from retrieval but preserved in history.

query — verified cache lookup

text → embed → vector_topk(10) →
  for each candidate: trigram_jaccard(title) + cosine(embedding)
                      [+ cross_encoder if enabled]
                      [+ NLI if enabled] →
  gate: STRONG / WEAK / MISS

Returns the top-1 candidate if it passes the STRONG gate. If it only passes the WEAK gate, it is returned with a banner. Otherwise the result is MISS, with an optional fallback list of the top-N closest unverified candidates.
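
The lexical signal in the pipeline above, trigram_jaccard(title), can be sketched as follows. This is one plausible reading, not the project's code: it uses raw byte trigrams with no case folding or padding, and caps each title at 256 trigrams. All `demo_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int demo_cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Pack every 3-byte window into one integer, then sort and deduplicate.
 * Returns the number of distinct trigrams written to `out`. */
static size_t demo_trigrams(const char *s, uint32_t *out, size_t cap) {
    size_t len = strlen(s), n = 0;
    for (size_t i = 0; i + 3 <= len && n < cap; i++) {
        const unsigned char *p = (const unsigned char *)s + i;
        out[n++] = ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
    }
    qsort(out, n, sizeof *out, demo_cmp_u32);
    size_t u = 0;
    for (size_t i = 0; i < n; i++)
        if (u == 0 || out[u - 1] != out[i]) out[u++] = out[i];
    return u;
}

/* Jaccard similarity |A ∩ B| / |A ∪ B| over the two trigram sets. */
double demo_trigram_jaccard(const char *a, const char *b) {
    enum { CAP = 256 };
    uint32_t ta[CAP], tb[CAP];
    size_t na = demo_trigrams(a, ta, CAP);
    size_t nb = demo_trigrams(b, tb, CAP);
    size_t i = 0, j = 0, inter = 0;
    while (i < na && j < nb) {          /* merge-walk two sorted sets */
        if (ta[i] == tb[j]) { inter++; i++; j++; }
        else if (ta[i] < tb[j]) i++;
        else j++;
    }
    size_t uni = na + nb - inter;
    return uni ? (double)inter / (double)uni : 0.0;
}
```

For example, "abcd" and "bcde" share one trigram ("bcd") out of three distinct ones, so the similarity is 1/3. Strings shorter than three bytes have no trigrams, and the sketch returns 0 for them.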

Two-path gate (default):

  • Lex path: high lexical Jaccard + cosine over lex_strong_min_vec → STRONG.
  • Sem path: high cosine, sem margin over lex → STRONG.

Fused gate (opt-in, use_fused_gate: true): single fused score, thresholds strong_min_fused / weak_min_fused.
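
The fused gate reduces to two threshold comparisons on one score. The sketch below assumes the fused score has already been computed (its formula is not described here), that the comparisons are inclusive, and that the threshold values shown in the usage note are made up for illustration. Only the names strong_min_fused and weak_min_fused come from the configuration described above.

```c
typedef enum { DEMO_MISS = 0, DEMO_WEAK = 1, DEMO_STRONG = 2 } demo_gate_t;

typedef struct {
    double strong_min_fused;   /* at or above this: STRONG */
    double weak_min_fused;     /* at or above this, below strong: WEAK */
} demo_fused_cfg_t;

/* Classify one candidate from its fused score (inclusive bounds assumed). */
demo_gate_t demo_fused_gate(double fused, const demo_fused_cfg_t *cfg) {
    if (fused >= cfg->strong_min_fused) return DEMO_STRONG;
    if (fused >= cfg->weak_min_fused)   return DEMO_WEAK;
    return DEMO_MISS;
}
```

With hypothetical thresholds of 0.80 and 0.55, a score of 0.90 is STRONG, 0.60 is WEAK, and 0.30 is MISS.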

retrieve — hybrid top-k

text → embed →
  list_1 = vector_topk(N)         # cosine
  list_2 = bm25_title_topk(N)     # FTS5
  list_3 = bm25_body_topk(N)      # FTS5
  fused = RRF(list_1, list_2, list_3, k=rrf_k_const)
  [optional second-stage rerank if rerank.enabled]
  return fused[:top_k]

RRF (Reciprocal Rank Fusion, Cormack et al.): score(d) = Σ_i 1/(k + rank_i(d)). Default k=60.
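
The formula can be worked through in a few lines of C. This sketch assumes documents are identified by small integer ids and that ranks are 1-based (the first entry of a list has rank 1); a document missing from a list simply contributes nothing for that list. The function names are hypothetical.

```c
#include <stddef.h>

/* Accumulate score(d) = sum over lists of 1 / (k + rank_i(d)).
 * lists[i] holds doc ids in ranked order; lens[i] is its length. */
void demo_rrf_fuse(const int *const lists[], const size_t lens[],
                   size_t n_lists, double k,
                   double scores[], size_t n_docs) {
    for (size_t d = 0; d < n_docs; d++) scores[d] = 0.0;
    for (size_t i = 0; i < n_lists; i++) {
        for (size_t r = 0; r < lens[i]; r++) {
            int doc = lists[i][r];
            if (doc < 0 || (size_t)doc >= n_docs) continue; /* skip bad ids */
            scores[doc] += 1.0 / (k + (double)(r + 1));     /* 1-based rank */
        }
    }
}

/* Index of the highest fused score (first one wins on ties). */
size_t demo_argmax(const double scores[], size_t n) {
    size_t best = 0;
    for (size_t d = 1; d < n; d++)
        if (scores[d] > scores[best]) best = d;
    return best;
}
```

With k=60, a document ranked 2nd, 1st, and 1st across the three lists scores 1/62 + 1/61 + 1/61, which beats a document ranked 1st and 2nd in only two lists (1/61 + 1/62). Appearing in more lists matters more than a single top rank.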

explore — beam search

seed = vector_topk(beam)
[filter by --keywords if provided]
all_visited = seed
for hop in 1..depth:
  candidates = neighbors(seed, KEYWORD|SEMANTIC edges)
  score(c) = log(edge_w) + α·log(cosine) − decay(hop, γ)
  seed = topk(candidates, beam) with MMR diversity
  all_visited += seed
return all_visited

α weights cosine vs structural edge weight. γ decays score by hop. mmr_lambda (shared with insert) penalizes redundant beams.
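
The MMR diversity step can be sketched with the textbook greedy formulation, where each pick maximizes λ·relevance − (1 − λ)·(max similarity to anything already picked). Whether explore uses exactly this form is an assumption; the fixed candidate count `DEMO_N` and the function name are hypothetical, and similarities are assumed to be non-negative.

```c
#include <stddef.h>

#define DEMO_N 4   /* hypothetical fixed number of candidates */

/* Greedy MMR: choose up to `k` candidates, writing their indices to
 * `picked` in selection order. Returns how many were chosen. */
size_t demo_mmr_select(const double rel[DEMO_N],
                       const double sim[DEMO_N][DEMO_N],
                       double lambda, size_t k, size_t picked[]) {
    int used[DEMO_N] = {0};
    size_t n_picked = 0;
    while (n_picked < k && n_picked < DEMO_N) {
        int best = -1;
        double best_score = 0.0;
        for (int c = 0; c < DEMO_N; c++) {
            if (used[c]) continue;
            /* Redundancy: highest similarity to anything already picked. */
            double max_sim = 0.0;
            for (size_t p = 0; p < n_picked; p++)
                if (sim[c][picked[p]] > max_sim) max_sim = sim[c][picked[p]];
            double score = lambda * rel[c] - (1.0 - lambda) * max_sim;
            if (best < 0 || score > best_score) {
                best = c;
                best_score = score;
            }
        }
        used[best] = 1;
        picked[n_picked++] = (size_t)best;
    }
    return n_picked;
}
```

With λ = 1 the selection is plain top-k by relevance. Lowering λ makes a near-duplicate of an already chosen candidate lose to a less relevant but more distinct one.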

classify — keyword suggestion

text → embed → vector_topk(50) →
  for each neighbor: walk KEYWORD edges →
  count keyword occurrences →
  return top-6

Purely graph-based: there is no LLM call, so it is fast.
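
The counting step can be sketched as a frequency table over keyword ids. The sketch assumes the neighbor walk has already produced a flat array of keyword ids, that ids fall in a small range (`DEMO_VOCAB` is hypothetical), and that ties go to the lower id; none of these details are confirmed by the pipeline above.

```c
#include <stddef.h>

#define DEMO_VOCAB 8   /* hypothetical size of the keyword-id space */

/* Count how often each keyword id occurs across all neighbors, then write
 * the `top_n` most frequent ids to `out`. Returns how many were written,
 * which is fewer than `top_n` when fewer distinct keywords were seen. */
size_t demo_top_keywords(const int *kw_ids, size_t n,
                         size_t top_n, int *out) {
    size_t counts[DEMO_VOCAB] = {0};
    for (size_t i = 0; i < n; i++)
        if (kw_ids[i] >= 0 && kw_ids[i] < DEMO_VOCAB) counts[kw_ids[i]]++;

    size_t written = 0;
    while (written < top_n) {
        int best = -1;
        for (int k = 0; k < DEMO_VOCAB; k++)
            if (counts[k] > 0 && (best < 0 || counts[k] > counts[best]))
                best = k;
        if (best < 0) break;           /* no keywords left */
        out[written++] = best;
        counts[best] = 0;              /* consume so it is not picked again */
    }
    return written;
}
```

Given the ids 3, 1, 3, 5, 1, 3 and a limit of 6, the sketch returns three suggestions in the order 3, 1, 5.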

consolidate — safe maintenance

1. DELETE nodes WHERE expires_at < now AND expires_at > 0
2. DELETE edges WHERE src NOT IN nodes OR dst NOT IN nodes
3. DELETE keywords WHERE NOT EXISTS (SELECT 1 FROM node_keywords ...)
4. ANALYZE
5. RETURN { pruned_nodes, removed_edges, removed_keywords }

Safe to run from cron: it removes only expired nodes and orphaned edges and keywords, and does not touch live data.

remote_sync — profile replication

1. open remote SQLite file
2. apply schema migrations (v2→v3 if needed)
3. PULL: upsert remote nodes into local (origin=REMOTE, key=content_hash)
4. detect remote deletions: nodes where origin=REMOTE but absent in remote → delete
5. PUSH: copy local-only nodes (origin=LOCAL) into remote, mark origin=PUSHED
6. RETURN { pulled, deleted, pushed }

Refuses to run if a daemon is running on the target profile (no live mutation).

Conflict policy: for inserts, the client wins (the UNIQUE constraint on content_hash skips duplicates). For deletes, the remote wins, gated by origin != LOCAL so that the user's not-yet-pushed inserts are immune until they are pushed.
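
The delete side of this policy comes down to one predicate per local row. The sketch below follows the origin != LOCAL wording of the conflict policy; the enum and its numeric values are hypothetical, and the real op evaluates this against SQLite rows rather than in C.

```c
/* Hypothetical origin tags; the real numeric values are not documented here. */
typedef enum {
    DEMO_ORIGIN_LOCAL,    /* created locally, not yet pushed */
    DEMO_ORIGIN_REMOTE,   /* pulled from the remote */
    DEMO_ORIGIN_PUSHED    /* created locally, already pushed */
} demo_origin_t;

/* Should a local row be deleted because the remote no longer has it?
 * Rows still present remotely are kept; so are unpushed local inserts. */
int demo_should_delete(demo_origin_t origin, int present_in_remote) {
    if (present_in_remote) return 0;
    return origin != DEMO_ORIGIN_LOCAL;
}
```

A row with origin LOCAL survives even when the remote lacks it, which is what protects inserts made since the last sync.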
