GhostCrab MCP + mindBrain SQLite — structured domain navigation for llama_index sessions #21745

FrancoisLamotte · 2026-05-21T14:27:51Z


What problem this solves

LlamaIndex is strong at indexing, retrieval, and workflow composition.
Its memory architecture, however, is structurally siloed by design.

Three isolated layers coexist without ever aligning: short-term ChatMemoryBuffer
(FIFO, evicted under token caps), long-term MemoryBlock types
(FactExtractionMemoryBlock, VectorMemoryBlock, StaticMemoryBlock),
and document StorageContext. Each FunctionAgent and AgentWorkflow instantiates
its own Memory, defaulting to in-memory SQLite. AgentWorkflow shares a runtime
Context, but it provides no shared semantic memory across agents.

A ResearchAgent and a WriterAgent in the same workflow operate on disjoint memory islands.
Findings discovered in one pipeline run are invisible to the next.
Context is rebuilt from scratch every time.

This request adds two complementary components that solve this without touching LlamaIndex core.

What a domain looks like in practice

A domain is any bounded context where agents need to
reason over structured relationships, not just retrieve text.

mindBrain separates two levels:

Ontology (the model): the schema of a domain, meaning its
entity types, relationship types, constraints, and vocabularies.
Defined once, shared across every agent and every workflow run.

Knowledge Graph (the projected instance): the populated graph, with
real entities, real edges, and real state, queryable via projections.
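
To make the two levels concrete, here is a minimal sketch in plain Python. The `Ontology` and `KnowledgeGraph` classes and their method names are hypothetical illustrations, not mindBrain's actual API; they only show a schema being defined once and instance data being checked against it.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """The model: which entity types and relations a domain allows."""
    entity_types: set[str]
    # relation name -> (source entity type, target entity type)
    relation_types: dict[str, tuple[str, str]]


@dataclass
class KnowledgeGraph:
    """The instance: real entities and edges, checked against the ontology."""
    ontology: Ontology
    entities: dict[str, str] = field(default_factory=dict)  # name -> type
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, entity_type: str) -> None:
        if entity_type not in self.ontology.entity_types:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[name] = entity_type

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if relation not in self.ontology.relation_types:
            raise ValueError(f"unknown relation: {relation}")
        src_type, dst_type = self.ontology.relation_types[relation]
        if self.entities.get(source) != src_type or self.entities.get(target) != dst_type:
            raise ValueError(f"{relation} must link {src_type} -> {dst_type}")
        self.edges.append((source, relation, target))


# Hypothetical relation names, chosen only for this sketch.
ontology = Ontology(
    entity_types={"Project", "Task"},
    relation_types={"has_task": ("Project", "Task"), "depends_on": ("Task", "Task")},
)
kg = KnowledgeGraph(ontology)
kg.add_entity("Platform Modernization Q3", "Project")
kg.add_entity("Rotate JWT keys", "Task")
kg.add_edge("Platform Modernization Q3", "has_task", "Rotate JWT keys")
```

An edge that violates the schema, such as a Task pointing at a Project through `has_task`, is rejected at write time rather than discovered later at query time.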

Example: multi-agent project delivery

Ontology (model)              Knowledge Graph (instance)
─────────────────             ──────────────────────────
Project                       "Platform Modernization Q3"
└── has Task                  └── Task: "Rotate JWT keys"
      └── has_status                has_status: blocked
      └── depends_on          Task: "Define ERP contract"
      └── assigned_to               assigned_to: DevAgent
Decision                      Decision: "Stateless JWT over Redis"
└── affects Component               affects: "Auth Service"
Phase                         Phase: "implementation" → completion: 0.67

The OrchestratorAgent queries a pg_pragma projection:
"what is the phase completion rate for implementation?" → 0.67.
"what tasks are blocked?" → returns the full dependency chain in a single call.
No LLM inference. No re-reading workflow transcripts.
Workers write structured facts; the orchestrator reads pre-computed projections.
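
The two projection reads above can be approximated in plain SQLite. This sketch is illustrative only: the `task` and `dependency` tables, their columns, and the dependency edges are assumptions made for the example, not the real mindBrain or pg_pragma schema. Apart from "Rotate JWT keys" and "Define ERP contract", the rows are invented so that two of three implementation tasks are done, matching the 0.67 figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task (name TEXT PRIMARY KEY, phase TEXT, status TEXT);
CREATE TABLE dependency (task_name TEXT, needs TEXT);
INSERT INTO task VALUES
  ('Rotate JWT keys',     'implementation', 'blocked'),
  ('Migrate login flow',  'implementation', 'done'),
  ('Add token refresh',   'implementation', 'done'),
  ('Define ERP contract', 'design',         'open'),
  ('Agree data owner',    'design',         'open');
INSERT INTO dependency VALUES
  ('Rotate JWT keys',     'Define ERP contract'),
  ('Define ERP contract', 'Agree data owner');
""")

# Phase completion rate: share of tasks in the phase whose status is 'done'.
completion = conn.execute(
    "SELECT ROUND(AVG(status = 'done'), 2) FROM task WHERE phase = ?",
    ("implementation",),
).fetchone()[0]

# Blocked tasks with their full dependency chain, resolved in one query.
blocked_chain = conn.execute("""
WITH RECURSIVE chain(task_name, needs, depth) AS (
  SELECT d.task_name, d.needs, 1
  FROM dependency d JOIN task t ON t.name = d.task_name
  WHERE t.status = 'blocked'
  UNION ALL
  SELECT c.task_name, d.needs, c.depth + 1
  FROM chain c JOIN dependency d ON d.task_name = c.needs
)
SELECT task_name, needs, depth FROM chain ORDER BY task_name, depth
""").fetchall()

print(completion)      # 0.67 with the sample rows
print(blocked_chain)   # direct and transitive dependencies of the blocked task
```

Both answers come from SQL over structured rows, which is the point of the paragraph above: no model call is involved in reading them.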

Example: ERP domain

Ontology (model)              Knowledge Graph (instance)
─────────────────             ──────────────────────────
Account                       "Acme Corp" (tier=enterprise)
└── has CostCenter            └── CostCenter: "Engineering"
└── has ApprovalFlow          └── ApprovalFlow: PO > €5k → CFO
GLEntry                       GL-20260501: amount=€12,400
└── linked_to CostCenter            linked_to: "Engineering"

A FinanceAgent resolving a billing anomaly calls query_context with
facets { component: "billing", role: "analyst" } and retrieves
account structure, approval rules, and GL entry history across sessions.
No CSV re-injection. No LLM-extracted summaries that cost tokens every run.

Example: CRM domain

Ontology (model)              Knowledge Graph (instance)
─────────────────             ──────────────────────────
Account                       "NovaTech" (tier=enterprise)
└── has Contact               └── Contact: "Maria Chen" (VP Engineering)
└── has Deal                  └── Deal: "Platform License" (stage=negotiation)
└── has Interaction           └── Call: 2026-05-14, sentiment=positive
└── has Constraint            └── Constraint: "No multi-year commitment"

A SalesAgent building a renewal brief retrieves account tier,
deal stage, last interaction sentiment, and contractual constraints
via query_context with faceted retrieval — not vector similarity.
The distinction matters: tier=enterprise AND stage=negotiation returns a precise slice.
A vector search returns ranked passages, with no guarantee that each one satisfies both conditions.
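
The difference can be sketched in a few lines of Python. Everything here is a toy: `facet_query` and `keyword_rank` are hypothetical helpers, term overlap stands in for embedding similarity, and the "Initech" and "Globex" rows are invented sample data.

```python
from typing import Any

records: list[dict[str, Any]] = [
    {"account": "NovaTech", "tier": "enterprise", "stage": "negotiation"},
    {"account": "Initech", "tier": "enterprise", "stage": "closed_won"},
    {"account": "Globex", "tier": "startup", "stage": "negotiation"},
]


def facet_query(rows: list[dict[str, Any]], **facets: Any) -> list[dict[str, Any]]:
    """Return only the rows that match every facet exactly (logical AND)."""
    return [r for r in rows if all(r.get(k) == v for k, v in facets.items())]


def keyword_rank(rows: list[dict[str, Any]], query: str) -> list[dict[str, Any]]:
    """Toy stand-in for similarity search: rank every row by term overlap."""
    terms = set(query.lower().split())

    def score(row: dict[str, Any]) -> int:
        return len(terms & {str(v).lower() for v in row.values()})

    return sorted(rows, key=score, reverse=True)


precise = facet_query(records, tier="enterprise", stage="negotiation")
ranked = keyword_rank(records, "enterprise negotiation")

print([r["account"] for r in precise])  # only the account matching both facets
print([r["account"] for r in ranked])   # every account, best match first
```

The faceted call excludes partial matches outright, while the ranked call still returns them further down the list and leaves the cut-off decision to the caller.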

Example: Customer Support domain

Ontology (model)              Knowledge Graph (instance)
─────────────────             ──────────────────────────
Ticket                        TK-9821: priority=critical
└── linked_to Component       └── linked_to: "Auth Service"
└── references Incident       └── Incident: "JWT outage 2026-04-30"
KnownIssue                    KI-44: "Token expiry on clock skew"
└── affects Component               affects: "Auth Service"
└── has Resolution            Resolution: "Force clock sync + rolling restart"

A SupportAgent resolving TK-9821 calls assert_fact to link the ticket
to the known issue, then query_context to retrieve resolution history —
across sessions, without re-ingesting support documents every run.
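
A rough sketch of that write-then-read pattern, using a file-backed SQLite database to stand in for the persisted registry. The local `assert_fact` and `query_context` functions below only borrow the tool names; their signatures, the `fact` table, and the predicate names are assumptions for illustration and do not reflect GhostCrab's real interface.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact (subject TEXT, predicate TEXT, object TEXT)"
    )
    return conn


def assert_fact(conn: sqlite3.Connection, subject: str, predicate: str, obj: str) -> None:
    conn.execute("INSERT INTO fact VALUES (?, ?, ?)", (subject, predicate, obj))
    conn.commit()


def query_context(conn: sqlite3.Connection, subject: str) -> list[tuple[str, str, str]]:
    """Return facts about the subject plus facts about the things it points to."""
    return conn.execute(
        """SELECT subject, predicate, object FROM fact WHERE subject = ?
           UNION ALL
           SELECT f2.subject, f2.predicate, f2.object
           FROM fact f1 JOIN fact f2 ON f2.subject = f1.object
           WHERE f1.subject = ?""",
        (subject, subject),
    ).fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "support.db")

# Session 1: record the known issue's resolution, then link the ticket to it.
session1 = open_store(db_path)
assert_fact(session1, "KI-44", "has_resolution", "Force clock sync + rolling restart")
assert_fact(session1, "TK-9821", "linked_to", "KI-44")
session1.close()

# Session 2: a fresh connection still sees the link and the resolution.
session2 = open_store(db_path)
context = query_context(session2, "TK-9821")
session2.close()
```

Because the facts live in a file rather than in process memory, the second session needs no re-ingestion step to recover the resolution.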

The meta-ontology: where mindBrain becomes strategic

Each domain above is useful alone. The real leverage is connecting them.

         ERP ←──────────────────┐
                                 │
         CRM ←─────── mindBrain ─┼──→ Project Management
                                 │
         Support ←───────────────┘
                                 │
                                 └──→ Document Knowledge / RAG

A LlamaIndex multi-agent pipeline working on a cross-domain task —
say, an enterprise renewal involving a billing dispute,
an open critical support ticket, and a stalled deal —
can query all four domains from the same shared registry in one pass.

LlamaIndex continues to own document retrieval and RAG.
mindBrain owns operational context: decisions, task state, entity relationships,
phase progression, and cross-domain links.

That is the clean boundary: LlamaIndex retrieves documents.
mindBrain navigates the structured world those documents describe.
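
As a sketch of what "one pass over a shared registry" could look like, the snippet below keeps facts from several domains in a single list and groups them by domain for one entity. The `registry` rows, the predicate names, and the link between NovaTech and ticket TK-9821 are invented for illustration; `cross_domain_view` is a hypothetical helper, not a mindBrain call.

```python
from collections import defaultdict

# (domain, subject, predicate, object): all rows are invented sample data.
registry = [
    ("crm", "NovaTech", "has_deal", "Platform License"),
    ("erp", "NovaTech", "has_dispute", "INV-1042"),
    ("support", "NovaTech", "has_ticket", "TK-9821"),
    ("project", "NovaTech", "sponsors", "Platform Modernization Q3"),
    ("crm", "Initech", "has_deal", "Starter Plan"),
]


def cross_domain_view(rows, entity):
    """Group every fact about one entity by the domain it came from."""
    view = defaultdict(list)
    for domain, subject, predicate, obj in rows:
        if subject == entity:
            view[domain].append((predicate, obj))
    return dict(view)


renewal_context = cross_domain_view(registry, "NovaTech")
for domain, facts in renewal_context.items():
    print(domain, facts)
```

The design point is the shared entity key: because every domain refers to the account by the same identifier, a single scan collects the billing, support, deal, and project context together.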

Two components, one integration

mindBrain (SQLite, Personal edition)

The data layer. It organizes domain knowledge into three constructs:

- Facets: typed dimensions that structure a domain, such as subject, predicate, role, phase, agent, topic, status, priority, ...
- Semantic graphs: named, typed relations between entities, such as depends_on, assigned_to, affects, linked_to, validated_by, contradicts, ...
- Projections (pg_pragma): pre-computed views such as phase completion rates, blocker queues, agent liveness, and KG coverage, surfaced at zero inference cost

Agents don't run vector search as the default path.
They call structured queries: faceted retrieval, graph traversal, projection reads.
Vector search is available as fallback for unstructured content — it remains LlamaIndex's domain.
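
One way to picture the three constructs together is a small SQLite layout in which a trigger keeps a projection table current at write time, so reading it involves no aggregation. The table names, columns, trigger, and the "Audit token scopes" row are assumptions for this sketch and are not mindBrain's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facet (entity TEXT, name TEXT, value TEXT);
CREATE TABLE edge  (source TEXT, relation TEXT, target TEXT);
-- Projection table maintained on write, so reads are plain lookups.
CREATE TABLE status_count (status TEXT PRIMARY KEY, n INTEGER NOT NULL);
CREATE TRIGGER facet_status_insert AFTER INSERT ON facet
WHEN NEW.name = 'status'
BEGIN
  INSERT OR IGNORE INTO status_count VALUES (NEW.value, 0);
  UPDATE status_count SET n = n + 1 WHERE status = NEW.value;
END;
""")

conn.executemany("INSERT INTO facet VALUES (?, ?, ?)", [
    ("Rotate JWT keys", "status", "blocked"),
    ("Rotate JWT keys", "priority", "high"),
    ("Define ERP contract", "status", "open"),
    ("Audit token scopes", "status", "blocked"),
])
conn.execute(
    "INSERT INTO edge VALUES ('Rotate JWT keys', 'depends_on', 'Define ERP contract')"
)

# Faceted retrieval: entities with status=blocked AND priority=high.
blocked_high = conn.execute("""
    SELECT a.entity FROM facet a JOIN facet b ON a.entity = b.entity
    WHERE a.name = 'status' AND a.value = 'blocked'
      AND b.name = 'priority' AND b.value = 'high'
""").fetchall()

# Graph traversal (one hop): what does the task depend on?
deps = conn.execute(
    "SELECT target FROM edge WHERE source = ? AND relation = 'depends_on'",
    ("Rotate JWT keys",),
).fetchall()

# Projection read: counts were already computed when the facts were written.
counts = dict(conn.execute("SELECT status, n FROM status_count"))
```

Maintaining the projection on write trades a little insert cost for constant-time reads, which suits an orchestrator that polls the same summary many times per run.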

GhostCrab MCP

The gateway layer. An MCP sidecar that gives LlamaIndex agents the tools to:

- Read and write the shared ontology registry without direct database access
- Execute faceted queries, graph traversal, and projection reads
- Persist structured facts from workflow turns into durable ontology nodes

GhostCrab is a protocol bridge. mindBrain exists and operates independently.
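
Since GhostCrab is described as an MCP sidecar, a tool invocation travels as a JSON-RPC 2.0 `tools/call` request, which is the standard MCP method for calling a tool. The sketch below only builds that envelope; the `tool_call` helper is hypothetical, and the `assert_fact` argument names are assumed from the snippets in this post rather than taken from a published tool schema.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Argument names below are assumptions, not a confirmed GhostCrab schema.
request = tool_call(
    "assert_fact",
    {"subject": "TK-9821", "predicate": "linked_to", "object": "KI-44"},
)
print(request)
```

In practice an MCP client library would handle this framing; the envelope is shown only to make clear that agents talk to the gateway through tool calls and never touch the database directly.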

LlamaIndex AgentWorkflow
├── FunctionAgent A  (ResearchAgent)
├── FunctionAgent B  (BuilderAgent)
└── FunctionAgent C  (ReviewerAgent)
        │ assert_fact / query_context
        ▼
GhostCrab MCP  (stdio or HTTP)
        │
        ▼
mindBrain SQLITE
├── facets         — typed query dimensions
├── graphs         — semantic edges across domains
└── pg_pragma      — orchestrator projections (phase gates, blocker queues, agent liveness)

Why this belongs outside LlamaIndex core

It should stay external.

LlamaIndex already exposes the right seams: BaseMemory for full replacement,
BaseMemoryBlock for additive adoption, and StorageContext with graph_store
for ingestion pipelines. mindBrain uses all three without modifying LlamaIndex itself.

Domain ontologies are specific to each application's context.
The framework cannot know in advance what is worth persisting,
how entities relate, or what projections an orchestrator needs.
That belongs to the workflow builder.

If GhostCrab is absent, LlamaIndex behaves exactly as before.
If mindBrain is empty, agents start with a blank namespace and populate it
through normal assert_fact calls during the first workflow run.
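
The "absent means unchanged" behaviour can be sketched as an optional import with a fallback. The helper names are hypothetical, and the returned strings are placeholders for whatever memory object an application would actually construct; `ghostcrab_mcp` is the module name used in the Path 1 snippet of this post.

```python
import importlib


def load_optional(module_name: str):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def choose_memory_backend(module_name: str = "ghostcrab_mcp") -> str:
    """Pick the shared registry when the package is present, else native memory."""
    return "mindbrain" if load_optional(module_name) is not None else "native"


backend = choose_memory_backend()
print(backend)
```

Keeping the decision in one function means the rest of the workflow code does not need to know whether the sidecar is installed.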

Three integration paths

Path 1 — Custom `BaseMemory` (primary, shared registry)

Drop-in replacement. All agents in the workflow share one namespace.

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import BaseMemory
from ghostcrab_mcp import MindBrainClient

# Sketch: the other BaseMemory methods are omitted for brevity.
class MindBrainMemory(BaseMemory):
    def __init__(self, agent_id: str, ontology_ns: str):
        # Keep agent_id on the instance: put() and get() both reference it.
        self.agent_id = agent_id
        self.client = MindBrainClient(agent_id=agent_id, ns=ontology_ns)

    async def put(self, message: ChatMessage) -> None:
        # Persist each message as a structured fact in the shared namespace.
        await self.client.assert_fact(
            subject=self.agent_id,
            predicate="observed",
            object=message.content,
            context={"role": message.role}
        )

    async def get(self, input: str = "") -> list[ChatMessage]:
        # Read back facts for this agent, filtered by facets and the input text.
        facts = await self.client.query_context(
            agent=self.agent_id,
            semantic_filter=input,
            facets=["role", "topic", "recency"]
        )
        return [ChatMessage(role=f.role, content=f.content) for f in facts]

# All agents share the same namespace → same registry
research_agent = FunctionAgent(memory=MindBrainMemory("researcher", "project.alpha"), ...)
writer_agent   = FunctionAgent(memory=MindBrainMemory("writer",     "project.alpha"), ...)

Path 2 — Custom `BaseMemoryBlock` (additive, no disruption)

Add GhostCrab as an additional block with priority=0, so it is never evicted.
Existing native memory blocks remain untouched.

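from llama_index.core.memory import BaseMemoryBlock

# `ghostcrab` below is assumed to be an already-configured GhostCrab client.
class MindBrainOntologyBlock(BaseMemoryBlock[str]):
    # Annotated override of the inherited field; 0 = never evicted under token pressure
    priority: int = 0

    async def _aget(self, messages, **kwargs) -> str:
        # Inject structured ontology context into the system prompt
        return await ghostcrab.render_context_for(messages)

    async def _aput(self, messages, **kwargs) -> None:
        # Index the turn so later reads can retrieve it as structured facts
        await ghostcrab.index_agent_turn(messages)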

Path 3 — `StorageContext` graph store (document ingestion path)

For teams using VectorStoreIndex who want ontology relations resolved
alongside document retrieval.

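from llama_index.core import StorageContext

# MindBrainVectorStore and MindBrainGraphStore come from the GhostCrab
# integration; their import path is not shown in this post.
storage_context = StorageContext.from_defaults(
    vector_store=MindBrainVectorStore(ns="project.docs"),
    graph_store=MindBrainGraphStore(ontology="mindbrain://project")
)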

→ Full configuration, skill files, and tested LlamaIndex walkthrough:
https://github.com/mindflight-orchestrator/ghostcrab-personal-mcp/blob/main/ghostcrab-integrations/llamaindex/SKILL_llamaindex_ghostcrab.md

https://github.com/mindflight-orchestrator/ghostcrab-personal-mcp/blob/main/ghostcrab-integrations/llamaindex/SKILL_ghostcrab_runtime.md

What changes in practice

| Native LlamaIndex memory | With mindBrain / GhostCrab |
| --- | --- |
| SQLite in-memory per agent | Shared persisted SQLite |
| LLM-extracted facts (token cost every run) | Structurally asserted ontology facts (zero inference cost) |
| Vector-only retrieval | Faceted retrieval: subject + predicate + context + semantics |
| Context lost across workflow runs | Pipeline-wide registry all agents query |
| No shared memory across `FunctionAgent` instances | One namespace → one registry → all agents co-visible |
| No knowledge typing | Explicit ontology ties (`observed`, `owns`, `depends_on`, `validated_by`) |

Scope of this request

- Adds GhostCrab MCP + mindBrain SQLite (Personal edition) to the LlamaIndex integrations catalog
- No changes to LlamaIndex core agents, workflows, memory, or storage interfaces
- No new required dependencies beyond existing LlamaIndex install
- Boundary-preserving: LlamaIndex owns document retrieval and RAG; mindBrain owns operational context


Repository	https://github.com/mindflight-orchestrator/ghostcrab-personal-mcp
License	Apache 2.0
Transport	stdio (local), CLI
Backend	SQLite (bundled via mindBrain Personal)
LlamaIndex compatibility	`llama-index-core`, `BaseMemory`, `BaseMemoryBlock`, `StorageContext`
Integration type	Custom `BaseMemory` drop-in + optional `MemoryBlock` + `StorageContext` graph store
