This is the technical source-of-truth doc for the memory model.
Use it when the question is:
- "What are the memory layers?"
- "What is trusted vs derived?"
- "How do workspace memory, wiki, and LightRAG fit together?"
- "What are the storage rules at architecture level?"
Read order:
- intuition first -> `docs/19-llm-wiki-memory-explained.md`
- this file next -> technical memory model
- runtime answer path after that -> `docs/15-llm-wiki-query-flow.md`
- save / promotion behavior -> `docs/17-knowledge-management.md`
- exact naming and machine rules -> `docs/16-llm-wiki-storage-model.md` and `artifacts/llm-wiki/SCHEMA.md`
The OpenClaw bot (Бенька) uses a multi-layer external memory system. The LLM remembers nothing between sessions by itself — all persistence lives in files and a knowledge graph.
The core principle: memory layers have a trust hierarchy. Conflating "what was decided once" with "what is true now" is the primary failure mode of any AI memory system.
LIVE > RAW > DERIVED
| Layer | Trust | Answers | Location |
|---|---|---|---|
| Live | Highest | "Is X running right now?" | docker ps, curl /health, logs |
| Raw | Historical | "Why did we decide X?" | workspace/raw/YYYY-MM-DD-{topic}.md |
| Derived | Quick recall | "What's worth remembering?" | MEMORY.md, daily notes, INDEX.md |
Never answer current-state questions from memory. After any update, restart, patch, or install: live-check first, then speak.
Example failure mode: three months ago a config was discussed in a thread (raw). The config was
later changed. If the bot answers from the old thread, it speaks confidently but incorrectly.
Only a live-check (cat /opt/openclaw/config/openclaw.json) reflects what actually exists now.
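The trust order can be sketched in a few lines of Python. This is only an illustration of the rule "current-state answers need a live check"; the `Layer`, `Claim`, and `answer_current_state` names and the sample port values are hypothetical and do not come from the OpenClaw codebase.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    """Trust order from the table above: LIVE > RAW > DERIVED."""
    DERIVED = 1
    RAW = 2
    LIVE = 3


@dataclass(frozen=True)
class Claim:
    layer: Layer
    value: str


def answer_current_state(claims: list[Claim]) -> str:
    """Return a current-state answer only when a live check backs it."""
    live = [c for c in claims if c.layer is Layer.LIVE]
    if not live:
        # Raw and derived claims describe the past; they cannot confirm "now".
        raise LookupError("no live check available; run one before answering")
    return live[-1].value  # the most recent live observation wins


claims = [
    Claim(Layer.RAW, "port=8080"),   # what the old thread said (made-up value)
    Claim(Layer.LIVE, "port=9090"),  # what the config file says now (made-up value)
]
print(answer_current_state(claims))
```

Raising an error when no live claim exists mirrors the rule "live-check first, then speak": the function refuses to fall back to memory rather than guessing.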
At architecture level, the LLM-Wiki has three storage layers:
- `raw/`: immutable source evidence
- `wiki/`: curated markdown knowledge base
- `LightRAG`: derived retrieval index over selected markdown
Inside wiki/, typed folders remain the primary structure:
- `entities/`
- `concepts/`
- `decisions/`
- `research/`
- `sessions/`
- `archive/research/` for low-prominence archived research pages
Themes are a secondary navigation layer through frontmatter and TOPICS.md.
Exact slug and naming rules live in docs/16 and artifacts/llm-wiki/SCHEMA.md.
- Source: shell commands, Docker status, HTTP health endpoints
- Authoritative for: current running state, installed versions, active config
- Access: `docker compose ps`, `curl http://127.0.0.1:18789/healthz`, logs
- Never cached in memory files
Location: workspace/raw/YYYY-MM-DD-{topic}.md
Verbatim-near records of important threads, after redaction. Contains:
- Decisions with explicit reasoning ("we chose X because Y")
- Root causes of real failures
- New infrastructure entities (new service, tool, config key)
- Threads tagged `#canon`
- Rejected options with context ("we tried Y but Z, revisit if condition changes")
Promotion criteria — a thread enters raw ONLY if it has at least one of the above. Ephemeral chit-chat, troubleshooting with no conclusion, and off-topic content stay out.
Redaction pass before writing:
- Remove all live values: IPs, tokens, passwords, client IDs, cert CNs
- Replace with placeholders: `<SERVER_IP>`, `<BOT_TOKEN>`, `<CLIENT_CERT_CN>`
- Raw files ARE tracked in git (safe only after redaction)
Load policy: raw files are never loaded at boot. Load only on explicit recall request ("why did we reject X?"). If lightrag_query covers it, prefer that.
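The redaction pass described above could look roughly like the following. This is a minimal sketch, not the bot's actual redaction code: the regular expressions are assumptions (an IPv4 shape, a Telegram-style `digits:35-chars` token, and a `CN=` prefix for certificate names), and a real pass would need patterns for passwords and client IDs as well.

```python
import re

# Illustrative patterns only; the real token and CN formats are assumptions.
PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<SERVER_IP>"),
    (re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}"), "<BOT_TOKEN>"),
    (re.compile(r"(?<=CN=)[\w.-]+"), "<CLIENT_CERT_CN>"),
]


def redact(text: str) -> str:
    """Replace live values with placeholders before a thread is written to raw/."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = "ssh root@203.0.113.7, cert CN=client-01.example, token 123456789:" + "A" * 35
print(redact(sample))
```

Because raw files are tracked in git, a pass like this is best run before the file is created, not as a cleanup step afterwards.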
The fast-recall layer. Loaded at boot. Never canonical for current state.
| File | Purpose |
|---|---|
| `workspace/MEMORY.md` | Long-term curated facts: projects, preferences, professional profile |
| `workspace/INDEX.md` | Master catalog: what lives where, navigation quick reference |
| `workspace/memory/INDEX.md` | Daily note index: active files + archive pointer |
| `workspace/memory/YYYY-MM-DD.md` | Daily session logs (auto-generated by bot) |
| `workspace/memory/archive/` | Compressed old daily notes |
The wiki is the durable human-readable knowledge artifact.
Important rules:
- explicit user saves must create a visible `wiki/research/**` page
- `research/**` is a valid final state, not a temporary failure state
- canonical pages in `concepts/**`, `entities/**`, and `decisions/**` are selective promotions, not the default output of every source
- archived research pages remain searchable but lose prominence
Executed at the start of every session. Total context budget: ~5–8KB.
1. Read MEMORY.md (~2KB, always)
2. Read wiki/OVERVIEW.md (~1–2KB, hubs + recent updates + active decisions)
3. Read memory/INDEX.md (~1KB, find today + yesterday)
4. Read today's daily (if exists and topic is relevant)
5. Read yesterday's daily (only if today has <3 entries)
6. GET http://lightrag:9621/health (non-blocking, log status)
7. Mark unclosed tasks from previous session
8. Confirm ready: "Гав. Слушаю." ("Woof. Listening.")
Never load raw/ at boot. Read wiki/OVERVIEW.md, not the full wiki/INDEX.md, on cold start. Never scan memory/ blindly — always go through INDEX.
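Steps 1 to 5 of the boot sequence can be sketched as a file-selection function plus a budgeted loader. This is a hedged illustration, not the bot's implementation: the function names are hypothetical, "entries" are assumed to be lines starting with `- `, the topic-relevance check from step 4 is omitted, and hard enforcement of the 8KB ceiling is an assumption (the source only states the budget).

```python
from pathlib import Path

BOOT_BUDGET_BYTES = 8 * 1024  # upper end of the ~5-8KB budget (enforcement is assumed)


def count_entries(note: str) -> int:
    """Count bullet entries; treating '- ' lines as entries is an assumption."""
    return sum(1 for line in note.splitlines() if line.startswith("- "))


def boot_files(workspace: Path, wiki: Path, today: str, yesterday: str) -> list[Path]:
    """Return the files to read at boot, in order. raw/ is never included."""
    files = [
        workspace / "MEMORY.md",
        wiki / "OVERVIEW.md",  # not the full wiki/INDEX.md
        workspace / "memory" / "INDEX.md",
    ]
    today_note = workspace / "memory" / f"{today}.md"
    entries_today = 0
    if today_note.exists():
        files.append(today_note)
        entries_today = count_entries(today_note.read_text(encoding="utf-8"))
    yesterday_note = workspace / "memory" / f"{yesterday}.md"
    if entries_today < 3 and yesterday_note.exists():
        files.append(yesterday_note)
    return files


def load_boot_context(files: list[Path]) -> str:
    """Concatenate existing boot files, stopping before the budget is exceeded."""
    parts, used = [], 0
    for path in files:
        if not path.exists():
            continue
        text = path.read_text(encoding="utf-8")
        size = len(text.encode("utf-8"))
        if used + size > BOOT_BUDGET_BYTES:
            break
        parts.append(text)
        used += size
    return "\n".join(parts)
```

The health check, task marking, and ready message (steps 6 to 8) are left out because they depend on the runtime, not on the file layout.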
| Trigger | Target | Action |
|---|---|---|
| New fact about Denis (pref, family, project) | `MEMORY.md` | Append/update in-place, 1 line |
| Decision made during session | Today's daily note | Append `[DECISION] X over Y. Reason: Z.` |
| Root cause / #canon / rejected option | `raw/YYYY-MM-DD-topic.md` | Full thread after redaction pass |
| System state changed (update/patch/install) | LIVE layer | Run check command, never from memory |
| Daily note older than 14 days | `memory/archive/` | Move, update INDEX entry |
| Obsidian note updated | LightRAG | Re-index via cron or `lightrag-ingest.sh` |
Max 3 lines per entry. No prose.
# Session Log: YYYY-MM-DD
## Decisions
- [DECISION] X chosen over Y. Reason: Z.
## Key Facts
- [FACT] New infra entity / preference / constraint.
## Open Items
- [TODO] Unresolved, carry forward.
## Root Causes / #canon
- [CANON] Root cause or architectural truth. #canon
Load lazily, not eagerly. Boot loads only ~5–8KB. Daily notes load on-demand. Raw never loads proactively.
Write compressed, not verbose.
- MEMORY.md entries: 1 line per fact, no prose
- Daily note entries: max 3 lines each
- Raw files: verbatim but stripped of pleasantries and off-topic filler
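The "max 3 lines per entry" rule is easy to enforce at write time. The helper below is a sketch under stated assumptions: `format_entry` is a hypothetical name, the tag set is taken from the daily note template, and indenting continuation lines by two spaces is a formatting choice, not something the source specifies.

```python
MAX_ENTRY_LINES = 3
TAGS = ("[DECISION]", "[FACT]", "[TODO]", "[CANON]")


def format_entry(tag: str, text: str) -> str:
    """Render one daily-note entry, rejecting anything longer than 3 lines."""
    if tag not in TAGS:
        raise ValueError(f"unknown tag: {tag}")
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("entry is empty")
    if len(lines) > MAX_ENTRY_LINES:
        raise ValueError(f"entry has {len(lines)} lines, max is {MAX_ENTRY_LINES}")
    first, rest = lines[0], lines[1:]
    # Continuation lines are indented so they stay inside the bullet.
    return "\n".join([f"- {tag} {first}", *[f"  {line}" for line in rest]])


print(format_entry("[DECISION]", "X chosen over Y. Reason: Z."))
```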
Index over scan.
- `memory/INDEX.md` is 1KB: read it, then load only what's needed
- Never `cat memory/` blindly
LightRAG replaces archive reads.
Instead of loading 30 daily files: lightrag_query("why did we reject X") → ~2KB answer.
Compression schedule for daily notes:
| Age | Action |
|---|---|
| 1–7 days | Full verbatim |
| 8–14 days | Bot compresses to 5-line summary, replaces in-place |
| 15+ days | Summary → memory/archive/, INDEX updated |
Compression is triggered by weekly HEARTBEAT check, not per-message.
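The schedule maps a note's age to one of three actions, which a weekly job could compute as follows. This is a minimal sketch: the function name and the action labels are hypothetical, and treating a note written today (age 0) as "keep" is an assumption, since the table starts at day 1.

```python
from datetime import date


def compression_action(note_date: date, today: date) -> str:
    """Map a daily note's age to the action from the compression schedule."""
    age_days = (today - note_date).days
    if age_days <= 7:
        return "keep"       # full verbatim
    if age_days <= 14:
        return "summarize"  # compress to a 5-line summary in place
    return "archive"        # move summary to memory/archive/, update INDEX


today = date(2024, 5, 20)
for d in (date(2024, 5, 15), date(2024, 5, 8), date(2024, 5, 1)):
    print(d.isoformat(), compression_action(d, today))
```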
LightRAG indexes curated markdown (workspace + LLM-Wiki + raw signal digests) and provides dual-mode retrieval: vector similarity + knowledge graph traversal.
What it replaces: bulk-reading archive files to find context. What it is not: a source of truth for current system state.
The workspace files are intentionally small enough to load at boot. That keeps a fresh OpenClaw session fast, but it means older daily notes, raw decision records, and Obsidian wiki pages should not be loaded blindly. LightRAG lets the bot search that long tail on demand.
The design goal is not "the bot remembers everything automatically." The goal is:
- fast boot context from curated files;
- durable source files that humans can read and edit;
- retrieval over older material when a question asks for history or background;
- explicit source references when memory influences a decision.
LightRAG ingests markdown from:
| Source | Server path | Typical content |
|---|---|---|
| OpenClaw workspace | `/opt/openclaw/workspace` | identity, tool docs, curated memory, daily notes, raw decision threads |
| Obsidian wiki | `/opt/obsidian-vault/wiki` | curated entity/concept/decision/research/session pages |
| Raw signal digests | `/opt/obsidian-vault/raw/signals` | daily Last30Days signal snapshots |
Ingestion is batch-oriented:
- Syncthing or workspace deploy puts markdown files on the server.
- `/opt/lightrag/scripts/lightrag-ingest.sh` uploads markdown files with `POST /documents/upload`.
- The script calls `POST /documents/reprocess_failed` to retry pending/failed docs.
- LightRAG extracts chunks, entities, relationships, vectors, and document statuses.
Explicitly out of scope for LightRAG ingest v1:
- `/opt/obsidian-vault/raw/articles`
- `/opt/obsidian-vault/raw/documents`
- legacy vault material outside `wiki/`
Cron runs the ingest script every 30 minutes. Manual re-index is useful after bulk edits.
OpenClaw calls LightRAG from the Docker network:
POST http://lightrag:9621/query
Content-Type: application/json
{"query": "why did we choose PostgreSQL", "mode": "hybrid"}
The host-local equivalent is http://127.0.0.1:8020/query.
Use the result as a ranked context bundle. The answer is convenient, but the references matter more
when accuracy is important. If LightRAG says references: [], treat it as "memory did not find
support," not as proof that the fact is false.
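The request above, and the "empty references means unsupported, not false" rule, can be sketched with the Python standard library. The URL and request body come from the example; the helper names are hypothetical, and the only assumption about the response is that it is JSON with a `references` list, as the text implies.

```python
import json
from urllib import request

LIGHTRAG_URL = "http://lightrag:9621/query"  # Docker-network address from the example


def lightrag_query(query: str, mode: str = "hybrid",
                   url: str = LIGHTRAG_URL, timeout: float = 10.0) -> dict:
    """POST a query to LightRAG and return the decoded JSON body."""
    body = json.dumps({"query": query, "mode": mode}).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def support_status(result: dict) -> str:
    """Classify a result by its references, not by how confident the answer reads."""
    references = result.get("references") or []
    if not references:
        # Memory found no support; this is not proof that the fact is false.
        return "unsupported"
    return "supported"


print(support_status({"references": []}))
```

Keeping the classification separate from the HTTP call makes the rule testable without a running LightRAG instance.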
This project intentionally uses two retrieval layers with different indexing boundaries.
| Path | Role | Typical writer | LightRAG | OpenClaw memorySearch |
|---|---|---|---|---|
| `/opt/openclaw/workspace/MEMORY.md` | long-term curated facts | manual edits, workspace flow | yes | yes |
| `/opt/openclaw/workspace/memory/**/*.md` | daily notes and memory logs | bot, manual edits | yes | yes |
| `/opt/openclaw/workspace/raw/**/*.md` | raw decision threads in workspace | bot, manual edits | yes | no |
| `/opt/obsidian-vault/wiki/**/*.md` | canonical curated LLM-Wiki pages | `wiki-import`, bot, Syncthing | yes | yes |
| `/opt/obsidian-vault/raw/signals/**/*.md` | daily signal snapshots | `signals-bridge` | yes | no |
| `/opt/obsidian-vault/raw/articles/**/*.md` | raw article sources before curation | clipper, import bridge | no | no |
| `/opt/obsidian-vault/raw/documents/**/*.md` | raw document sources before curation | import bridge, manual | no | no |
| legacy vault trees outside `wiki/` | legacy or unsanctioned vault material | legacy/manual | no | no |
Purpose: fast local recall inside the gateway.
Indexes:
- `/opt/openclaw/workspace/MEMORY.md`
- `/opt/openclaw/workspace/memory/**/*.md`
- `/opt/obsidian-vault/wiki/**/*.md`
Does not index:
- `/opt/obsidian-vault/raw/signals/**/*.md`
- `/opt/obsidian-vault/raw/articles/**/*.md`
- `/opt/obsidian-vault/raw/documents/**/*.md`
- legacy vault trees outside `wiki/`
Why: builtin memory should stay close to curated human-reviewed facts and canonical wiki pages. It is the lightweight recall layer for the agent, not the place to ingest every raw source.
Current tuning on the Hetzner CX23:
- Gemini provider-side batch embeddings enabled with `concurrency=1 wait=false` so future rebuilds can offload embedding work without blocking the host as hard
- hybrid `candidateMultiplier=2`
- MMR reranking disabled
Purpose: broader historical retrieval over curated markdown plus selected raw signal snapshots.
Indexes:
- `/opt/openclaw/workspace/**/*.md`
- `/opt/obsidian-vault/wiki/**/*.md`
- `/opt/obsidian-vault/raw/signals/**/*.md`
Does not index:
- `/opt/obsidian-vault/raw/articles/**/*.md`
- `/opt/obsidian-vault/raw/documents/**/*.md`
- legacy vault trees outside `wiki/` and `raw/signals/`
Why: raw/signals is compact and already useful as near-curated context, while raw/articles and
raw/documents are source staging areas that should become retrievable only after curated import
materializes them into canonical wiki pages.
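The two indexing boundaries can be expressed as one lookup. The sketch below encodes the path rules listed for memorySearch and LightRAG; `indexed_by` and its return labels are hypothetical names, and the real boundaries are enforced by each tool's own configuration, not by code like this.

```python
from pathlib import PurePosixPath

WORKSPACE = PurePosixPath("/opt/openclaw/workspace")
VAULT = PurePosixPath("/opt/obsidian-vault")


def _under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True when path sits somewhere below root."""
    return root in path.parents


def indexed_by(path_str: str) -> set[str]:
    """Return which retrieval layers index a given markdown file."""
    path = PurePosixPath(path_str)
    if path.suffix != ".md":
        return set()
    layers = set()
    if (_under(path, WORKSPACE)
            or _under(path, VAULT / "wiki")
            or _under(path, VAULT / "raw" / "signals")):
        layers.add("lightrag")
    if (path == WORKSPACE / "MEMORY.md"
            or _under(path, WORKSPACE / "memory")
            or _under(path, VAULT / "wiki")):
        layers.add("memorySearch")
    return layers


print(sorted(indexed_by("/opt/obsidian-vault/wiki/decisions/db.md")))
```

A file under `raw/articles` or `raw/documents` yields an empty set, which matches the rule that staged sources are stored but not searchable.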
New Obsidian source material should flow through the LLM-Wiki pipeline:
raw source -> curated import -> wiki page -> retrieval
If a document is still only in raw/articles or raw/documents, it is stored source material, not
part of the searchable memory profile.
For explicit user saves, the save is considered successful only after a visible wiki/research/**
page exists. LightRAG indexing is a secondary step over the resulting wiki pages, not the primary
proof that the knowledge was stored.
POST http://lightrag:9621/query
Content-Type: application/json
{"query": "why did we choose PostgreSQL", "mode": "hybrid"}
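Modes: hybrid (recommended; combines local and global) · local (entity-focused graph retrieval) · global (relationship-focused graph retrieval) · naive (vector search only)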
Question received:
├── "Is X running/available/configured NOW?"
│ → LIVE CHECK (docker ps / curl / logs) — never from memory
│
├── "Why did we choose X?" / "What was decided about Y?"
│ → raw/ if file exists for topic
│ → then daily notes (via memory/INDEX.md)
│ → then lightrag_query
│
└── "Who is Denis?" / "Denis's preferences?"
→ MEMORY.md (already in context at boot)
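The decision tree can be approximated by a small router. This is a deliberately crude sketch: the keyword patterns are assumptions chosen to match the three example questions, `route` is a hypothetical name, and falling back to `MEMORY.md` for anything unmatched is an assumption the tree does not state.

```python
import re

# Keyword hints are illustrative; a real classifier would need more than regexes.
LIVE_HINTS = re.compile(r"\b(now|running|available|configured|currently)\b", re.I)
HISTORY_HINTS = re.compile(r"\b(why did|what was decided|decided|rejected?)\b", re.I)


def route(question: str) -> list[str]:
    """Return the ordered list of layers to consult for a question."""
    if LIVE_HINTS.search(question):
        return ["live_check"]  # never answered from memory
    if HISTORY_HINTS.search(question):
        return ["raw", "daily_notes", "lightrag_query"]
    return ["MEMORY.md"]  # already in context at boot


for q in ("Is LightRAG running right now?", "Why did we choose X?", "Who is Denis?"):
    print(q, "->", route(q))
```

Live hints are checked first so that a question mixing history and current state still triggers a live check.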
Obsidian serves as the external AI Wiki. Notes live as plain .md files; the vault itself stores no embeddings, and retrieval over the notes goes through LightRAG.
Vault location on server: /opt/obsidian-vault/
Sync method: Syncthing bidirectional sync between Mac iCloud vault and server
LightRAG mount: vault mounted read-only into LightRAG container
Re-index: cron job (scripts/lightrag-ingest.sh) runs every 30 minutes
The vault now has a split role:
- `wiki/`: bot-maintained curated knowledge
- `raw/signals/`: indexed raw inputs
- `raw/articles/` and `raw/documents/`: stored but not indexed until curated import materializes them into `wiki/`
Priority pyramid for what goes into Obsidian:
| Level | Content | Value |
|---|---|---|
| Top | Decisions & reasons (why X, why not Y) | Highest |
| Mid | Project facts (stack, structure, tasks) | High |
| Mid | Preferences (tools, deploy style) | Medium |
| Base | Documentation (README, API docs, guides) | Foundation |
Do NOT put in Obsidian: raw code, logs, DB dumps, drafts, credentials.
Weekly (triggered manually or by HEARTBEAT — never automatic per-message):
- Scan `workspace/memory/`: files older than 14 days → compress and move to archive/
- Update `memory/INDEX.md`: remove stale entries, add archive pointer
- Scan for contradictions between `MEMORY.md` and recent raw/ entries
- Trigger LightRAG re-index after bulk changes
workspace/
├── INDEX.md ← master catalog (read at boot)
├── MEMORY.md ← long-term curated facts (read at boot)
├── USER.md ← Denis's full profile (read at boot)
├── IDENTITY.md ← Бенька's persona (read at boot)
├── SOUL.md ← values, anti-sycophancy (read at boot)
├── AGENTS.md ← mission, memory protocol, boot algorithm (read at boot)
├── BOOT.md ← startup checklist (read at boot)
├── TOOLS.md ← available tools incl. lightrag_query (read at boot)
├── HEARTBEAT.md ← periodic task instruction (heartbeat only)
├── memory/
│ ├── INDEX.md ← daily note catalog (read at boot)
│ ├── YYYY-MM-DD.md ← daily session logs (load on-demand)
│ └── archive/ ← compressed old notes (via lightrag_query only)
└── raw/
└── YYYY-MM-DD-{topic}.md ← verbatim decisions, redacted (load on explicit recall)
Two Telegram topics drive the primary ingestion workflow. See docs/17-knowledge-management.md for full details.
| Topic | Behaviour | Memory class |
|---|---|---|
| 💡 Ideas (id=639) | Any forwarded post / link / text → light-curated `wiki/research/**` capture via `wiki_ingest(capture_mode=ideas)` | explicit-light |
| 📚 Knowledgebase (id=232) | Question → search; any content → bot auto-structures + `wiki_ingest` | CURATED |
Content reaches durable wiki storage on every explicit save. The difference is curation depth:
- `Ideas` creates a visible `wiki/research/**` landing page immediately, but keeps curation light
- `Knowledgebase` creates the same landing page with stronger curated intent
- promotion from `Ideas` enriches the existing artifact chain instead of materializing it for the first time
Passive scheduled feeds such as telegram-digest and signals are a separate storage class and do
not have to create wiki pages unless explicitly promoted or saved. The user never fills in structured
fields manually: the bot extracts title, domain, source, date, summary, and sensitivity automatically.
Practical operator rule:
- `Ideas` means "capture now into wiki, curate deeper later"
- `Knowledgebase` means "this should become durable system knowledge"
So the split is by intent, not by topic:
- raw stream / maybe-useful / inbox material → `Ideas`
- durable knowledge / principles / references that should be searchable later → `Knowledgebase`