A deep dive into building and maintaining your zero-RAG personal knowledge base.
- Philosophy
- Setup Walkthrough
- Your First Ingest
- Building the Wiki
- Querying Effectively
- Maintenance
- Scaling Up
- Troubleshooting
## Philosophy

Traditional RAG (Retrieval-Augmented Generation) works like this:
- Upload documents
- Chunk them into pieces
- When you ask a question, find relevant chunks
- Generate answer from chunks
The problem: every question starts from zero. The LLM rediscovers knowledge each time. Ask a subtle question that requires synthesizing five documents, and it has to find and piece together fragments every time. Nothing compounds.
Instead:
- You add a source
- The LLM reads it once and extracts key information
- It integrates that knowledge into a structured wiki
- Cross-references are created, contradictions flagged, synthesis built
- Next time you ask a question, the knowledge is already compiled
The wiki is a persistent, compounding artifact. Every source makes it richer. Every question you ask can be filed back in.
The division of labor:

| Human | LLM |
|---|---|
| Curate sources | Read and extract |
| Ask good questions | Write and organize |
| Decide truth when sources conflict | Flag contradictions |
| Think about what it means | Do the bookkeeping |
You do the thinking. The LLM does everything else.
## Setup Walkthrough

```bash
git clone https://github.com/JPeetz/MeMex-Zero-RAG.git my-wiki
cd my-wiki
```

Open SCHEMA.md and update:
- **Identity section**: What is this wiki about?

  ```markdown
  This is a personal knowledge base about **machine learning research**.
  ```

- **Focus Areas**: What topics will you cover?

  ```markdown
  ## Primary Topics
  - Transformer architectures
  - Training optimization
  - Inference efficiency

  ## Entity Types
  - Researchers
  - Papers
  - Models
  - Datasets
  ```
Edit L1/identity.md:

```markdown
## About Me
- **Name**: Jane
- **Role**: ML Research Engineer
- **Timezone**: US/Pacific

## Communication Preferences
- **Tone**: Technical but accessible
- **Detail level**: Thorough for research, brief for admin
```

Edit L1/rules.md with any domain-specific gotchas:
```markdown
## Domain-Specific Rules

### Papers
- Always note: title, authors, venue, year
- Flag if preprint vs. peer-reviewed
- Note key contributions in bullet points
```

Commit the initial structure:

```bash
git add .
git commit -m "Initialize wiki with schema and structure"
```

Note: L1/ is gitignored, so it won't be committed.
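To confirm the ignore rule is active, ask git directly (a quick sanity check; the template's .gitignore is expected to cover L1/):

```bash
# Prints the matching .gitignore rule; exits non-zero if the path is not ignored
git check-ignore -v L1/identity.md
```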
To browse the wiki in Obsidian:

- Open Obsidian
- "Open folder as vault" → select your wiki directory
- Enable "Detect all file extensions" in Settings → Files & Links
- The graph view will show your wiki's structure as it grows
## Your First Ingest

Drop a document into raw/:

```bash
cp ~/Downloads/attention-is-all-you-need.pdf raw/
```

Or create a markdown file with notes:
```bash
cat > raw/transformer-notes.md << 'EOF'
# Notes on Transformer Architecture
The Transformer was introduced in "Attention Is All You Need" (Vaswani et al., 2017).
Key innovations:
- Self-attention mechanism replacing recurrence
- Multi-head attention for parallel processing
- Positional encoding for sequence order
The model achieves SOTA on WMT translation benchmarks.
EOF
```

Open your AI agent and paste:
```
Read SCHEMA.md. Then ingest raw/transformer-notes.md following the ingest workflow.
Before writing anything, discuss the key takeaways with me.
```
The LLM will:
- Summarize what it found
- Propose which pages to create/update
- Wait for your approval
Example response:
"This source covers the Transformer architecture. I'll create:
- wiki/sources/transformer-notes.md (summary)
- wiki/entities/transformer.md (model page)
- wiki/entities/vaswani.md (researcher)
- wiki/concepts/self-attention.md (concept)
- wiki/concepts/multi-head-attention.md (concept)
Should I proceed?"
After approval, the LLM creates pages with:
- YAML frontmatter (title, type, dates, status)
- One-paragraph summary
- Structured content with citations
- Cross-references via [[wikilinks]]
It updates wiki/index.md and wiki/log.md.
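For example, a generated entity page might look like this (a sketch; the exact frontmatter fields come from your SCHEMA.md, and the date is illustrative):

```markdown
---
title: Transformer
type: entity
created: 2026-01-15
status: active
---

Sequence model built entirely on attention, introduced in
"Attention Is All You Need" (Vaswani et al., 2017). [Source: raw/transformer-notes.md]

## Key points
- Replaces recurrence with [[self-attention]] [Source: raw/transformer-notes.md]
- Uses [[multi-head-attention]] for parallel processing [Source: raw/transformer-notes.md]
```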
Open Obsidian. You'll see:
- New pages in the file tree
- Links between pages (click to navigate)
- Graph view showing connections
## Building the Wiki

**Start slow**: Ingest 1-3 sources per session. Review the output. Guide the LLM on what to emphasize.

**Batch later**: Once the wiki has structure (20+ pages), you can batch-ingest with less supervision.

**Quality over quantity**: A wiki with 50 well-integrated pages beats 200 poorly-linked ones.
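Once you reach that stage, a batch prompt might look like this (a sketch; adapt it to your schema's ingest workflow):

```
Read SCHEMA.md. Ingest every new file in raw/ following the ingest workflow.
For each source, summarize before writing, and flag conflicts with existing
pages instead of silently overwriting.
```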
Tips by source type:

| Type | Tips |
|---|---|
| Research papers | Extract: title, authors, venue, key contributions, limitations |
| Articles | Extract: main argument, supporting evidence, counterpoints |
| Meeting notes | Extract: decisions, action items, attendees, context |
| Documentation | Extract: concepts, APIs, examples, gotchas |
| Books (chapters) | One source per chapter, link to book entity |
The LLM should link liberally:
- Every entity mentioned → link to its page
- Every concept used → link to its page
- Every source referenced → link to its summary
If a page has no inbound links after 30 days, lint will flag it.
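You don't have to wait for lint. A rough orphan check from the shell (a sketch that assumes wikilinks use plain page basenames; it misses aliased links like [[page|alias]]):

```bash
#!/usr/bin/env bash
# List wiki pages that no other page links to via [[basename]]
find wiki -name '*.md' | while read -r f; do
  name=$(basename "$f" .md)
  # Look for the literal wikilink anywhere in wiki/, excluding the page itself
  if ! grep -rF "[[$name]]" --include='*.md' wiki | grep -v "^$f:" >/dev/null; then
    echo "orphan: $f"
  fi
done
```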
When you query the wiki and get a valuable answer, file it back in:

```
That answer is valuable. File it as wiki/synthesis/transformer-efficiency-comparison.md
```
Your explorations become part of the wiki. Knowledge compounds.
## Querying Effectively

A simple question:

```
Read wiki/index.md. Based on the wiki, answer:
What are the main approaches to efficient Transformers?
```
A deeper synthesis:

```
Read wiki/index.md, then read ALL pages related to attention mechanisms.
Synthesize a comprehensive overview, noting:
- Where sources agree
- Where sources conflict
- What gaps exist
```
Exploring connections:

```
Read wiki/index.md. What are the 5 most interesting unexplored connections?
What sources would help investigate them?
```
Finding gaps:

```
Read wiki/index.md. What topics are mentioned but have no dedicated page?
What entities are referenced but not defined?
```
## Maintenance

For a quick pass:

```
Quick lint: scan wiki/ for 🔴 ERROR issues only.
```

For a full check:

```
Run a full lint check on wiki/ per SCHEMA.md.
Output to wiki/lint-report-[today].md.
```
Check wiki/contradictions.md periodically. For each pending contradiction:

```
Read wiki/contradictions.md. I want to resolve the [X] contradiction.
Present both claims with sources. I'll decide.
```
For periodic housekeeping:

```
Read wiki/index.md. Identify:
- Pages that could be merged
- Stale pages that should be archived
- Sections that have grown large enough to split
```
## Scaling Up

Out of the box, the LLM reads wiki/index.md to find relevant pages. This works fine while the wiki is small; as it grows, index navigation alone stops being enough, so add full-text search.
Install qmd:

```bash
npm install -g @tobilu/qmd
```

Query with:
```
Use qmd to search for "attention optimization" across wiki/.
Then read the top 5 results and synthesize an answer.
```
If topics are truly distinct, create separate wikis:
```
ml-research-wiki/    # Your main research wiki
project-alpha-wiki/  # Project-specific knowledge
reading-notes-wiki/  # Book notes and articles
```
Each has its own SCHEMA.md customized for its domain.
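Setting these up is just repeated clones of the template (a sketch reusing the repo URL from the setup step):

```bash
# One template clone per domain; customize each SCHEMA.md afterwards
for name in ml-research-wiki project-alpha-wiki reading-notes-wiki; do
  git clone https://github.com/JPeetz/MeMex-Zero-RAG.git "$name"
done
```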
## Troubleshooting

**The LLM isn't citing sources.** Reinforce in your prompt:

```
Remember: every factual claim MUST have [Source: path]. No exceptions.
```

Also check that SCHEMA.md is clear about citation requirements.
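A crude spot-check from the shell (assuming claims are bullet points and citations use the [Source: path] format above):

```bash
# Bullet lines in entity pages that carry no [Source: ...] tag
grep -rn '^- ' --include='*.md' wiki/entities/ | grep -v '\[Source:' | head -n 20
```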
**Pages aren't cross-linked.** Ask explicitly:

```
After creating the page, add [[wikilinks]] to all related entities and concepts.
Then add backlinks from those pages to this one.
```
**The index is out of sync.**

```
Scan all files in wiki/. Compare to wiki/index.md.
Report discrepancies and offer to fix.
```
**The LLM hallucinated a claim.** Log it:

```
Add to wiki/hallucinations.md:
- Date: [today]
- Page: [which page]
- Claim: [the false claim]
- How detected: [how you found out]
Then fix the page and add proper citation or remove the claim.
```
**Contradictions pile up unnoticed.** The LLM might not catch all conflicts. Run periodically:

```
Read all pages in wiki/entities/ and wiki/concepts/.
Check for any claims that conflict with each other.
Report findings.
```
**Secrets ended up in git history.**

```bash
# Remove from git history (careful!)
git filter-branch --force --index-filter \
  'git rm -rf --cached --ignore-unmatch L1/' \
  --prune-empty --tag-name-filter cat -- --all

# Force push (coordinate with collaborators)
git push origin --force --all
```

Then rotate any credentials that were in L1/.
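Note that git's own docs now steer people away from filter-branch; git-filter-repo (installed separately) does the same cleanup more safely:

```bash
# Equivalent history rewrite with git-filter-repo
git filter-repo --invert-paths --path L1/ --force
```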
Finally, a few rules of thumb:

- **Start with one domain.** A wiki about "everything" becomes nothing.
- **Ingest slowly at first.** Guide the LLM on your preferences before batching.
- **Review the first 10 pages carefully.** Patterns set early persist.
- **Use the graph view.** Orphan clusters reveal missing connections.
- **File good answers back.** Your queries are valuable sources.
- **Don't skip lint.** Small issues compound into big messes.
- **Trust but verify.** Spot-check citations, especially on important topics.
- **Let the schema evolve.** Update SCHEMA.md as you learn what works.
Guide version: 1.0 | See SCHEMA.md and PROMPTS.md for reference
Copyright (c) 2026 Joerg Peetz. All rights reserved.