Created by @snapsynapse
Reviewed and merged by the Open Brain maintainer team — thank you for building the future of AI memory!
Parse your Obsidian vault and import notes into Open Brain as searchable, embedded thoughts.
Takes any Obsidian vault directory, parses every markdown note (including frontmatter tags, dates, and wikilinks), chunks long notes into atomic thoughts, generates vector embeddings, and inserts everything into your Open Brain thoughts table. Your entire vault becomes semantically searchable through any MCP-connected AI.
This recipe works with any Obsidian vault regardless of organizational method. It parses standard markdown, frontmatter, wikilinks, and tags — features common to all major patterns.
| Pattern | Structure | Import notes |
|---|---|---|
| BASB / PARA | Folders: Projects, Areas, Resources, Archives | Works out of the box. Use --skip-folders to exclude Archives if desired. |
| LYT / Ideaverse | Maps of Content (MOCs) as hub notes, emergent linking | MOCs import as thoughts; wikilinks are captured in metadata. |
| LifeHQ | Full life OS with dashboards, tasks, planning workflows | Use --skip-folders Templates to exclude Templater files. Tested on 500+ note vault. |
| FLAP | Pipeline: Fleeting → Literature → Atomic → Permanent | Atomic notes import cleanly. Literature notes chunk well via heading splits. |
| Zettelkasten | Atomic notes with dense linking, bottom-up structure | Ideal fit — small atomic notes map 1:1 to thoughts. |
| MOC-centric hubs | Curated hub notes per topic, domain, or role | Hub notes import with all wikilinks preserved in metadata for graph traversal. |
No special configuration is needed for any of these — the script handles them all with the same parsing pipeline. Use --dry-run to preview your vault before importing.
- Working Open Brain setup (guide)
- Python 3.10+
- Your Supabase project URL and API key
- OpenRouter API key (for embeddings and optional LLM chunking)
- Recommended: add a
content_fingerprintcolumn and unique index for database-level dedup (see Re-running and Deduplication)
Copy this block into a text editor and fill it in as you go.
OBSIDIAN VAULT IMPORT -- CREDENTIAL TRACKER
--------------------------------------
FROM YOUR OPEN BRAIN SETUP
Supabase Project URL: ____________
Supabase API key: ____________
OpenRouter API key: ____________
FILE LOCATION
Path to Obsidian vault: ____________
--------------------------------------
-
Clone or copy this recipe folder to your local machine.
-
Install Python dependencies:
cd recipes/obsidian-vault-import pip install -r requirements.txt -
Create your
.envfile from the example:cp .env.example .env
Edit
.envand fill in your Supabase URL, API key, and OpenRouter API key. Find your Supabase credentials in the dashboard under Settings → API. -
Run a dry run first to see what would be imported:
python import-obsidian.py /path/to/your/vault --dry-run --verbose
This scans your vault, shows how many notes pass filters, how many thoughts would be generated, and flags any notes containing potential secrets — without inserting anything.
-
Start with a small batch to verify everything works:
python import-obsidian.py /path/to/your/vault --limit 20 --verbose
The script runs a preflight check before any import — it verifies your Supabase connection and OpenRouter API key before spending time on chunking or embeddings.
-
Run the full import once you're satisfied:
python import-obsidian.py /path/to/your/vault --verbose
-
Verify in Supabase. Open your Supabase dashboard → Table Editor →
thoughts. You should see rows with:content— your note text with an[Obsidian: Title | Folder]prefixembedding— a 1536-dimensional vectormetadata— JSON with source, title, folder, tags, date, and wikilinks
| Flag | Description |
|---|---|
--dry-run |
Preview without inserting (shows what would be imported) |
--limit N |
Process only the first N notes |
--min-words N |
Skip notes with fewer than N words (default: 50) |
--skip-folders X |
Comma-separated additional folder names to skip |
--after DATE |
Only import notes modified after this date (YYYY-MM-DD) |
--no-llm |
Disable LLM chunking — heading splits only, zero API cost beyond embeddings |
--no-embed |
Skip embedding generation (insert thoughts without vectors) |
--no-secret-scan |
Disable secret detection (not recommended) |
--verbose |
Show detailed progress for each note |
--report |
Generate an import-report.md summary file |
The script automatically skips notes that wouldn't make useful thoughts. Run with --dry-run --verbose to preview exactly what gets included and excluded.
Always-skipped folders (Obsidian internals, not your content):
.obsidian/— plugin configs, themes, workspace state.trash/— Obsidian's soft-delete folder.git/,node_modules/— version control and dependencies- Any folder starting with
.(hidden directories)
Template files — notes inside any folder with "templates" in its name (case-insensitive). These contain Templater syntax (<% %>) and placeholder variables, not real content. Applies to Templates/, 8_Reference/Templates/, etc.
Short notes — notes with fewer than 50 words (default). These are typically stubs, empty MOCs, or link-only index files. Adjust with --min-words:
--min-words 20to include shorter notes--min-words 100to be more selective
Already-imported notes — the sync log tracks content hashes so re-runs skip unchanged notes automatically.
Date-filtered notes — when using --after YYYY-MM-DD, notes not modified after that date are skipped.
Additional folder exclusions — use --skip-folders for vault-specific directories you don't want imported:
# Skip framework reference materials, archive, and attachments
python import-obsidian.py /path/to/vault --skip-folders "Archive,Files,patterns"The script scans each thought for potential secrets before embedding or inserting. Thoughts containing API keys, tokens, passwords, or connection strings are skipped and logged — they never reach your database.
Detected patterns include:
- API keys (OpenAI, OpenRouter, AWS, GitHub, Supabase)
- JWT tokens
- Private key blocks
- Connection strings with embedded credentials
- Generic secret assignments (
password=,token=,api_key=, etc.)
The dry run (--dry-run) also runs the scanner, so you can review what would be flagged before a live import. If the scanner flags a false positive, use --no-secret-scan to disable it.
The script uses a hybrid chunking strategy to turn notes into atomic thoughts:
- Short notes (under 500 words) become a single thought.
- Notes with headings are split at
##(H2) boundaries — each section becomes one thought. - Long sections (over 1000 words) are sent to an LLM (gpt-4o-mini via OpenRouter) which distills them into 1-3 standalone thoughts.
Use --no-llm to skip step 3 if you want to avoid LLM costs. Heading-based splitting still works.
Costs depend on vault size and whether LLM chunking is enabled. Embeddings use text-embedding-3-small and LLM chunking uses gpt-4o-mini, both via OpenRouter.
| Vault size | Embeddings only (--no-llm) |
With LLM chunking |
|---|---|---|
| 100 notes | ~$0.02 | ~$0.15 |
| 500 notes | ~$0.10 | ~$0.75 |
| 1000+ notes | ~$0.20 | ~$1.50 |
Use --dry-run to see how many thoughts your vault would generate before committing to a full run. Use --no-embed to skip embeddings entirely (zero API cost) if you plan to generate them separately.
Time estimate: Roughly 1 second per thought or 16 minutes per 1,000 thoughts. A 700-note vault producing ~2,700 thoughts takes about 45 minutes. The bottleneck is embedding generation — each thought requires a round-trip API call.
The script self-throttles to respect upstream API limits:
- 150ms delay between embedding API calls to avoid flooding OpenRouter
- 1-second pause every 50 inserts to give Supabase breathing room
- Exponential backoff on 429/5xx errors (2s → 4s → 8s, up to 3 retries)
For large vaults (1000+ notes), use --limit to import in batches if you encounter rate limit errors. The sync log ensures you can resume where you left off.
See also: Graceful Boundaries — a spec for how services communicate operational limits to humans and agents.
The script prevents duplicates at two levels:
Local sync log (obsidian-sync-log.json) — tracks content hashes of imported notes. On re-runs, unchanged notes are skipped entirely (saving embedding API calls). Modified notes are re-imported, and new notes are added.
Database-level fingerprinting — each thought includes a content_fingerprint (SHA-256 of normalized content). If your thoughts table has a unique index on content_fingerprint, the insert automatically skips duplicates even if the sync log is deleted or a different machine imports overlapping content. To add the index:
CREATE UNIQUE INDEX IF NOT EXISTS thoughts_content_fingerprint_idx
ON thoughts (content_fingerprint);Without the index, the fingerprint is stored but dedup relies on the sync log only.
To do a clean re-import, delete obsidian-sync-log.json and remove the imported thoughts from your thoughts table (filter by metadata->>'source' = 'obsidian').
After a successful import, searching your Open Brain for topics from your vault returns relevant results. The metadata on each thought includes:
{
"source": "obsidian",
"title": "Note Title",
"folder": "Projects/My Project",
"tags": ["project", "active"],
"date": "2026-01-15",
"wikilinks": ["Related Note", "Another Note"]
}You can filter by source to find only Obsidian-imported thoughts: search with {"source": "obsidian"} as a metadata filter.
The default source value is "obsidian" — useful when you want to filter all
Obsidian-imported thoughts as one group. If you have multiple upstream tools
feeding into Obsidian-style markdown (e.g. an Apple Journal exporter, a
Day One exporter, an old Roam dump), pass --source-label to distinguish
them in OB1:
python import-obsidian.py /path/to/journal-vault --source-label apple-journal
python import-obsidian.py /path/to/dayone-vault --source-label day-oneAfter this, your metadata filter becomes {"source": "apple-journal"} to
retrieve only Apple Journal entries, etc.
Issue: python-frontmatter not found
Solution: Make sure you ran pip install -r requirements.txt. If using a virtual environment, activate it first.
Issue: Import is slow on large vaults (1000+ notes)
Solution: Embedding generation is the bottleneck — each note requires an API call. Use --limit to import in batches, or --no-llm to skip LLM chunking and reduce API calls. The script rate-limits itself to avoid hitting OpenRouter quotas.
Issue: Some notes are skipped unexpectedly
Solution: Run with --verbose to see which notes are filtered and why. Common reasons: notes under 50 words (adjust with --min-words), notes in a Templates/ folder, or notes already in the sync log. Check --dry-run output first.
Issue: Encoding errors on some notes Solution: The parser handles encoding errors gracefully — problematic files are skipped with a warning. If you see many parse errors, your vault may contain non-UTF-8 files. The script will continue processing the rest.
Issue: Duplicate thoughts after re-running
Solution: The sync log prevents duplicates on re-runs. For stronger protection, add the content_fingerprint unique index (see "Re-running and Deduplication" above) — this catches duplicates even if the sync log is deleted or another machine imports the same content. To clean up existing duplicates, filter by metadata->>'source' = 'obsidian' in the Supabase Table Editor.
Issue: Import aborts after "10 consecutive insert failures"
Solution: The script stops early if 10 inserts fail in a row to avoid wasting embedding credits. Check your Supabase connection, verify the thoughts table exists, and confirm your API key is correct. The preflight check catches most of these, but a connection drop mid-import can also trigger this.
Issue: Notes flagged as containing secrets (false positive)
Solution: Review the flagged content. If it's a false positive (e.g., a note discussing API key formats without containing real keys), re-run with --no-secret-scan. The scanner is intentionally conservative — it's better to flag and skip than to store a real secret in your database.