Name	Name	Last commit message	Last commit date
parent directory ..
.env.example	.env.example
.gitignore	.gitignore
README.md	README.md
import-obsidian.py	import-obsidian.py
metadata.json	metadata.json
requirements.txt	requirements.txt

Obsidian Vault Import

Reviewed and merged by the Open Brain maintainer team — thank you for building the future of AI memory!

Parse your Obsidian vault and import notes into Open Brain as searchable, embedded thoughts.

What It Does

Takes any Obsidian vault directory, parses every markdown note (including frontmatter tags, dates, and wikilinks), chunks long notes into atomic thoughts, generates vector embeddings, and inserts everything into your Open Brain thoughts table. Your entire vault becomes semantically searchable through any MCP-connected AI.

Vault Compatibility

This recipe works with any Obsidian vault regardless of organizational method. It parses standard markdown, frontmatter, wikilinks, and tags — features common to all major patterns.

Pattern	Structure	Import notes
BASB / PARA	Folders: Projects, Areas, Resources, Archives	Works out of the box. Use `--skip-folders` to exclude Archives if desired.
LYT / Ideaverse	Maps of Content (MOCs) as hub notes, emergent linking	MOCs import as thoughts; wikilinks are captured in metadata.
LifeHQ	Full life OS with dashboards, tasks, planning workflows	Use `--skip-folders Templates` to exclude Templater files. Tested on 500+ note vault.
FLAP	Pipeline: Fleeting → Literature → Atomic → Permanent	Atomic notes import cleanly. Literature notes chunk well via heading splits.
Zettelkasten	Atomic notes with dense linking, bottom-up structure	Ideal fit — small atomic notes map 1:1 to thoughts.
MOC-centric hubs	Curated hub notes per topic, domain, or role	Hub notes import with all wikilinks preserved in metadata for graph traversal.

No special configuration is needed for any of these — the script handles them all with the same parsing pipeline. Use --dry-run to preview your vault before importing.

Prerequisites

Working Open Brain setup (guide)
Python 3.10+
Your Supabase project URL and API key
OpenRouter API key (for embeddings and optional LLM chunking)
Recommended: add a content_fingerprint column and unique index for database-level dedup (see Re-running and Deduplication)

Credential Tracker

Copy this block into a text editor and fill it in as you go.

OBSIDIAN VAULT IMPORT -- CREDENTIAL TRACKER
--------------------------------------

FROM YOUR OPEN BRAIN SETUP
  Supabase Project URL:  ____________
  Supabase API key:      ____________
  OpenRouter API key:    ____________

FILE LOCATION
  Path to Obsidian vault:  ____________

--------------------------------------

Steps

Clone or copy this recipe folder to your local machine.

Install Python dependencies:

cd recipes/obsidian-vault-import
pip install -r requirements.txt

Create your .env file from the example:
```
cp .env.example .env
```
Edit .env and fill in your Supabase URL, API key, and OpenRouter API key. Find your Supabase credentials in the dashboard under Settings → API.
Run a dry run first to see what would be imported:
```
python import-obsidian.py /path/to/your/vault --dry-run --verbose
```
This scans your vault, shows how many notes pass filters, how many thoughts would be generated, and flags any notes containing potential secrets — without inserting anything.
Start with a small batch to verify everything works:
```
python import-obsidian.py /path/to/your/vault --limit 20 --verbose
```
The script runs a preflight check before any import — it verifies your Supabase connection and OpenRouter API key before spending time on chunking or embeddings.

Run the full import once you're satisfied:

python import-obsidian.py /path/to/your/vault --verbose

Verify in Supabase. Open your Supabase dashboard → Table Editor → thoughts. You should see rows with:
- content — your note text with an [Obsidian: Title | Folder] prefix
- embedding — a 1536-dimensional vector
- metadata — JSON with source, title, folder, tags, date, and wikilinks

Options

Flag	Description
`--dry-run`	Preview without inserting (shows what would be imported)
`--limit N`	Process only the first N notes
`--min-words N`	Skip notes with fewer than N words (default: 50)
`--skip-folders X`	Comma-separated additional folder names to skip
`--after DATE`	Only import notes modified after this date (YYYY-MM-DD)
`--no-llm`	Disable LLM chunking — heading splits only, zero API cost beyond embeddings
`--no-embed`	Skip embedding generation (insert thoughts without vectors)
`--no-secret-scan`	Disable secret detection (not recommended)
`--verbose`	Show detailed progress for each note
`--report`	Generate an `import-report.md` summary file

What Gets Filtered

The script automatically skips notes that wouldn't make useful thoughts. Run with --dry-run --verbose to preview exactly what gets included and excluded.

Always-skipped folders (Obsidian internals, not your content):

.obsidian/ — plugin configs, themes, workspace state
.trash/ — Obsidian's soft-delete folder
.git/, node_modules/ — version control and dependencies
Any folder starting with . (hidden directories)

Template files — notes inside any folder with "templates" in its name (case-insensitive). These contain Templater syntax (<% %>) and placeholder variables, not real content. Applies to Templates/, 8_Reference/Templates/, etc.

Short notes — notes with fewer than 50 words (default). These are typically stubs, empty MOCs, or link-only index files. Adjust with --min-words:

--min-words 20 to include shorter notes
--min-words 100 to be more selective

Already-imported notes — the sync log tracks content hashes so re-runs skip unchanged notes automatically.

Date-filtered notes — when using --after YYYY-MM-DD, notes not modified after that date are skipped.

Additional folder exclusions — use --skip-folders for vault-specific directories you don't want imported:

# Skip framework reference materials, archive, and attachments
python import-obsidian.py /path/to/vault --skip-folders "Archive,Files,patterns"

Secret Detection

The script scans each thought for potential secrets before embedding or inserting. Thoughts containing API keys, tokens, passwords, or connection strings are skipped and logged — they never reach your database.

Detected patterns include:

API keys (OpenAI, OpenRouter, AWS, GitHub, Supabase)
JWT tokens
Private key blocks
Connection strings with embedded credentials
Generic secret assignments (password=, token=, api_key=, etc.)

The dry run (--dry-run) also runs the scanner, so you can review what would be flagged before a live import. If the scanner flags a false positive, use --no-secret-scan to disable it.

How Chunking Works

The script uses a hybrid chunking strategy to turn notes into atomic thoughts:

Short notes (under 500 words) become a single thought.
Notes with headings are split at ## (H2) boundaries — each section becomes one thought.
Long sections (over 1000 words) are sent to an LLM (gpt-4o-mini via OpenRouter) which distills them into 1-3 standalone thoughts.

Use --no-llm to skip step 3 if you want to avoid LLM costs. Heading-based splitting still works.

Cost Estimate

Costs depend on vault size and whether LLM chunking is enabled. Embeddings use text-embedding-3-small and LLM chunking uses gpt-4o-mini, both via OpenRouter.

Vault size	Embeddings only (`--no-llm`)	With LLM chunking
100 notes	~$0.02	~$0.15
500 notes	~$0.10	~$0.75
1000+ notes	~$0.20	~$1.50

Use --dry-run to see how many thoughts your vault would generate before committing to a full run. Use --no-embed to skip embeddings entirely (zero API cost) if you plan to generate them separately.

Time estimate: Roughly 1 second per thought or 16 minutes per 1,000 thoughts. A 700-note vault producing ~2,700 thoughts takes about 45 minutes. The bottleneck is embedding generation — each thought requires a round-trip API call.

Rate Limiting

The script self-throttles to respect upstream API limits:

150ms delay between embedding API calls to avoid flooding OpenRouter
1-second pause every 50 inserts to give Supabase breathing room
Exponential backoff on 429/5xx errors (2s → 4s → 8s, up to 3 retries)

For large vaults (1000+ notes), use --limit to import in batches if you encounter rate limit errors. The sync log ensures you can resume where you left off.

See also: Graceful Boundaries — a spec for how services communicate operational limits to humans and agents.

Re-running and Deduplication

The script prevents duplicates at two levels:

Local sync log (obsidian-sync-log.json) — tracks content hashes of imported notes. On re-runs, unchanged notes are skipped entirely (saving embedding API calls). Modified notes are re-imported, and new notes are added.

Database-level fingerprinting — each thought includes a content_fingerprint (SHA-256 of normalized content). If your thoughts table has a unique index on content_fingerprint, the insert automatically skips duplicates even if the sync log is deleted or a different machine imports overlapping content. To add the index:

CREATE UNIQUE INDEX IF NOT EXISTS thoughts_content_fingerprint_idx
  ON thoughts (content_fingerprint);

Without the index, the fingerprint is stored but dedup relies on the sync log only.

To do a clean re-import, delete obsidian-sync-log.json and remove the imported thoughts from your thoughts table (filter by metadata->>'source' = 'obsidian').

Expected Outcome

After a successful import, searching your Open Brain for topics from your vault returns relevant results. The metadata on each thought includes:

{
  "source": "obsidian",
  "title": "Note Title",
  "folder": "Projects/My Project",
  "tags": ["project", "active"],
  "date": "2026-01-15",
  "wikilinks": ["Related Note", "Another Note"]
}

You can filter by source to find only Obsidian-imported thoughts: search with {"source": "obsidian"} as a metadata filter.

Customizing the source label

The default source value is "obsidian" — useful when you want to filter all Obsidian-imported thoughts as one group. If you have multiple upstream tools feeding into Obsidian-style markdown (e.g. an Apple Journal exporter, a Day One exporter, an old Roam dump), pass --source-label to distinguish them in OB1:

python import-obsidian.py /path/to/journal-vault --source-label apple-journal
python import-obsidian.py /path/to/dayone-vault   --source-label day-one

After this, your metadata filter becomes {"source": "apple-journal"} to retrieve only Apple Journal entries, etc.

Troubleshooting

Issue: python-frontmatter not found Solution: Make sure you ran pip install -r requirements.txt. If using a virtual environment, activate it first.

Issue: Import is slow on large vaults (1000+ notes) Solution: Embedding generation is the bottleneck — each note requires an API call. Use --limit to import in batches, or --no-llm to skip LLM chunking and reduce API calls. The script rate-limits itself to avoid hitting OpenRouter quotas.

Issue: Some notes are skipped unexpectedly Solution: Run with --verbose to see which notes are filtered and why. Common reasons: notes under 50 words (adjust with --min-words), notes in a Templates/ folder, or notes already in the sync log. Check --dry-run output first.

Issue: Encoding errors on some notes Solution: The parser handles encoding errors gracefully — problematic files are skipped with a warning. If you see many parse errors, your vault may contain non-UTF-8 files. The script will continue processing the rest.

Issue: Duplicate thoughts after re-running Solution: The sync log prevents duplicates on re-runs. For stronger protection, add the content_fingerprint unique index (see "Re-running and Deduplication" above) — this catches duplicates even if the sync log is deleted or another machine imports the same content. To clean up existing duplicates, filter by metadata->>'source' = 'obsidian' in the Supabase Table Editor.

Issue: Import aborts after "10 consecutive insert failures" Solution: The script stops early if 10 inserts fail in a row to avoid wasting embedding credits. Check your Supabase connection, verify the thoughts table exists, and confirm your API key is correct. The preflight check catches most of these, but a connection drop mid-import can also trigger this.

Issue: Notes flagged as containing secrets (false positive) Solution: Review the flagged content. If it's a false positive (e.g., a note discussing API key formats without containing real keys), re-run with --no-secret-scan. The scanner is intentionally conservative — it's better to flag and skip than to store a real secret in your database.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

Obsidian Vault Import

What It Does

Vault Compatibility

Prerequisites

Credential Tracker

Steps

Options

What Gets Filtered

Secret Detection

How Chunking Works

Cost Estimate

Rate Limiting

Re-running and Deduplication

Expected Outcome

Customizing the source label

Troubleshooting

Uh oh!

FilesExpand file tree

obsidian-vault-import

Directory actions

More options

Directory actions

More options

Latest commit

History

obsidian-vault-import

Folders and files

parent directory

README.md

Obsidian Vault Import

What It Does

Vault Compatibility

Prerequisites

Credential Tracker

Steps

Options

What Gets Filtered

Secret Detection

How Chunking Works

Cost Estimate

Rate Limiting

Re-running and Deduplication

Expected Outcome

Customizing the source label

Troubleshooting