Seojae (서재) — Wiki Schema

This file defines the rules and workflows for your LLM-powered knowledge wiki. Your LLM coding tool reads this file and follows these instructions to manage your wiki automatically. It is tool-agnostic — it works with Claude Code, Codex CLI, Gemini CLI, or any LLM coding tool that reads markdown instructions.

schema_version: "1.0"

# Wiki Configuration
wiki_language: en    # Default language for meta files (index.md, log.md)
                     # Options: en, ko, ja, zh, etc.
                     # Page content follows source language rules below

Project Overview

Seojae is a pattern-based personal knowledge wiki with a 3-tier architecture:

Raw sources (raw/) — Immutable originals. Only the user adds files; the LLM has read-only access.
Wiki (wiki/) — LLM-generated/maintained markdown pages: summaries, entities, concepts, synthesis.
Schema (WIKI_SCHEMA.md) — This file. Defines wiki rules and workflows.

Prerequisites

Python 3.9+ with pip
Internet access on first run (downloads ~470MB embedding model)
An LLM coding tool: Claude Code, Codex CLI, Gemini CLI, or similar

If you cannot download the model immediately, you can skip the search index setup and build it later with the Reindex workflow.

Setup

When a user asks to initialize this wiki, perform these steps:

1. Update your tool's rule file

The repo ships starter stubs. After initialization, update the file for your tool with environment-specific settings.

Claude Code (CLAUDE.md):

@WIKI_SCHEMA.md
@extensions/search-chromadb.md
@extensions/obsidian.md

# Environment
- Python: use `venv/bin/python` prefix or `source venv/bin/activate`
- All tools/ scripts require the venv to be active

Codex CLI (AGENTS.md):

(Full WIKI_SCHEMA.md content is already inlined)
(Append active extension contents below)

# Environment
- Python: always use `venv/bin/python` prefix (venv activation
  does not persist between commands in Codex)

Gemini CLI (GEMINI.md):

@./WIKI_SCHEMA.md
@./extensions/search-chromadb.md
@./extensions/obsidian.md

# Environment
- Python: use `venv/bin/python` prefix or `source venv/bin/activate`

2. Set up Python environment

Create venv: python3 -m venv venv
Install: venv/bin/pip install -r requirements.txt

3. Build search index

Run the search command defined by the active search extension.
Default (search-chromadb): venv/bin/python tools/search.py --reindex
Note: first run downloads the embedding model (~470MB).

4. Verify

Run a test query: venv/bin/python tools/search.py --query "test"
If results return, setup is complete.

Directory Rules

| Path | Write | Read | Conflict Risk |
| --- | --- | --- | --- |
| `raw/` | User only | LLM + User | None |
| `wiki/` | LLM only | LLM + User | None |
| `index.md`, `log.md` | LLM only | LLM + User | None |
| `WIKI_SCHEMA.md` | User + LLM | LLM | Possible |
| `README.md` | User + LLM | User | Possible |

Absolute rules:

Never modify files in raw/.
Wiki page creation/modification must follow the rules in this schema.

Conflict prevention for shared-write files: Both the user and the LLM may edit WIKI_SCHEMA.md and README.md. Do not edit these files while the LLM is working. If both sides have uncommitted changes, the LLM will run git pull --rebase and ask the user to resolve any conflicts manually.

Category Classification

wiki/entities/ — Things with a proper name (people, tools, companies, models). Examples: "GPT-4", "OpenAI", "Yoshua Bengio"
wiki/concepts/ — Abstract concepts without a proper name (attention mechanism, fine-tuning). Examples: "Transformer Architecture", "Reinforcement Learning"
wiki/sources/ — One source = one summary page
wiki/synthesis/ — Analysis combining 2+ sources or concepts

Raw Source Subdirectories

raw/myself/ — Your own content (blog posts, resume, etc.)
raw/articles/ — Web articles
raw/papers/ — Academic papers, PDFs
raw/videos/ — YouTube/podcast transcripts
raw/books/ — Book chapters
raw/misc/ — Miscellaneous
raw/assets/ — Images, attachments (not subject to ingest)

Boundary rule: If it has a proper name, it is an entity; if it is a general concept or method, it is a concept. In ambiguous cases (e.g., "Transformer" — both a paper title and an architecture), prefer concept unless the page is specifically about a particular paper or product.

Page Format

Frontmatter

Parser limitation: The parse_frontmatter function in the search tool is regex-based. A line starting with --- inside a YAML block scalar may be misidentified as the closing delimiter. This rarely occurs in practice with wiki pages.

---
title: "Page Title"
type: concept          # entity | concept | source | synthesis
tags: [tag1, tag2]
sources: ["raw/papers/example.md"]
aliases: []            # Optional — alternative names, e.g., ["attention mechanism"]
created: YYYY-MM-DD
updated: YYYY-MM-DD
---
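
The parser limitation noted above can be reproduced with a small regex-based splitter. This is only a sketch: `split_frontmatter` and `FRONTMATTER_RE` are hypothetical names, and the real `parse_frontmatter` in the search tool may be written differently. The `tricky` sample places a bare `---` line where the author intended it to be part of a value, which is the kind of input a regex cannot tell apart from the closing delimiter.

```python
import re

# Hypothetical stand-in for a regex-based frontmatter splitter; the real
# parse_frontmatter in the search tool may differ.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)


def split_frontmatter(text):
    """Return (frontmatter, body), or (None, text) if no frontmatter is found."""
    match = FRONTMATTER_RE.match(text)
    if match is None:
        return None, text
    return match.group(1), text[match.end():]


# A well-formed page: the first closing --- is the real delimiter.
good = '---\ntitle: "Page Title"\ntype: concept\n---\nBody text.\n'

# A bare --- line before the real delimiter ends the match early, so the
# rest of the intended frontmatter leaks into the body.
tricky = "---\ntitle: T\nnotes: |\n---\n  still notes\n---\nBody\n"

if __name__ == "__main__":
    print(split_frontmatter(good))
    print(split_frontmatter(tricky))
```

A full YAML parser would avoid the early match, at the cost of an extra dependency; the regex approach trades that robustness for simplicity.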

Body Rules

Use Obsidian wikilinks: [[Page Name]]
Filenames: Always use English kebab-case (e.g., attention-mechanism.md), regardless of body language. Non-English concept names go in the aliases frontmatter field.
Filenames must be unique across all wiki/ subdirectories (compatible with Obsidian "shortest path" wikilinks).
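
The two filename rules can be checked mechanically. The sketch below is illustrative only: `to_kebab_case` and `find_duplicate_filenames` are hypothetical helpers, not part of the shipped tools, and the slug rule (lowercase, alphanumeric runs joined by hyphens) is an assumption about what "English kebab-case" means here.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def to_kebab_case(title):
    """Lowercase an English title and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def find_duplicate_filenames(paths):
    """Map each filename that appears under more than one directory to its paths."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


if __name__ == "__main__":
    print(to_kebab_case("Attention Mechanism") + ".md")
    print(find_duplicate_filenames([
        "wiki/entities/transformer.md",
        "wiki/concepts/transformer.md",
    ]))
```

Running the duplicate check before creating a page keeps Obsidian "shortest path" wikilinks unambiguous.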

Language Rules

Source summary pages (wiki/sources/): Written in the source's original language.
Entity/concept pages (wiki/entities/, wiki/concepts/): Written in the language of the source that first created the page. Later sources in different languages add information in the existing page's language.
Synthesis pages (wiki/synthesis/): Written in the language the user requests, or the dominant language of the combined sources.
Meta files (index.md, log.md, WIKI_SCHEMA.md, README.md): Written in the language specified by wiki_language in the configuration block above. Source titles in log entries are kept in their original language.
Wikilink names: Use one canonical name per concept (always [[Attention Mechanism]], never localized variants like [[어텐션 메커니즘]]). Add localized names to the frontmatter aliases field if needed.

Raw Source Format

Raw source files are freeform markdown. There is no required frontmatter — this schema does not enforce a format on raw sources. However, sources may include optional metadata at the top for context:

---
title: "Source Title"
author: "Author Name"
source: "https://original-url"
date: YYYY-MM-DD
---

The body is the original content or a transcript/summary of it. The LLM reads raw sources as input and generates structured wiki pages from them.

Workflows

Search Command Resolution

When a workflow references {search.query}, {search.add}, or {search.reindex}, resolve it by reading the commands field from whichever extension is active with provides: search-backend. If no search extension is active, use the default shown in parentheses after each token. If no default is applicable, skip the step and warn the user.
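
A minimal sketch of this resolution logic follows. It assumes each extension has already been parsed into a dict with `provides` and `commands` keys, and that `commands` maps `query`, `add`, and `reindex` to command strings; that shape is an assumption, since the schema does not spell out the layout of the commands field. The defaults are the ones shown in parentheses in the workflow steps.

```python
import re

# Defaults taken from the parentheses in the workflow steps.
DEFAULT_COMMANDS = {
    "query": "venv/bin/python tools/search.py --query",
    "add": "venv/bin/python tools/search.py --add",
    "reindex": "venv/bin/python tools/search.py --reindex",
}

TOKEN_RE = re.compile(r"\{search\.(\w+)\}")


def resolve_search_tokens(step, extensions):
    """Replace {search.*} tokens in a step; return None if one cannot be resolved."""
    backend = next(
        (ext for ext in extensions if ext.get("provides") == "search-backend"),
        None,
    )
    # Assumed shape: the active backend exposes a dict of command strings.
    commands = backend.get("commands", {}) if backend else DEFAULT_COMMANDS

    missing = [name for name in TOKEN_RE.findall(step) if name not in commands]
    if missing:
        return None  # the caller skips the step and warns the user
    return TOKEN_RE.sub(lambda m: commands[m.group(1)], step)


if __name__ == "__main__":
    print(resolve_search_tokens('{search.query} "what is attention?" --top 5', []))
```

Returning `None` instead of raising keeps the "skip the step and warn the user" decision with the caller.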

Ingest (Source Processing)

Trigger: User specifies a file, e.g., "ingest raw/articles/some-article.md"

Read the entire source (for sources with images: read text first, then examine referenced images separately).
Discuss key takeaways with the user (what to emphasize, perspective).
Create a source summary page at wiki/sources/<source-name>.md.
Update related entity/concept/synthesis pages (or create new ones).
Add cross-reference wikilinks between new and existing pages.
Add new page entries to index.md.
Update the search index for each new/modified wiki page: run {search.add} <wiki page path> (Default: venv/bin/python tools/search.py --add). Pages without frontmatter are skipped with a warning; missing files cause exit code 1.
Append to log.md: ## [YYYY-MM-DD] ingest | <source title>
Git commit: ingest: <source title>

Query (Question Answering)

Trigger: User asks a question about wiki content.

Run {search.query} "<question>" --top 5 (Default: venv/bin/python tools/search.py --query)
- Output format: <path> [score: X.XX] (score: cosine similarity, -1.0 to 1.0)
- If output is empty or the highest score is below 0.5, also scan index.md as a fallback and merge results.
- If the index path (default: search-index/) does not exist (exit code 2), fall back to scanning index.md and advise the user to run the Reindex workflow.
- An empty query string causes exit code 1 — ensure the query is non-empty.
Read the wiki pages at the returned paths.
Synthesize an answer with source citations. The answer format may vary depending on the question — markdown pages, comparison tables, slide decks (Marp), charts (matplotlib), canvas files.
Save valuable answers back to the wiki. Comparisons, analyses, discovered connections — these should not vanish in chat history. Save as wiki/synthesis/<topic>.md. If unsure whether to save, ask the user.
(If saved) Update index.md, append ## [YYYY-MM-DD] query | <question summary> to log.md, and Git commit with the message query: <question summary>
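
The output format and fallback rules in the first step can be sketched as follows. The function names are hypothetical; the 0.5 threshold, the `<path> [score: X.XX]` line format, and exit code 2 come from the rules above, and the sample paths in the demo are made up for illustration.

```python
import re

RESULT_RE = re.compile(r"^(?P<path>.+?) \[score: (?P<score>-?\d+(?:\.\d+)?)\]$")
FALLBACK_THRESHOLD = 0.5


def parse_results(output):
    """Parse '<path> [score: X.XX]' lines into (path, score) tuples."""
    results = []
    for line in output.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            results.append((match.group("path"), float(match.group("score"))))
    return results


def needs_index_fallback(exit_code, results):
    """Return True when index.md should also be scanned."""
    if exit_code == 2:  # the index path does not exist
        return True
    if not results:  # empty output
        return True
    return max(score for _, score in results) < FALLBACK_THRESHOLD


if __name__ == "__main__":
    sample = "wiki/concepts/attention-mechanism.md [score: 0.82]\n"
    hits = parse_results(sample)
    print(hits, needs_index_fallback(0, hits))
```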

Lint (Wiki Maintenance)

Trigger: User asks to check the wiki, or periodically.

Health checks:

Orphan pages — Pages with no inbound links
Broken links — References to [[Non-existent Page]]
Stale information — Content contradicting recent sources
Missing pages — Frequently mentioned entities/concepts without their own page
Insufficient cross-references — Highly related pages with no links between them

Growth suggestions (proactively, beyond just fixing problems):

Data gaps — Information that could be filled by web searches or new sources
New questions to investigate — Questions that would deepen wiki coverage
New sources to find — Source recommendations to fill identified gaps

Report findings, fix health issues with user approval, present growth suggestions.
Update the search index for each modified page: Run {search.add} <modified wiki page path> (Default: venv/bin/python tools/search.py --add)
Append to log.md: ## [YYYY-MM-DD] lint | <summary>
Git commit: lint: <fix summary>
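
Two of the health checks, orphan pages and broken links, are mechanical enough to sketch in code. The example assumes pages have already been loaded into a dict keyed by the name used in wikilinks; `check_links` is a hypothetical helper, and the other checks (stale information, missing pages, insufficient cross-references) need judgment that this sketch does not attempt.

```python
import re

# Captures the target of [[Page]], [[Page|alias]], and [[Page#heading]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")


def check_links(pages):
    """pages maps page name -> body text. Returns (orphans, broken_links)."""
    inbound = {name: 0 for name in pages}
    broken = set()
    for name, body in pages.items():
        for target in {t.strip() for t in WIKILINK_RE.findall(body)}:
            if target == name:
                continue  # a self-link does not count as inbound
            if target in pages:
                inbound[target] += 1
            else:
                broken.add((name, target))
    orphans = sorted(name for name, count in inbound.items() if count == 0)
    return orphans, sorted(broken)


if __name__ == "__main__":
    demo = {
        "Attention Mechanism": "See [[Transformer Architecture]] and [[Missing Page]].",
        "Transformer Architecture": "Built on [[Attention Mechanism|attention]].",
        "GPT-4": "Uses [[Transformer Architecture]].",
    }
    print(check_links(demo))
```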

Check-New (Batch New Source Detection)

Trigger: User asks to check for new sources and ingest them.

Read log.md to build a list of already-processed sources (find ## [YYYY-MM-DD] ingest | <title> headers, extract source file paths from ^- Source: <path> patterns in entry bodies).
Scan all files in raw/ subdirectories (excluding raw/assets/).
Difference = unprocessed sources.
Report the list of new sources, then proceed to ingest all of them without waiting for approval.
Run the full Ingest workflow for each source, with individual ingest: commits per source.
After all processing, append a summary to log.md: ## [YYYY-MM-DD] check-new | N new sources processed
Git commit: check-new: N sources processed
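
The first three steps amount to a set difference between logged sources and files under raw/. A minimal sketch, assuming log.md follows the header and `- Source:` conventions described in this schema; the helper names are hypothetical, and the caller is expected to supply the list of raw file paths (for example from a directory walk).

```python
import re

HEADER_RE = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\S+) \| (.*)$")
SOURCE_PREFIX = "- Source: "


def processed_sources(log_text):
    """Collect '- Source: <path>' values that appear inside ingest entries."""
    done = set()
    in_ingest = False
    for line in log_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            in_ingest = header.group(2) == "ingest"
        elif in_ingest and line.startswith(SOURCE_PREFIX):
            done.add(line[len(SOURCE_PREFIX):].strip())
    return done


def unprocessed_sources(raw_files, log_text):
    """Return raw files not yet ingested, skipping raw/assets/."""
    done = processed_sources(log_text)
    return sorted(
        path for path in raw_files
        if not path.startswith("raw/assets/") and path not in done
    )
```

Restricting the scan to ingest entries means a `- Source:` line under another action does not mark a file as processed.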

Reindex (Search Index Rebuild)

Trigger: User asks to rebuild the index, or during environment setup.

Run {search.reindex} (Default: venv/bin/python tools/search.py --reindex). For non-standard paths, add --index-path <path> and/or --wiki-path <path>.
Confirm the completion message (Reindex complete: N pages indexed, M skipped) and report it to the user.
search-index/ is a generated artifact included in .gitignore — no commit needed.

index.md Rules

A categorized wiki catalog. Updated after every Ingest, Query save, and Lint.

One entry per line: - [[Page Name]] — one-line summary
Alphabetical order within each category
Category headers (in the language specified by wiki_language): Entities, Concepts, Sources, Synthesis
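
One way to keep index.md consistent with these rules is to regenerate it from a list of entries. The sketch below is an assumption-laden illustration: `render_index` is hypothetical, the `##` heading level for category headers is a guess (the schema does not fix it), headers are shown in English only, and ordering is case-insensitive.

```python
CATEGORIES = ("Entities", "Concepts", "Sources", "Synthesis")


def render_index(pages):
    """Render index.md text from a list of (category, page_name, summary) tuples."""
    lines = []
    for category in CATEGORIES:
        entries = sorted(
            ((name, summary) for cat, name, summary in pages if cat == category),
            key=lambda entry: entry[0].casefold(),
        )
        if not entries:
            continue
        lines.append(f"## {category}")  # heading level is an assumption
        lines.extend(f"- [[{name}]] — {summary}" for name, summary in entries)
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(render_index([
        ("Concepts", "Reinforcement Learning", "Learning from reward signals"),
        ("Entities", "OpenAI", "AI research company"),
        ("Concepts", "Attention Mechanism", "Weights inputs by relevance"),
    ]))
```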

log.md Rules

Chronological, append-only record. Parseable with grep "^## \[" log.md | tail -5.

Header format: ## [YYYY-MM-DD] <action> | <title>
Actions: init, ingest, query, lint, check-new
Source file paths always use the - Source: <path> prefix (the Check-New workflow parses already-processed sources using the ^- Source: pattern).
Entry bodies also record pages created/modified.
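
The header and `- Source:` conventions can be captured in a pair of small helpers. This is a sketch with hypothetical names: `format_log_entry` builds an entry in the format above, and `last_headers` mirrors the grep and tail pipeline in Python. How created/modified pages are recorded in the entry body is not specified here, so the sketch leaves that part out.

```python
import re
from datetime import date

ACTIONS = ("init", "ingest", "query", "lint", "check-new")
HEADER_RE = re.compile(
    r"^## \[(\d{4}-\d{2}-\d{2})\] ([a-z-]+) \| (.+)$", re.MULTILINE
)


def format_log_entry(day, action, title, source=None):
    """Build a log.md entry: a header line plus an optional Source line."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    lines = [f"## [{day.isoformat()}] {action} | {title}"]
    if source is not None:
        lines.append(f"- Source: {source}")
    return "\n".join(lines) + "\n"


def last_headers(log_text, count=5):
    # Rough Python equivalent of piping the grep for entry headers into tail.
    headers = [match.group(0) for match in HEADER_RE.finditer(log_text)]
    return headers[-count:]


if __name__ == "__main__":
    entry = format_log_entry(
        date.today(), "ingest", "Some Article", "raw/articles/some-article.md"
    )
    print(entry, end="")
    print(last_headers(entry))
```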

Git Commit Conventions

init: project bootstrapped — Initial bootstrap
ingest: <source title> — Source processing
query: <question summary> — Query result saved to wiki
lint: <fix summary> — Wiki maintenance
check-new: <N sources processed> — Batch new source processing summary (after individual ingest commits)
schema: <change description> — WIKI_SCHEMA.md or README.md changes

Git Workflow

At the start of any workflow: git pull
After editing files: git add -> git commit -> git pull --rebase -> git push
On rebase conflict (extremely rare): git rebase --abort and ask the user to resolve manually.

Extensions

Before starting any workflow, scan the extensions/ directory. Each .md file is an extension module. Read all active extensions and follow their instructions alongside this core schema.

Loading Rules

Read all .md files in extensions/ (excluding README.md).
- If an extension declares min_schema_version higher than this schema's schema_version, skip it and warn the user to update.
Check provides: fields for conflicts:
- provides: signals exclusive ownership of a capability (e.g., only one search-backend can be active).
- If two extensions declare the same provides: value, only use the one with an overrides: field targeting the other.
- If neither overrides the other, warn the user and use the first one alphabetically.
- Extensions that augment (not replace) a workflow do NOT need a provides: value — they are always active.
Check requires.provides: fields — if an extension declares a capability dependency (e.g., requires.provides: [search-backend]), verify that a provider is active. Warn the user if not.
Verify scripts: check each entry in requires.scripts: exists in the repo. Warn the user if any are missing.
Install dependencies: run venv/bin/pip install <package> for each entry in requires.packages: (let pip handle version resolution).
Follow each active extension's instructions.
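
The version gate and the provides/overrides conflict rules can be sketched as a selection function. Everything about the data shape is assumed for illustration: each extension is a dict with a hypothetical `name` key (for example its filename stem), `overrides` is taken to hold the name of the extension it replaces, and `search-other` in the demo is a made-up extension. The requires checks and dependency installation are left out.

```python
def parse_version(text):
    """Turn '1.0' into (1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def select_active(extensions, schema_version="1.0"):
    """Return (sorted active extension names, warnings)."""
    warnings = []
    eligible = []
    for ext in sorted(extensions, key=lambda e: e["name"]):
        needed = ext.get("min_schema_version")
        if needed and parse_version(needed) > parse_version(schema_version):
            warnings.append(f"{ext['name']}: needs schema {needed}, update the schema")
            continue
        eligible.append(ext)

    active = []
    by_capability = {}
    for ext in eligible:
        capability = ext.get("provides")
        if capability is None:
            active.append(ext["name"])  # augmenting extensions are always active
        else:
            by_capability.setdefault(capability, []).append(ext)

    for capability, providers in by_capability.items():
        names = {p["name"] for p in providers}
        winners = [p for p in providers if p.get("overrides") in names - {p["name"]}]
        if len(providers) > 1 and not winners:
            warnings.append(f"{capability}: conflicting providers, using the first alphabetically")
        # providers is already in alphabetical order because eligible was sorted
        chosen = winners[0] if winners else providers[0]
        active.append(chosen["name"])
    return sorted(active), warnings


if __name__ == "__main__":
    print(select_active([
        {"name": "search-chromadb", "provides": "search-backend"},
        {"name": "search-other", "provides": "search-backend", "overrides": "search-chromadb"},
        {"name": "obsidian"},
    ]))
```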

What Extensions Can Do

Add new workflows
Append steps to existing core workflows (reference the step they follow, e.g., "After Ingest step 3, also do X")
Replace a capability by declaring provides: + overrides:
Add integrations (Obsidian, Notion, etc.)
Define new page types or categories
