Commit 7c71999

simon_sun and claude committed
v0.1.0: open-source release
Schema v2 with alias indexing, tag filtering, and fts_sources table. Four-channel RRF search (BM25 + vector + indegree + title/alias). Frontmatter parsing for tags and aliases. Tool consolidation 8→6. CLI mode (search, index, status). Chunk overlap support. SSE security hardened (default 127.0.0.1). CI with SHA-pinned actions. 217 tests. Python 3.11+. MIT license.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 67f900b commit 7c71999

20 files changed

Lines changed: 2170 additions & 306 deletions

.github/workflows/test.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+name: Tests
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.13"]
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+      - uses: astral-sh/setup-uv@e4db8464a088ece1b920f60402e813ea4de65b8f # v4
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: uv sync --dev
+      - name: Run tests
+        run: uv run python -m pytest tests/ -q --ignore=tests/test_integration.py
+      - name: Verify install
+        run: uv tool install . && seeklink status --vault /tmp || true

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -20,3 +20,6 @@ wheels/
 
 # Environment
 .env
+
+# Security reports (local only)
+.gstack/

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Siyuan Sun
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 143 additions & 26 deletions
@@ -1,52 +1,169 @@
 # SeekLink
 
-Hybrid semantic search and link discovery MCP server for Obsidian vaults. Fully local, no API keys needed.
+**Let your AI agent manage your Zettelkasten.**
 
-## Features
+SeekLink is an MCP server that gives AI assistants (Claude Code, Cursor, etc.) deep access to your markdown vault. It searches, discovers missing connections, and writes `[[wikilinks]]` for you — so your knowledge graph grows as you work.
 
-- **Hybrid search**: BM25 full-text + vector semantic search with RRF fusion
-- **Link discovery**: Finds related notes and writes `[[links]]` on approval
-- **Knowledge graph**: Parses `[[wikilinks]]`, tracks indegree, BFS graph traversal
-- **Bilingual**: Native Chinese + English support (jieba tokenizer + jina-embeddings-v2-base-zh)
-- **Auto-indexing**: File watcher detects changes and re-indexes automatically
+Built for people who take notes seriously and want an AI that understands their knowledge structure, not just their text.
 
-## Setup
+## What it does
+
+```
+You: "What do I know about MCP protocol?"
+Agent: searches vault → finds 8 related notes across topics
+
+You: "What should this note link to?"
+Agent: analyzes content → suggests 4 missing connections with relevance scores
+
+You: "Approve the first two"
+Agent: writes [[wikilinks]] directly into your note file
+```
+
+**Six MCP tools:**
+
+| Tool | What it does |
+|------|-------------|
+| `search` | Four-channel hybrid search: keyword (BM25) + semantic (vector) + knowledge graph (indegree) + title/alias. Fused with Reciprocal Rank Fusion. Filter by tags or folder. |
+| `graph` | Explore a note's neighborhood — outgoing links, backlinks, configurable depth |
+| `suggest_links` | Find notes that should be linked but aren't. Returns scored suggestions |
+| `resolve_suggestion` | Approve (writes `[[link]]` to file) or reject a link suggestion |
+| `index` | Index a note, or list unprocessed notes |
+| `status` | Vault stats: indexed notes, graph size, watcher status |
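The `graph` tool in the table above is described as exploring a note's neighborhood over outgoing links and backlinks with a configurable depth. A minimal sketch of that idea as a breadth-first walk follows. The function name `neighborhood` and the `links` mapping are hypothetical and chosen for illustration; they are not taken from SeekLink's code.

```python
from collections import deque


def neighborhood(links: dict[str, set[str]], start: str, depth: int = 1) -> dict[str, int]:
    """Breadth-first walk over outgoing links and backlinks up to `depth` hops.

    Returns each reachable note with its hop distance from `start`.
    """
    # Build an undirected view so that backlinks are followed as well.
    adjacent: dict[str, set[str]] = {}
    for source, targets in links.items():
        for target in targets:
            adjacent.setdefault(source, set()).add(target)
            adjacent.setdefault(target, set()).add(source)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand past the requested depth
        for neighbor in sorted(adjacent.get(node, ())):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start note is not its own neighbor
    return distance


# Hypothetical link graph: note title -> titles it links to.
links = {"A": {"B"}, "B": {"C"}, "D": {"A"}, "C": {"E"}}
print(neighborhood(links, "A", depth=2))
```

With `depth=2`, note `D` is reached through a backlink and `E` is left out because it is three hops away.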
+
+## Why SeekLink
+
+**Most MCP servers for Obsidian are file managers.** They read, write, and search text. SeekLink understands your knowledge *structure*: it parses `[[wikilinks]]`, builds a knowledge graph, tracks which notes are central (indegree), and finds connections you missed.
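To make the structural part concrete, here is a small sketch of extracting `[[wikilinks]]` and counting indegree. The regular expression and both function names are assumptions made for this example; SeekLink's own parser may handle more link forms.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|display text]] and [[Target#Heading]].
# Hypothetical pattern for illustration only.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(text: str) -> list[str]:
    """Return the link targets found in one note, in order of appearance."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(text)]


def indegree(notes: dict[str, str]) -> Counter[str]:
    """Count how many distinct notes link to each target."""
    counts: Counter[str] = Counter()
    for title, body in notes.items():
        # A set, so repeated links from one note count once;
        # self-links are ignored.
        for target in set(extract_links(body)) - {title}:
            counts[target] += 1
    return counts


# Hypothetical three-note vault: title -> body.
notes = {
    "MCP": "A protocol. See [[Agents]].",
    "Agents": "Built on [[MCP]] and [[Memory|memory systems]].",
    "Memory": "Related: [[MCP#Tools]] and [[Agents]].",
}
for title, count in sorted(indegree(notes).items()):
    print(title, count)
```

Notes with a higher count are the ones other notes point to most often, which is the signal the indegree channel uses.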
+
+**Chinese is a first-class citizen.** jieba tokenization for keyword search + jina-embeddings-v2-base-zh for semantic search. Not "also supports Chinese" — designed for it. Bilingual vaults (Chinese + English) work out of the box.
+
+**Fully local, headless.** Runs on your machine. No Obsidian plugins required, no API keys for search. Works from the terminal with Claude Code, or as MCP server for any client.
+
+## Install
 
 ```bash
-uv sync
+uv tool install seeklink
+# or
+pip install seeklink
 ```
 
-## Usage
+## Setup
+
+### MCP server (for Claude Code, Cursor, etc.)
 
-SeekLink runs as an MCP server — configure it in `.mcp.json`:
+Add to your MCP config:
 
 ```json
 {
   "mcpServers": {
     "seeklink": {
-      "command": "uv",
-      "args": ["run", "python", "-m", "seeklink"],
+      "command": "seeklink",
+      "args": ["serve"],
       "env": { "SEEKLINK_VAULT": "/path/to/your/vault" }
     }
   }
 }
 ```
 
-Then use the 8 tools through any MCP client (Claude Code, etc.):
+First run indexes your vault automatically. A file watcher keeps the index up to date.
 
-| Tool | Description |
-|------|-------------|
-| `search` | Hybrid BM25 + vector search with optional graph expansion |
-| `suggest_links` | Find notes worth linking to |
-| `approve_suggestion` | Accept a link suggestion (writes `[[link]]` to file) |
-| `reject_suggestion` | Dismiss a link suggestion |
-| `graph_neighbors` | Show link neighborhood (BFS) |
-| `index_note` | Index a note (chunk, embed, parse links) |
-| `get_unprocessed` | List notes needing indexing |
-| `status` | Index stats, graph size, watcher status |
+### CLI
 
-## Tests
+```bash
+seeklink search "machine learning" --vault /path/to/vault
+seeklink search "知识管理" --vault /path/to/vault --tags ai --top-k 5
+seeklink index --vault /path/to/vault
+seeklink status --vault /path/to/vault
+```
+
+## How search works
+
+SeekLink runs four search channels in parallel and merges results with Reciprocal Rank Fusion:
+
+```
+Query: "agent memory systems"
+
+├── BM25 (FTS5 + jieba) ──── keyword match ──────── weight 1.0
+├── Vector (jina-v2-zh) ──── semantic similarity ── weight 1.0
+├── Indegree ─────────────── well-linked = quality ─ weight 0.3
+└── Title/Alias (FTS5) ──── exact name match ────── weight 3.0
+
+└── RRF Fusion → ranked results
+```
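The fusion step in the diagram can be sketched as weighted Reciprocal Rank Fusion, where each channel contributes `weight / (k + rank)` for every note it returns. The weights below are the ones listed in the diagram. The constant `k = 60`, the channel keys, and the function name `rrf_fuse` are assumptions for illustration and are not taken from SeekLink's source.

```python
from collections import defaultdict

# Channel weights as listed in the README diagram.
WEIGHTS = {"bm25": 1.0, "vector": 1.0, "indegree": 0.3, "title": 3.0}


def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-channel rankings (best first) into one scored list.

    k=60 is the constant commonly used for RRF; it is an assumption here,
    not a value confirmed for SeekLink.
    """
    scores: dict[str, float] = defaultdict(float)
    for channel, ranked_ids in rankings.items():
        weight = WEIGHTS.get(channel, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by score descending, then by id for a stable order on ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-channel results for one query.
fused = rrf_fuse({
    "bm25": ["a", "b", "c"],
    "vector": ["b", "a", "d"],
    "indegree": ["c", "a"],
    "title": ["d"],
})
for doc_id, score in fused:
    print(f"{doc_id}  {score:.4f}")
```

In this toy input, note `d` ranks first even though it is last in the vector channel, because a title hit carries weight 3.0. Only ranks enter the formula, so the raw BM25 and cosine scores never need to be put on a common scale.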
+
+- **Tags filter:** `search("query", tags=["ai", "mcp"])` — only return notes with these tags
+- **Folder filter:** `search("query", folder="notes/")` — only search within a folder
+- **Expand mode:** `search("query", expand=True)` — include graph neighbors of results
+
+## Frontmatter
+
+SeekLink reads `tags` and `aliases` from YAML frontmatter:
+
+```yaml
+---
+tags: [ai, machine-learning]
+aliases: [ML, Machine Learning]
+---
+```
+
+Both inline (`[a, b]`) and block list formats supported. Aliases are searchable and used for link resolution — if a note has `aliases: [ML]`, then `[[ML]]` resolves to it.
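A rough sketch of the behavior described above follows: reading `tags` and `aliases` in both list styles, then resolving a link target through an alias index. It uses only the standard library and handles just these two keys. The function names, the case-insensitive lookup, and the use of the file name as the note title are assumptions for illustration, not SeekLink's actual implementation.

```python
def parse_frontmatter(text: str) -> dict[str, list[str]]:
    """Extract list-valued `tags` and `aliases` from a leading YAML block.

    Handles only inline ([a, b]) and block ("- a") lists; a real
    implementation would more likely use a YAML library.
    """
    result: dict[str, list[str]] = {"tags": [], "aliases": []}
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return result
    current = None  # key whose block list is currently being read
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break
        if stripped.startswith("- ") and current:
            result[current].append(stripped[2:].strip().strip("\"'"))
            continue
        key, sep, value = stripped.partition(":")
        current = None
        if not sep or key.strip() not in result:
            continue
        key, value = key.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            result[key] = [i.strip().strip("\"'") for i in items if i.strip()]
        elif not value:
            current = key
    return result


def build_alias_index(notes: dict[str, str]) -> dict[str, str]:
    """Map lowercased titles and aliases to note paths (first writer wins)."""
    index: dict[str, str] = {}
    for path, text in notes.items():
        title = path.rsplit("/", 1)[-1].removesuffix(".md")
        for name in [title, *parse_frontmatter(text)["aliases"]]:
            index.setdefault(name.lower(), path)
    return index


# Hypothetical vault contents: path -> file text.
notes = {
    "notes/machine-learning.md": "---\ntags: [ai]\naliases: [ML, Machine Learning]\n---\nBody",
    "notes/mcp.md": "---\naliases:\n  - Model Context Protocol\n---\nBody",
}
index = build_alias_index(notes)
print(index.get("ml"))
```

Here `[[ML]]` would resolve to `notes/machine-learning.md` through the alias, while a target with no matching title or alias returns `None`.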
+
+## Architecture
+
+```
+                 ┌────────────────────────────────┐
+                 │           MCP Client           │
+                 │   (Claude Code, Cursor, ...)   │
+                 └──────────┬─────────────────────┘
+                            │ stdio / SSE
+                 ┌──────────▼─────────────────────┐
+                 │         FastMCP Server         │
+                 │    6 tools, async handlers     │
+                 └──────────┬─────────────────────┘
+
+       ┌────────────────────┼────────────────────┐
+       ▼                    ▼                    ▼
+┌─────────────┐      ┌──────────────┐     ┌──────────────┐
+│   Search    │      │    Ingest    │     │   Watcher    │
+│  4-ch RRF   │      │  chunk+embed │     │  watchfiles  │
+└──────┬──────┘      │  +frontmatter│     │  auto-index  │
+       │             └──────┬───────┘     └──────────────┘
+       ▼                    ▼
+┌──────────────────────────────────┐
+│       SQLite + Extensions        │
+│  FTS5 (jieba)  │  vec0 (768d)    │
+│  sources, chunks, wiki_links     │
+│  source_tags, fts_sources        │
+└──────────────────────────────────┘
+```
+
+## Configuration
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SEEKLINK_VAULT` | `.` | Path to vault root |
+| `SEEKLINK_SSE_HOST` | `127.0.0.1` | SSE server bind address |
+| `SEEKLINK_SSE_PORT` | `8767` | SSE server port |
+
+## Development
 
 ```bash
-uv run python -m pytest tests/ -v
+git clone https://github.com/simonsysun/seeklink
+cd seeklink
+uv sync --dev
+uv run python -m pytest tests/ -q --ignore=tests/test_integration.py
 ```
+
+217 tests. Python 3.11+.
+
+## Roadmap
+
+- [ ] Graph intelligence: orphan detection, cluster analysis, bridge notes, knowledge gap discovery
+- [ ] Cross-encoder reranking for top-k results
+- [ ] Lightweight embedding model option (~117MB vs 330MB default)
+- [ ] PyPI automated publishing
+
+See [TODOS.md](TODOS.md) for details.
+
+## License
+
+MIT

TODOS.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# TODOs
+
+Deferred work for future releases. Contributions welcome.
+
+## v0.2 candidates
+
+### Cross-encoder reranking
+Re-rank top results with a cross-encoder for better precision. Needs latency benchmarking to ensure the quality gain justifies the added latency (~100-300ms per query).
+
+**Context:** The current RRF fusion (BM25 + vector + indegree + title) is already strong. Cross-encoder would be an optional second pass on the top-k results. Competitor "Hybrid Search" uses multilingual-e5 reranking.
+
+### Lightweight embedding model option
+Add `SEEKLINK_MODEL` environment variable to choose between:
+- `jinaai/jina-embeddings-v2-base-zh` (default, 330MB, best CJK)
+- A smaller multilingual model (~117MB, 100+ languages)
+
+**Context:** Deferred because adding model choice increases decision burden for new users. 330MB is acceptable in 2026. Add when community requests it.
+
+## Infrastructure
+
+### PyPI automated publishing
+GitHub Actions workflow to publish to PyPI on tag push. Currently v0.1.0 is published manually.
+
+### Integration test fixture cleanup
+Session-scoped async MCP client fixture hangs during teardown when all integration tests run together. Individual test classes pass fine. Likely a `create_connected_server_and_client_session` cleanup issue with the watcher task.

pyproject.toml

Lines changed: 23 additions & 2 deletions
@@ -1,9 +1,23 @@
 [project]
 name = "seeklink"
 version = "0.1.0"
-description = "Hybrid semantic search MCP server for Obsidian vaults. BM25 + vector + knowledge graph, native CJK support, fully local."
+description = "Hybrid semantic search MCP server for markdown vaults. Four-channel RRF fusion (BM25 + vector + knowledge graph + title), native CJK support, fully local."
 readme = "README.md"
-requires-python = ">=3.14"
+license = "MIT"
+requires-python = ">=3.11"
+authors = [
+    { name = "Siyuan Sun" },
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Programming Language :: Python :: 3.14",
+    "Topic :: Text Processing :: Indexing",
+]
 dependencies = [
     "fastembed>=0.7.4",
     "jieba>=0.42.1",
@@ -13,6 +27,13 @@ dependencies = [
     "watchfiles>=1.1.1",
 ]
 
+[project.urls]
+Homepage = "https://github.com/simonsysun/seeklink"
+Repository = "https://github.com/simonsysun/seeklink"
+
+[project.scripts]
+seeklink = "seeklink.__main__:main"
+
 [dependency-groups]
 dev = [
     "anyio>=4.12.1",
