Commit 7685d71

fix(release): harden v0.5 candidate contracts
1 parent deec08a commit 7685d71

14 files changed

Lines changed: 206 additions & 70 deletions

.github/workflows/publish.yml

Lines changed: 27 additions & 9 deletions
@@ -13,10 +13,9 @@ on:
   workflow_dispatch:
     inputs:
       tag:
-        description: "Existing tag to publish (e.g. v0.2.1). Leave blank to use the branch HEAD."
-        required: false
+        description: "Existing tag to publish (e.g. v0.5.0). Required for manual publish."
+        required: true
         type: string
-        default: ""

 jobs:
   build:
@@ -28,28 +27,47 @@ jobs:
         with:
           # workflow_dispatch may override with a specific tag; otherwise
           # use the ref that triggered the workflow (tag push = that tag).
-          ref: ${{ github.event.inputs.tag != '' && github.event.inputs.tag || github.ref }}
+          ref: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.tag || github.ref }}

       - name: Set up uv
         uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0

+      - name: Install dependencies
+        run: uv sync --dev
+
+      - name: Run tests
+        run: uv run python -m pytest tests/ -q
+
       - name: Build distributions
         run: uv build

       - name: Sanity-check version matches tag
-        # When triggered by a tag push, ensure pyproject.toml version matches
-        # the tag name. Skipped for workflow_dispatch (where we assume the
-        # invoker knows what they're doing).
-        if: github.event_name == 'push'
+        # Ensure pyproject.toml version matches the tag name for both automatic
+        # tag publishes and manual retries.
         run: |
-          TAG_VERSION="${GITHUB_REF_NAME#v}"
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            TAG_NAME="${{ github.event.inputs.tag }}"
+          else
+            TAG_NAME="${GITHUB_REF_NAME}"
+          fi
+          case "$TAG_NAME" in
+            v*) ;;
+            *)
+              echo "::error::Release tag must start with v (got ${TAG_NAME})"
+              exit 1
+              ;;
+          esac
+          TAG_VERSION="${TAG_NAME#v}"
           PKG_VERSION=$(grep -E '^version = ' pyproject.toml | head -1 | sed -E 's/version = "([^"]+)"/\1/')
           echo "Tag: v${TAG_VERSION} pyproject: ${PKG_VERSION}"
           if [ "$TAG_VERSION" != "$PKG_VERSION" ]; then
             echo "::error::Tag v${TAG_VERSION} does not match pyproject version ${PKG_VERSION}"
             exit 1
           fi

+      - name: Check distributions
+        run: uvx twine check dist/*
+
       - name: Upload artifacts
         uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
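
Reviewer note: the tag check in the workflow step above can be exercised locally before pushing a tag. The following is a minimal Python sketch of the same rule, not code from this commit. The function name `tag_matches_pyproject` is hypothetical, and the regex only mirrors the workflow's `grep`/`sed` pair (first line beginning with `version = "`).

```python
import re


def tag_matches_pyproject(tag: str, pyproject_text: str) -> bool:
    """Return True when a v-prefixed tag equals the pyproject version.

    Hypothetical helper that mirrors the shell logic in publish.yml.
    """
    # Same guard as the workflow's `case "$TAG_NAME" in v*)` branch.
    if not tag.startswith("v"):
        raise ValueError(f"Release tag must start with v (got {tag})")
    # Equivalent of grep -E '^version = ' | head -1 | sed ...
    match = re.search(r'^version = "([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version line found in pyproject text")
    # ${TAG_NAME#v} strips exactly one leading "v".
    return tag[1:] == match.group(1)


if __name__ == "__main__":
    sample = '[project]\nname = "seeklink"\nversion = "0.5.0"\n'
    print(tag_matches_pyproject("v0.5.0", sample))
```

A line-based regex keeps the sketch faithful to the workflow; a real tool would more likely parse the file with `tomllib`.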

CHANGELOG.md

Lines changed: 17 additions & 4 deletions
@@ -10,8 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.5.0] - 2026-05-04

 ### Changed
+- Full-vault `seeklink index` now prints progress to stderr and keeps the final
+  `Done:` summary on stdout, including the `SEEKLINK_VAULT` daily-use path.
 - Full-vault indexing now embeds in smaller batches to reduce long-tail embedding stalls on real Markdown vaults.
-- `seeklink index --vault PATH` now prints full-vault progress to stderr while keeping the final `Done:` summary on stdout.
 - Indexes now record the embedder, vector dimension, distance metric, and
   chunker version used to build their vectors; full-vault indexing rebuilds
   derived index contents when that configuration changes.
@@ -20,6 +21,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   changing the default 768-dimensional model.

 ### Fixed
+- Full-vault indexing no longer hard-skips `todo/` or `archive/` directories;
+  those are common PKM folders and should be indexed unless hidden or removed.
 - Chinese question-style queries now strip common question particles before
   FTS5 matching, so terms like `卵生动物有哪些?` can use the BM25 channel
   instead of falling back to vector-only retrieval.
@@ -37,18 +40,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `seeklink search` now refuses to query an existing vector index whose stored
   embedder/chunker metadata does not match the active configuration, instead of
   silently mixing query vectors with incompatible document vectors.
+- Reranker scoring now uses a numerically stable two-class softmax, avoiding
+  overflow on extreme model logits.

 ### Dev
+- Apple Silicon MLX reranking is now exposed as an optional `seeklink[mlx]`
+  extra, while the base install remains usable without MLX.
+- `numpy` is now declared as a direct runtime dependency because SeekLink
+  imports it directly.
+- The PyPI publish workflow now runs the test suite, checks the built
+  distributions, and validates manually triggered release tags before
+  publishing.
 - Blind-test result JSON now includes per-query `failure_bucket` labels and
   aggregate bucket counts, making it easier to distinguish candidate-generation,
   rerank-budget, and reranker-ordering failures during search-quality work.
 - Source checkouts now declare a build backend, so `uv sync --dev` installs the
   working tree's `seeklink` console script instead of falling through to a stale
   globally installed command during local verification.
 - Refreshed `tests/blind/results/` with v0.5 release-quality snapshots only. On
-  the bundled 22-query fixture, config A reports mean Recall@10 0.985, MRR
-  0.977, and nDCG@10 0.901; latency measurements remain in the JSON result
-  file because they are hardware- and load-dependent.
+  the bundled 22-query fixture with the optional MLX reranker active, config A
+  reports mean Recall@10 0.985, MRR 0.977, and nDCG@10 0.901; latency
+  measurements remain in the JSON result file because they are hardware- and
+  load-dependent.

 ## [0.4.0] - 2026-04-29
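
Reviewer note: the changelog entry about a "numerically stable two-class softmax" describes a standard technique, but the reranker source is not part of this diff. The sketch below shows one common way to do it, assuming the reranker produces one logit per class. The names `yes_probability`, `yes_logit`, and `no_logit` are illustrative and may not match the project's code.

```python
import math


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax probability of the first class, computed without overflow.

    softmax([y, n])[0] equals sigmoid(y - n). Branching on the sign of the
    difference keeps every exp() argument non-positive, so it cannot overflow.
    """
    diff = yes_logit - no_logit
    if diff >= 0:
        # exp(-diff) lies in (0, 1]; it underflows to 0.0 for large diff.
        return 1.0 / (1.0 + math.exp(-diff))
    # diff < 0, so exp(diff) lies in (0, 1).
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


if __name__ == "__main__":
    # A naive exp(y) / (exp(y) + exp(n)) would raise OverflowError here.
    print(yes_probability(1000.0, -1000.0))
```

The naive form fails because `math.exp(1000.0)` raises `OverflowError`; the branch on the sign avoids ever evaluating a large positive exponent.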

README.md

Lines changed: 30 additions & 13 deletions
@@ -30,6 +30,18 @@ uv tool install seeklink
 pip install seeklink
 ```

+For Apple Silicon reranking support, install the optional MLX extra:
+
+```bash
+uv tool install "seeklink[mlx]"
+# or
+pip install "seeklink[mlx]"
+```
+
+SeekLink requires Python's `sqlite3` module to be linked against SQLite
+3.45 or newer with FTS5 enabled. `seeklink status --vault PATH` checks this and
+prints a clear error if the runtime SQLite is too old.
+
 ## Quick Start

 ```bash
@@ -49,11 +61,13 @@ seeklink search "agent memory systems"
 seeklink get notes/agent-memory-patterns.md:1 -C 20
 ```

-`seeklink search` and `seeklink index` auto-use a resident daemon when
-`SEEKLINK_VAULT` is set and `--vault` is not passed. The daemon keeps the
-embedder and optional reranker in memory. `seeklink status` and `seeklink get`
-always stay cold-start: status only reads SQLite metadata, and get reads the
-file directly from disk.
+`seeklink search` and single-file `seeklink index path/to/file.md` auto-use a
+resident daemon when `SEEKLINK_VAULT` is set and `--vault` is not passed. The
+daemon keeps the embedder and optional reranker in memory. Full-vault
+`seeklink index` runs in-process so progress stays on stderr and the final
+`Done:` summary stays on stdout. `seeklink status` and `seeklink get` always
+stay cold-start: status only reads SQLite metadata, and get reads the file
+directly from disk.

 ## Output

@@ -142,10 +156,11 @@ configuration is compatible.
 seeklink daemon --vault PATH
 ```

-You normally do not run this directly. `search` and `index` auto-spawn and
-auto-restart the daemon when appropriate. Passing `--vault` to `search` or
-`index` forces a one-shot cold-start path because the daemon is bound to one
-vault at startup.
+You normally do not run this directly. `search` and single-file `index`
+auto-spawn and auto-restart the daemon when appropriate. Full-vault `index`
+still runs in-process for progress output. Passing `--vault` to `search` or
+single-file `index` forces a one-shot cold-start path because the daemon is
+bound to one vault at startup.

 ## How Search Works

@@ -168,9 +183,10 @@ set `SEEKLINK_EMBEDDING_DIM`, but it must match the embedder output and requires
 a full `seeklink index` rebuild.

 On Apple Silicon, SeekLink can rerank candidates with
-`mlx-community/Qwen3-Reranker-0.6B-mxfp8`. Reranking is local and optional. Use
-`--no-rerank` for one query or set `SEEKLINK_RERANKER_MODEL=""` to disable it
-globally.
+`mlx-community/Qwen3-Reranker-0.6B-mxfp8` when installed with `seeklink[mlx]`.
+Reranking is local and optional; if MLX is unavailable, SeekLink falls back to
+first-stage hybrid RRF ranking. Use `--no-rerank` for one query or set
+`SEEKLINK_RERANKER_MODEL=""` to disable it globally.

 ## Frontmatter

@@ -203,12 +219,13 @@ and a wikilink graph. Delete `.seeklink/` and run `seeklink index` to rebuild.
 | Area | Status |
 |---|---|
 | Python | 3.11, 3.12, 3.13, 3.14 |
+| SQLite | Python `sqlite3` linked against SQLite 3.45+ with FTS5 |
 | OS | macOS and Linux |
 | Windows | Not supported as a first-class path |
 | File format | Markdown `.md` |
 | Vault style | Plain folder or Obsidian-compatible vault |
 | CJK | Native path via jieba, with trigram fallback on static SQLite builds |
-| Reranker | Apple Silicon via MLX; disabled elsewhere |
+| Reranker | Optional `seeklink[mlx]` extra on Apple Silicon; disabled elsewhere |
 | Daemon | Single vault per machine |

 ## Not For
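
Reviewer note: the README now documents a runtime requirement (SQLite 3.45+ with FTS5), but the check behind `seeklink status` is not shown in this diff. The sketch below is one way such a probe could look using only the standard library. The function name `sqlite_meets_requirements` and the message strings are assumptions, not SeekLink's actual output.

```python
import sqlite3

# Minimum version taken from the README text in this commit.
MIN_SQLITE = (3, 45, 0)


def sqlite_meets_requirements() -> tuple[bool, str]:
    """Check the SQLite library that Python's sqlite3 module is linked against."""
    # sqlite_version_info describes the linked library, not the Python module.
    if sqlite3.sqlite_version_info < MIN_SQLITE:
        return False, f"SQLite {sqlite3.sqlite_version} is older than 3.45"
    conn = sqlite3.connect(":memory:")
    try:
        # Creating an FTS5 table fails with OperationalError when the
        # extension was not compiled in.
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
    except sqlite3.OperationalError as exc:
        return False, f"FTS5 unavailable: {exc}"
    finally:
        conn.close()
    return True, f"SQLite {sqlite3.sqlite_version} with FTS5"


if __name__ == "__main__":
    ok, detail = sqlite_meets_requirements()
    print("ok" if ok else "error", "-", detail)
```

Probing FTS5 by creating a throwaway virtual table in an in-memory database avoids depending on compile-option pragmas.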

README.zh.md

Lines changed: 26 additions & 10 deletions
@@ -26,6 +26,17 @@ uv tool install seeklink
 pip install seeklink
 ```

+如果要在 Apple Silicon 上启用本地 MLX reranker,请安装可选 extra:
+
+```bash
+uv tool install "seeklink[mlx]"
+# 或者
+pip install "seeklink[mlx]"
+```
+
+SeekLink 需要 Python 的 `sqlite3` 模块链接到 SQLite 3.45 或更新版本,并启用 FTS5。
+`seeklink status --vault PATH` 会检查这个运行时条件;如果 SQLite 太旧,会给出明确错误。
+
 ## 快速开始

 ```bash
@@ -45,10 +56,12 @@ seeklink search "agent 记忆系统"
 seeklink get notes/agent-memory-patterns.md:1 -C 20
 ```

-当设置了 `SEEKLINK_VAULT` 且不传 `--vault` 时,`seeklink search` 和 `seeklink index`
-会自动使用一个常驻守护进程(daemon),它把嵌入模型和可选的 reranker 保持在内存里,
-避免每次调用都重新加载。`seeklink status` 和 `seeklink get` 始终走冷启动路径:
-status 只读 SQLite 元数据,get 直接从磁盘读文件。
+当设置了 `SEEKLINK_VAULT` 且不传 `--vault` 时,`seeklink search` 和单文件
+`seeklink index path/to/file.md` 会自动使用一个常驻守护进程(daemon),它把嵌入模型
+和可选的 reranker 保持在内存里,避免每次调用都重新加载。全库 `seeklink index`
+会在 CLI 进程内运行,这样进度可以稳定输出到 stderr,最终 `Done:` 摘要保留在 stdout。
+`seeklink status` 和 `seeklink get` 始终走冷启动路径:status 只读 SQLite 元数据,
+get 直接从磁盘读文件。

 ## 输出格式

@@ -132,8 +145,9 @@ chunker 配置生成的,SeekLink 会重建派生索引内容。单文件索引
 seeklink daemon --vault PATH
 ```

-通常不需要手动运行。`search` 和 `index` 在合适的时候会自动启动和重启守护进程。
-给 `search` 或 `index` 传 `--vault` 会强制走一次性冷启动路径,因为守护进程在启动时就绑定到了一个笔记库。
+通常不需要手动运行。`search` 和单文件 `index` 在合适的时候会自动启动和重启守护进程。
+全库 `index` 仍然在 CLI 进程内运行,以便输出进度。给 `search` 或单文件 `index`
+传 `--vault` 会强制走一次性冷启动路径,因为守护进程在启动时就绑定到了一个笔记库。

 ## 搜索原理

@@ -154,9 +168,10 @@ trigram 分词器,而不是崩溃。
 默认向量维度是 768。高级自定义 embedder 实验可以设置 `SEEKLINK_EMBEDDING_DIM`,
 但它必须和 embedder 的实际输出一致,并且需要重新运行一次完整的 `seeklink index`。

-在 Apple Silicon 上,SeekLink 可以用 `mlx-community/Qwen3-Reranker-0.6B-mxfp8`
-对候选结果进行重排序。Reranking 是本地且可选的——用 `--no-rerank` 跳过单次查询,
-或设置 `SEEKLINK_RERANKER_MODEL=""` 全局禁用。
+在 Apple Silicon 上,如果安装了 `seeklink[mlx]`,SeekLink 可以用
+`mlx-community/Qwen3-Reranker-0.6B-mxfp8` 对候选结果进行重排序。Reranking 是本地且
+可选的;如果 MLX 不可用,SeekLink 会回退到第一阶段的混合 RRF 排名。用
+`--no-rerank` 可以跳过单次查询,或设置 `SEEKLINK_RERANKER_MODEL=""` 全局禁用。

 ## Frontmatter

@@ -188,12 +203,13 @@ SeekLink 在笔记库内写入一个 SQLite 数据库:
 | 维度 | 状态 |
 |---|---|
 | Python | 3.11、3.12、3.13、3.14 |
+| SQLite | Python `sqlite3` 链接到 SQLite 3.45+,并启用 FTS5 |
 | 操作系统 | macOS 和 Linux |
 | Windows | 不作为一等路径支持 |
 | 文件格式 | Markdown `.md` |
 | 笔记库类型 | 普通文件夹或 Obsidian 兼容 vault |
 | 中文/CJK | jieba 路径,静态 SQLite 环境下自动降级为 trigram |
-| Reranker | Apple Silicon 上通过 MLX 可用;其他平台自动禁用 |
+| Reranker | Apple Silicon 上通过可选 `seeklink[mlx]` extra 启用;其他平台自动禁用 |
 | 守护进程 | 一台机器一个笔记库 |

 ## 不适用的场景

llms.txt

Lines changed: 7 additions & 3 deletions
@@ -24,7 +24,9 @@ Runs on macOS and Linux, Python 3.11+. Indexes any folder of `.md` files. Obsidi
 ## Install

 - PyPI: `pip install seeklink` or `uv tool install seeklink`.
+- Apple Silicon reranker: `pip install "seeklink[mlx]"` or `uv tool install "seeklink[mlx]"`.
 - Source: `git clone https://github.com/simonsysun/seeklink && cd seeklink && uv sync --dev`.
+- Runtime requirement: Python's `sqlite3` must link against SQLite >= 3.45 with FTS5. `seeklink status --vault PATH` checks this.

 ## Agent contract

@@ -42,7 +44,7 @@ seeklink get PATH:LINE -l N # read window around a hit
 seeklink get PATH:LINE -C N # read N lines before/after a hit
 ```

-Set `SEEKLINK_VAULT=<path>` once to omit `--vault` on every call and route through the resident daemon (first call after boot: ~2s; warm: ~1-2s with reranker, ~10ms without). If that env/model config changes, `search` and `index` auto-restart a stale daemon instead of silently serving the old vault.
+Set `SEEKLINK_VAULT=<path>` once to omit `--vault` on repeated calls. `search` and single-file `index path/to/file.md` use the resident daemon; full-vault `index` runs in the CLI process so progress stays on stderr. First-ever model downloads can take much longer than normal daemon startup. Warm search latency depends on whether the optional MLX reranker is installed and active; `search --json` reports whether reranking was active for that query. If vault/model config changes, `search` and single-file `index` auto-restart a stale daemon instead of silently serving the old vault.

 ### Output contract

@@ -80,5 +82,7 @@ No other codes.
 ### Common failure modes

 - Empty results on a fresh vault → index not built yet. Run `seeklink index --vault PATH`.
-- Daemon won't auto-spawn → `--vault` was passed, which intentionally forces cold-start. Without `--vault`, `search` / `index` should auto-spawn and auto-restart stale daemons when `vault` / `embedder` / `reranker` no longer match.
-- Line numbers look wrong → file was edited after indexing. Re-index. `status` prints a freshness warning on cold-start.
+- Reranker unavailable → install `seeklink[mlx]` on Apple Silicon or accept first-stage hybrid RRF ranking. Use `--no-rerank` when a deterministic no-rerank path is preferred.
+- SQLite capability error → use a Python build whose `sqlite3` module links against SQLite >= 3.45 with FTS5.
+- Daemon won't auto-spawn → `--vault` was passed, which intentionally forces cold-start. Without `--vault`, `search` and single-file `index` should auto-spawn and auto-restart stale daemons when `vault` / `embedder` / `reranker` no longer match.
+- Line numbers look wrong → file was edited after indexing. Re-index. `status` prints a freshness warning on cold-start; agents should run `status --json` after long editing sessions.
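
Reviewer note: the failure-mode list above refers to "first-stage hybrid RRF ranking" as the fallback when no reranker is active. For readers unfamiliar with the term, here is a minimal sketch of reciprocal rank fusion over two ranked lists (for example a BM25 channel and a vector channel). The constant `k=60` is the conventional default from the RRF literature. SeekLink's own constant, weights, and function names are not visible in this diff, so everything below is illustrative.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a.md", "b.md", "c.md"]
    vector_hits = ["b.md", "c.md", "a.md"]
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Because RRF uses ranks rather than raw scores, it can combine channels whose scores are not on comparable scales.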

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -51,10 +51,16 @@ classifiers = [
 dependencies = [
     "fastembed>=0.7.4",
     "jieba>=0.42.1",
+    "numpy>=1.26",
     "sqlite-vec>=0.1.6",
     "sqlitefts>=1.0.1",
 ]

+[project.optional-dependencies]
+mlx = [
+    "mlx-lm>=0.31.2; platform_system == 'Darwin' and platform_machine == 'arm64'",
+]
+
 [project.urls]
 Homepage = "https://github.com/simonsysun/seeklink"
 Repository = "https://github.com/simonsysun/seeklink"
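
Reviewer note: the environment marker on the `mlx` extra means `mlx-lm` installs only on Apple Silicon macOS, so runtime code has to tolerate its absence. The sketch below shows one way to detect availability without importing the package. The helper names `mlx_reranker_available` and `pick_ranking_stage` are hypothetical, and the project's actual detection logic is not part of this diff.

```python
import importlib.util
import platform


def mlx_reranker_available() -> bool:
    """Mirror the extra's marker, then check that mlx_lm is importable."""
    on_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    # find_spec returns None for a missing top-level module instead of raising.
    return on_apple_silicon and importlib.util.find_spec("mlx_lm") is not None


def pick_ranking_stage(no_rerank: bool = False) -> str:
    """Return "rerank" when the optional reranker can run, otherwise "rrf"."""
    if no_rerank or not mlx_reranker_available():
        return "rrf"
    return "rerank"


if __name__ == "__main__":
    print(pick_ranking_stage())
```

Using `find_spec` keeps the check cheap, since it does not load the model library just to answer whether it exists.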

seeklink/__main__.py

Lines changed: 13 additions & 22 deletions
@@ -3,16 +3,16 @@
 Subcommands:
   daemon — run the Unix-socket daemon (eager-loaded models, never exits)
   search — search the vault (daemon-first; cold-start fallback)
-  index — index notes (daemon-first; cold-start fallback)
+  index — index notes (full-vault in-process; single-file daemon-first)
   status — show vault / index stats (always cold-start; no model load)
   get — print a line range of a vault file (direct filesystem read)

-Dispatch: when `--vault` is not passed to `search` / `index`, the CLI
-tries the daemon socket first (auto-spawning the daemon on first call)
+Dispatch: when `--vault` is not passed to `search` / single-file `index`,
+the CLI tries the daemon socket first (auto-spawning the daemon on first call)
 and falls back to an in-process cold-start if the daemon is unreachable.
 Passing `--vault` always uses cold-start because the daemon is bound to
 a single vault (selected via SEEKLINK_VAULT or cwd at daemon-start time).
-`status` and `get` never route through the daemon.
+Full-vault `index`, `status`, and `get` never route through the daemon.

 Agents integrating SeekLink should invoke the CLI via `subprocess` or
 connect to the daemon socket via `seeklink.cli_client` for structured
@@ -508,31 +508,22 @@ def _cmd_search(args: argparse.Namespace) -> None:
 def _cmd_index(args: argparse.Namespace) -> None:
     _setup_logging()

-    if _should_use_daemon(args):
+    if args.path and _should_use_daemon(args):
         daemon_args: dict = {}
-        if args.path:
-            daemon_args["path"] = args.path
+        daemon_args["path"] = args.path
         resp = _try_daemon("index", daemon_args)
         if resp is not None:
             result = resp["result"]
-            if args.path:
-                # single-file index: {"path": "...", "status": "indexed"|"skipped"|...}
-                status = result.get("status", "?")
-                if status == "skipped":
-                    print(f"Skipped: {result.get('path', args.path)}")
-                else:
-                    print(f"Indexed: {result.get('path', args.path)} ({status})")
+            # single-file index: {"path": "...", "status": "indexed"|"skipped"|...}
+            status = result.get("status", "?")
+            if status == "skipped":
+                print(f"Skipped: {result.get('path', args.path)}")
             else:
-                stats = result
-                print(
-                    f"Done: {stats.get('ingested', 0)} indexed, "
-                    f"{stats.get('unchanged', 0)} unchanged, "
-                    f"{stats.get('skipped', 0)} skipped, "
-                    f"{stats.get('errors', 0)} errors"
-                )
+                print(f"Indexed: {result.get('path', args.path)} ({status})")
             return

-    # Cold-start fallback
+    # Cold-start fallback. Full-vault indexing always stays on this path so
+    # progress can stream to stderr without expanding the daemon protocol.
     from seeklink.app import init_app
     from seeklink.ingest import ingest_file, ingest_vault

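
Reviewer note: the routing rule that `_cmd_index` now implements can be summarized as a small pure function, which is convenient for unit-testing the contract. The sketch below is an illustration, not code from the commit. `index_route` is a hypothetical name, and it assumes that `_should_use_daemon` declines whenever `--vault` is passed, as the module docstring states. The body of `_should_use_daemon` is not shown in this diff.

```python
from typing import Optional


def index_route(path: Optional[str], vault_flag: Optional[str], daemon_reachable: bool) -> str:
    """Return "daemon" or "in-process" for a `seeklink index` invocation."""
    if not path:
        # Full-vault indexing always runs in the CLI process so progress
        # can stream to stderr.
        return "in-process"
    if vault_flag is not None:
        # Assumption: --vault forces cold-start because the daemon is bound
        # to a single vault at startup.
        return "in-process"
    # Single-file index is daemon-first, with cold-start as the fallback.
    return "daemon" if daemon_reachable else "in-process"


if __name__ == "__main__":
    print(index_route("notes/a.md", None, daemon_reachable=True))
    print(index_route(None, None, daemon_reachable=True))
```

Keeping the decision separate from the I/O makes it straightforward to assert that full-vault runs can never reach the daemon branch.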
