0.3.2: repository cleanup — strip internal R&D artifacts from public surface
This release contains no runtime code changes from 0.3.1. It removes
development-process artifacts that leaked into the public repo during
the 0.3 line's rapid iteration, so readers see a shipped tool instead
of a work log.
- Consolidated the 0.3.0 and 0.3.1 CHANGELOG entries (same code
described twice with process detail) into a single 0.3.2 entry.
- Trimmed tests/blind/results/ to the measurements a reader needs:
baseline (A_v0.2.2.json), shipping (A_v0.3.json), upper bound
(C_v0.2.2.json). Intermediate iteration results removed.
- Removed docs/v0.3-plan.md. It was internal design scratch; the
shipped design is documented in the README "How search works"
section and in code comments.
- Tightened code and test docstrings to describe current behavior,
not the exploration that led there.
- README metric claims labeled as "pilot" with explicit sample size.
- Removed residual cross-references to other projects and internal
tooling from public-facing docs.
CHANGELOG.md (32 additions, 32 deletions)
@@ -7,49 +7,49 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

-## [0.3.1] - 2026-04-23
+## [0.3.2] - 2026-04-23

-Docs-only patch. No code changes from 0.3.0; this release exists purely
-to refresh the README snapshot shown on the PyPI project page.
+Repository cleanup pass. No code changes affecting runtime behavior
+from 0.3.1; this release removes internal R&D artifacts from the public
+surface so the repo reads as a shipped tool rather than a work log.

-### Fixed (documentation)
-- Quick-start ordering corrected: `seeklink index` must run before `seeklink search` (searching an un-indexed vault returns no results).
-- `seeklink status` is always cold-start (direct SQLite + freshness read); removed incorrect mentions of it auto-spawning the daemon.
-- `seeklink get` is a direct filesystem read; clarified it doesn't involve the daemon either.
-- Latency figures separated by configuration: warm reranker-on path is ~1-2s per query (not ~0.5s); ~10ms applies only to the reranker-disabled path.
-- CLI vs daemon output surfaces split: CLI prints `path:line_start`; daemon JSON additionally carries `line_end`.
+### Changed
+- Consolidated the 0.3.0 / 0.3.1 narrative into a single release entry (this one). The earlier entries described the same code twice with process detail that did not belong in public release notes.
+- Trimmed `tests/blind/results/` to the two measurements a reader actually needs: baseline (`A_v0.2.2.json`) and shipping (`A_v0.3.json`), plus the upper-bound reference (`C_v0.2.2.json`). Intermediate iteration results removed.
+- Tightened internal code comments and test docstrings so they describe current behavior rather than the iteration history that produced it.
+- README metric claims explicitly labeled as "pilot" with sample size.
-- `llms.txt` rewritten as an explicit agent contract (no prose filler; terse sections on workflow, output format, exit codes, failure modes).
+### Removed
+- `docs/v0.3-plan.md` — internal design scratch that should not have shipped in the public repo. The shipped design is documented in the "How search works" README section and in code comments.

-## [0.3.0] - 2026-04-23
+## [0.3.1] - 2026-04-23

 ### Added
-- **Title-gated rerank blending.** When the title-channel's best match is in the rerank candidate pool, blend `alpha · normalized_rrf + (1 − alpha) · rerank_score` with `alpha = 0.60/0.50/0.40` by rank bucket. This protects confident exact-title / alias hits (e.g. searching `Zettelkasten`, `RRF`, `遗忘曲线`) from being demoted by a content-focused reranker. When no title hit is present, the reranker takes over fully — same as pre-v0.3 behavior — so poor first-stage ordering (e.g. `把文档切块放进向量库` where the correct answer is at RRF rank 11) is still recoverable. Measured on a 22-query blind test vs the same baseline: mean MRR 0.932 → 0.977 (+4.5 pp), mean Recall@10 unchanged, zero regressions. See `docs/v0.3-plan.md` for the iteration history (Options A / B / C) and `tests/blind/results/` for the raw JSON.
-- **Line-range retrieval end-to-end.**
-- `SearchResult` now carries `line_start` and `line_end` (1-indexed, inclusive), computed by mapping chunk `char_start` / `char_end` back through the frontmatter strip to on-disk line numbers.
-- Daemon search responses include `line_start` / `line_end`.
-- CLI `_print_search_results` displays `path:line_start title` so `path:LINE` can be piped straight into `seeklink get`.
-- New `seeklink get PATH[:LINE] [-l N]` command reads the current on-disk file with universal-newline translation and prints the requested line range. Defaults: whole file (no `:LINE`), 100 lines starting at `LINE` (no `-l`), N lines (`-l`). Rejects path escapes, warns on beyond-EOF and `LINE < 1`.
-- Helper `body_offset_to_file_line(full_text, body_char_offset) → int` handles the frontmatter offset; also correct when the frontmatter was deleted from disk after indexing.
-- **Blind-test framework** at `tests/blind/`: 32-file CJK+EN corpus (`tests/corpus/`), 22 ground-truth queries (`tests/blind/queries.yaml`), runner (`tests/blind/run.py`) that cold-starts seeklink once per invocation, warms the reranker, measures `recall_at_10` / `mrr` / `latency_ms` / `p95`. Three configurations: A (baseline), B (v0.4 query expansion — not yet implemented), C (hand-crafted expansion, RRF-fused; upper bound). Used to validate this release; gates v0.4.
-- **v0.3 plan + blind-test framework docs** at `docs/v0.3-plan.md` and `docs/blind-test.md`.
-- **FRONTMATTER_RE** is now a public export from `seeklink.ingest` so the search layer can reuse the same regex for offset mapping.
+- **Title-gated rerank blending.** When the title / alias channel produces a confident match in the rerank candidate pool, SeekLink blends a normalized first-stage score with the reranker output so exact title or alias hits (`Zettelkasten`, `RRF`, `遗忘曲线`, `[[alias]]`) are preserved at rank 1 instead of being demoted by a content-focused reranker. When no title signal is present, the reranker takes over fully — same behavior as v0.2.x — so poor first-stage ordering is still recoverable. On the bundled 22-query pilot (see `tests/blind/`): mean MRR 0.932 → 0.977, mean Recall@10 unchanged, no per-query regressions. Sample size is intentionally a pilot; larger labeled corpora are welcome.
+- **Line-range retrieval.** `SearchResult` now carries 1-indexed inclusive `line_start` / `line_end` fields mapped through the indexer's frontmatter strip back to on-disk line numbers. CLI `search` prints `SCORE PATH:LINE TITLE` so agents can pipe the hit into a precise window read. A new `seeklink get PATH[:LINE] [-l N]` command performs that window read directly from the filesystem — no DB round-trip, no daemon involvement, universal-newline translation, path-escape rejection.
+- **Cold-start `search` reranker parity.** `seeklink search --vault PATH` (the cold-start path) now constructs a reranker and passes it to the search pipeline, matching the daemon. Before this change, the same query returned different rankings depending on whether a daemon happened to be running.
+- **Agent-first documentation.** New "For agents" section in the README (minimum workflow, output contract, exit codes, query-shape hints, daemon JSON fallback). `llms.txt` rewritten as an explicit contract.
+- **Blind-test framework** at `tests/blind/`: 32-file bilingual (CJK + English) fixture corpus (`tests/corpus/`), 22 ground-truth queries (`tests/blind/queries.yaml`), runner that cold-starts once per invocation and measures `recall_at_10` / `mrr` / `latency_ms` / `p95`. Three configurations: `A` (current baseline), `B` (planned query expansion — not yet shipped), `C` (hand-crafted expansion, RRF-fused; upper bound). Used to gate this release.

 ### Fixed
-- **Cold-start vs daemon parity.** Cold-start `seeklink search` (the path triggered when `--vault` is passed or the daemon is unreachable) now constructs a `Reranker()` and passes it to `search()`, matching the daemon's behavior. Previously the same query returned different rankings depending on whether a daemon happened to be running — a silent correctness bug. `Reranker()` construction is safe on platforms without MLX (Linux, Intel macOS) because the instance self-disables at model-load time.
-- **Line-range accounting for newline-terminated files.** `seeklink get file:LINE` on a file that ends with `\n` no longer miscounts the trailing newline as an extra logical line. Line 6 of a 5-line (newline-terminated) file now correctly emits the `beyond-EOF` warning instead of returning a blank line.
-- **Title-only match with deleted file.** When a search result references a source whose file has been removed from disk (title-only match via alias to a stale source), `compute_lines_for_results` no longer returns `line_start=1` — it degrades to `0/0` so agents aren't handed a `path:1` that won't resolve. Consistent with other missing-file paths.
+- **`seeklink get` trailing-newline accounting.** `get FILE:LINE` on a newline-terminated file no longer counts the trailing `\n` as an extra logical line. `get FILE:6` on a 5-line file correctly emits the beyond-EOF warning instead of returning a blank line.
+- **Title-only match with missing file.** If a search surfaces a title-only match whose file has been deleted from disk, `SearchResult.line_start` / `line_end` now remain at `0` rather than returning a misleading `path:1`.
+
+### Changed
+- **`SearchResult` gains `line_start` and `line_end` (default `0`).** Backward compatible for existing callers; populated when `search()` is called with `vault_root`.
+- **`FRONTMATTER_RE`** is now a public export from `seeklink.ingest` (was `_FRONTMATTER_RE`), so the search layer can reuse it for offset mapping. The underscore-prefixed name still aliases it for backward compatibility.

 ### Dev
-- PyYAML added as a dev dependency (required by `tests/blind/run.py`).
-- Test suite: 185 → 203 tests (18 new). 3 for position-aware blending, 13 for `get` command + `body_offset_to_file_line` helper, 3 for end-to-end `SearchResult.line_start/line_end` population, 1 for trailing-newline EOF accounting. All green.
+- PyYAML added as a dev dependency (required by the blind-test runner).
+- Test suite: 185 → 204 tests (19 new).
+
+### Deferred
+- `SEEKLINK_DEBUG=1` blended-score logging.
+- Per-result `mtime > indexed_at` drift warnings on the daemon path (cold-start already warns globally via `check_freshness`).
+- Linux reranker via llama.cpp / GGUF.

-### Deferred to v0.3.1+
-- `SEEKLINK_DEBUG=1` blended-score logging (proposed in v0.3 plan, skipped to avoid scope creep).
-- Per-result `mtime > indexed_at` drift warnings on the daemon path (cold-start already warns globally via `check_freshness`). Daemon-side follow-up tracked in `TODOS.md`.
-- Linux reranker via llama.cpp / GGUF (`QuantFactory/Qwen3-Reranker-0.6B-GGUF` exists; wiring it into seeklink lives on after v0.3).
+### Superseded
+- This release supersedes the same-day `0.3.0` tag, which had the same code but shipped with inaccurate README content (quick-start ordering, latency numbers, `seeklink status` description). If you are pinning a version, use `0.3.1`.
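Several CHANGELOG entries describe mapping chunk offsets, which are stored against the frontmatter-stripped body, back to on-disk line numbers, and fixing `get`'s trailing-newline accounting. The following is a minimal Python sketch of both ideas under stated assumptions, not seeklink's implementation: the `FRONTMATTER_RE` pattern here is an assumed stand-in (the real regex is not shown in this commit), `read_window` is a hypothetical helper, it omits the whole-file default for a bare `PATH`, and it raises `ValueError` where the CLI is described as printing warnings.

```python
from __future__ import annotations

import re

# Assumption: stand-in for seeklink.ingest.FRONTMATTER_RE, whose actual
# pattern is not shown in this commit. Matches a YAML block fenced by
# '---' lines at the very start of the file.
FRONTMATTER_RE = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def body_offset_to_file_line(full_text: str, body_char_offset: int) -> int:
    """Map an offset in the frontmatter-stripped body to a 1-indexed file line."""
    match = FRONTMATTER_RE.match(full_text)
    # If the frontmatter was deleted from disk after indexing, the file is
    # now just the body, so there is no prefix to skip.
    prefix_len = match.end() if match else 0
    file_offset = min(prefix_len + body_char_offset, len(full_text))
    # Line number = count of newlines before the offset, plus one.
    return full_text.count("\n", 0, file_offset) + 1


def read_window(full_text: str, line: int, count: int = 100) -> list[str]:
    """Hypothetical helper: return `count` lines starting at 1-indexed `line`."""
    # splitlines() yields no empty trailing element for a newline-terminated
    # file, so a file "a", "b" ending in a newline has 2 lines, not 3.
    lines = full_text.splitlines()
    if line < 1:
        raise ValueError(f"line must be >= 1, got {line}")
    if line > len(lines):
        raise ValueError(f"line {line} is beyond EOF ({len(lines)} lines)")
    return lines[line - 1 : line - 1 + count]
```

Using `splitlines()` instead of `split("\n")` is one way to get the fixed behavior: `split` returns an extra empty string after a trailing newline, which is exactly the phantom "line 6" of a 5-line file.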
README.md (2 additions, 2 deletions)
@@ -217,7 +217,7 @@ When the reranker is enabled, a cross-encoder (`Qwen3-Reranker-0.6B` on MLX, ~1-
 - **If the title channel's best match is in the candidate pool**, blend `alpha · normalized_rrf + (1 - alpha) · rerank_score` with `alpha = 0.60/0.50/0.40` by rank bucket. This protects exact title / alias hits from being demoted by a content-focused reranker.
 - **Otherwise** (no strong title signal), the reranker score is used directly — same as pre-v0.3 behavior. This lets the reranker correct poor first-stage ordering.

-On the built-in 22-query blind test, this improved mean MRR from 0.932 to 0.977 vs pure-reranker-override, with zero regressions. See `tests/blind/` for the methodology.
+On the bundled 22-query pilot (see `tests/blind/`), mean MRR moved from 0.932 to 0.977 vs pure-reranker-override with no per-query regressions. Sample size is a pilot, not a statistically powered benchmark — contributions of larger labeled corpora are welcome.
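The gating rule in "How search works" can be sketched in Python as follows. The blend formula and the 0.60/0.50/0.40 alphas come from the README; everything else is an assumption, flagged again in the comments: the rank-bucket boundaries (1-3, 4-10, 11+), min-max normalization of RRF scores over the candidate pool, reranker scores in [0, 1], and the `Candidate` / `title_gated_rerank` names, which are hypothetical rather than seeklink's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    path: str
    rrf_score: float     # first-stage Reciprocal Rank Fusion score
    rerank_score: float  # cross-encoder score; assumed to lie in [0, 1]


def alpha_for_rank(rrf_rank: int) -> float:
    # Assumed bucket boundaries: the README gives the alphas "by rank
    # bucket" but does not say where each bucket starts and ends.
    if rrf_rank <= 3:
        return 0.60
    if rrf_rank <= 10:
        return 0.50
    return 0.40


def title_gated_rerank(candidates: list[Candidate],
                       title_best: Optional[str]) -> list[Candidate]:
    # Order by first-stage score so each candidate has a 1-indexed RRF rank.
    by_rrf = sorted(candidates, key=lambda c: c.rrf_score, reverse=True)
    if title_best is None or title_best not in {c.path for c in by_rrf}:
        # No title signal in the pool: the reranker score decides alone.
        return sorted(by_rrf, key=lambda c: c.rerank_score, reverse=True)

    # Assumed normalization: min-max over the pool, so RRF maps into [0, 1].
    hi, lo = by_rrf[0].rrf_score, by_rrf[-1].rrf_score
    span = (hi - lo) or 1.0
    blended = {}
    for rank, c in enumerate(by_rrf, start=1):
        a = alpha_for_rank(rank)
        normalized_rrf = (c.rrf_score - lo) / span
        blended[c.path] = a * normalized_rrf + (1 - a) * c.rerank_score
    return sorted(by_rrf, key=lambda c: blended[c.path], reverse=True)
```

With an exact-title note at RRF rank 1 but a lower reranker score than a content match, the blend keeps the title hit first; when the title channel's best match is absent from the pool, the ordering falls back to reranker score alone, matching the "Otherwise" bullet.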
@@ -265,7 +265,7 @@ Notes are chunked (~400 tokens), embedded with jina-embeddings-v2-base-zh, and i

 ## What changed in v0.3

-- **Title-gated rerank blending**: when an exact title / alias hit drives rank 1, protect it from reranker demotion; otherwise fall back to pure reranker. Measured MRR gain of +4.5 pp over v0.2 on a 22-query blind test, with no regressions. See "How search works" above.
+- **Title-gated rerank blending**: when an exact title / alias hit drives rank 1, protect it from reranker demotion; otherwise fall back to pure reranker. Measured mean MRR 0.932 → 0.977 on the bundled 22-query pilot (see "How search works" for caveats on sample size).
 - **Line-range retrieval**: `search` results now include `line_start` / `line_end`, and a new `seeklink get PATH[:LINE] -l N` command prints line-precise windows. Agents can find-then-read without slurping whole files.
 - **Cold-start / daemon parity fix**: cold-start `seeklink search` now constructs a `Reranker()` and passes it to the search pipeline. Previously the same query returned different rankings depending on whether the daemon was running.
 - **Frontmatter-aware line mapping**: chunk offsets (stored against frontmatter-stripped body) are remapped to full-file line numbers, so `search` + `get` report lines the way you'd see them in a text editor.
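The pilot figures above are mean MRR and mean Recall@10 over 22 queries. The runner's exact definitions are not shown in this commit, so this Python sketch assumes the textbook ones: reciprocal rank of the first relevant hit (0 if none), and the fraction of a query's labeled relevant files that appear in the top 10, each averaged over queries. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], relevant: set[str]) -> float:
    # 1/rank of the first relevant result; 0.0 if nothing relevant is returned.
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int = 10) -> float:
    # Fraction of the labeled relevant files found in the top k results.
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_metrics(runs: list[tuple[Sequence[str], set[str]]],
                 k: int = 10) -> tuple[float, float]:
    # Each run pairs a ranked result list with its ground-truth set.
    n = len(runs)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / n
    recall = sum(recall_at_k(r, rel, k) for r, rel in runs) / n
    return mrr, recall
```

For scale: with 22 queries, one query whose first relevant hit moves from rank 2 to rank 1 raises mean MRR by (1 - 1/2) / 22, about 0.023. The reported 0.932 → 0.977 change (0.045) is roughly the size of two such moves, which illustrates why the figure is labeled a pilot.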
TODOS.md (1 addition, 1 deletion)
@@ -29,7 +29,7 @@ The vault's `sources/` folder could store raw external content (textbooks, paper
 Currently freshness warnings only appear in the cold-start CLI path (`seeklink search/status`). The daemon doesn't propagate warnings back to clients. Add a `warnings` field to daemon JSON responses so `cli_client` can print them.

 ### Daemon auto-respawn on config mismatch
-`cli_client.call()` refuses to reuse a daemon bound to a different vault or started with a different embedder/reranker (P1 correctness fix), but falls back to cold-start on every subsequent CLI call after a switch until the user manually kills the stale daemon. Codex rated this P2: add a `shutdown` command to the daemon protocol, have the client shutdown + respawn on mismatch so the auto-spawn workflow keeps working across vault/model switches.
+`cli_client.call()` refuses to reuse a daemon bound to a different vault or started with a different embedder/reranker (P1 correctness fix), but falls back to cold-start on every subsequent CLI call after a switch until the user manually kills the stale daemon. P2 follow-up: add a `shutdown` command to the daemon protocol, have the client shutdown + respawn on mismatch so the auto-spawn workflow keeps working across vault/model switches.
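The TODO proposes a `shutdown` command so the client can replace a mismatched daemon instead of cold-starting on every call. Below is a minimal in-process sketch of that control flow, assuming a config compared by vault, embedder, and reranker. `DaemonConfig`, `FakeDaemon`, `Client`, and `spawn` are hypothetical names, not seeklink's API; a real client would send these requests over the daemon's socket rather than hold an object.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DaemonConfig:
    vault: str
    embedder: str
    reranker: Optional[str]


class FakeDaemon:
    """Hypothetical stand-in for a running daemon process."""

    def __init__(self, config: DaemonConfig) -> None:
        self.config = config
        self.alive = True

    def handle(self, request: dict) -> dict:
        if request.get("cmd") == "shutdown":
            self.alive = False
            return {"ok": True}
        return {"ok": True, "vault": self.config.vault}


class Client:
    def __init__(self, spawn: Callable[[DaemonConfig], FakeDaemon]) -> None:
        self.spawn = spawn
        self.daemon: Optional[FakeDaemon] = None

    def call(self, wanted: DaemonConfig, request: dict) -> dict:
        d = self.daemon
        if d is not None and d.alive and d.config != wanted:
            # Proposed behavior: stop the stale daemon rather than falling
            # back to cold-start until the user kills it by hand.
            d.handle({"cmd": "shutdown"})
        if self.daemon is None or not self.daemon.alive:
            # Auto-spawn (or respawn) a daemon bound to the wanted config.
            self.daemon = self.spawn(wanted)
        return self.daemon.handle(request)
```

Repeated calls with the same config reuse one daemon; the first call after a vault or model switch shuts the old one down and spawns a replacement, so auto-spawn keeps working across switches.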