
Commit 88d3576

0.3.2: repository cleanup — strip internal R&D artifacts from public surface
This release contains no runtime code changes from 0.3.1. It removes development-process artifacts that leaked into the public repo during the 0.3 line's rapid iteration, so readers see a shipped tool instead of a work log.

- Consolidated the 0.3.0 and 0.3.1 CHANGELOG entries (same code described twice with process detail) into a single 0.3.2 entry.
- Trimmed tests/blind/results/ to the measurements a reader needs: baseline (A_v0.2.2.json), shipping (A_v0.3.json), upper bound (C_v0.2.2.json). Intermediate iteration results removed.
- Removed docs/v0.3-plan.md. It was internal design scratch; the shipped design is documented in the README "How search works" section and in code comments.
- Tightened code and test docstrings to describe current behavior, not the exploration that led there.
- README metric claims labeled as "pilot" with explicit sample size.
- Removed residual cross-references to other projects and internal tooling from public-facing docs.
1 parent d3af7f2 commit 88d3576

17 files changed

Lines changed: 666 additions & 6125 deletions

CHANGELOG.md

Lines changed: 32 additions & 32 deletions
```diff
@@ -7,49 +7,49 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-## [0.3.1] - 2026-04-23
+## [0.3.2] - 2026-04-23
 
-Docs-only patch. No code changes from 0.3.0; this release exists purely
-to refresh the README snapshot shown on the PyPI project page.
+Repository cleanup pass. No code changes affecting runtime behavior
+from 0.3.1; this release removes internal R&D artifacts from the public
+surface so the repo reads as a shipped tool rather than a work log.
 
-### Fixed (documentation)
-- Quick-start ordering corrected: `seeklink index` must run before `seeklink search` (searching an un-indexed vault returns no results).
-- `seeklink status` is always cold-start (direct SQLite + freshness read); removed incorrect mentions of it auto-spawning the daemon.
-- `seeklink get` is a direct filesystem read; clarified it doesn't involve the daemon either.
-- Latency figures separated by configuration: warm reranker-on path is ~1-2s per query (not ~0.5s); ~10ms applies only to the reranker-disabled path.
-- CLI vs daemon output surfaces split: CLI prints `path:line_start`; daemon JSON additionally carries `line_end`.
+### Changed
+- Consolidated the 0.3.0 / 0.3.1 narrative into a single release entry (this one). The earlier entries described the same code twice with process detail that did not belong in public release notes.
+- Trimmed `tests/blind/results/` to the two measurements a reader actually needs: baseline (`A_v0.2.2.json`) and shipping (`A_v0.3.json`), plus the upper-bound reference (`C_v0.2.2.json`). Intermediate iteration results removed.
+- Tightened internal code comments and test docstrings so they describe current behavior rather than the iteration history that produced it.
+- README metric claims explicitly labeled as "pilot" with sample size.
 
-### Added (documentation)
-- README "For agents" section: minimum workflow, output contract, exit codes, query-shape hints, daemon JSON fallback.
-- `llms.txt` rewritten as an explicit agent contract (no prose filler; terse sections on workflow, output format, exit codes, failure modes).
+### Removed
+- `docs/v0.3-plan.md` — internal design scratch that should not have shipped in the public repo. The shipped design is documented in the "How search works" README section and in code comments.
 
-## [0.3.0] - 2026-04-23
+## [0.3.1] - 2026-04-23
 
 ### Added
-- **Title-gated rerank blending.** When the title-channel's best match is in the rerank candidate pool, blend `alpha · normalized_rrf + (1 − alpha) · rerank_score` with `alpha = 0.60/0.50/0.40` by rank bucket. This protects confident exact-title / alias hits (e.g. searching `Zettelkasten`, `RRF`, `遗忘曲线`) from being demoted by a content-focused reranker. When no title hit is present, the reranker takes over fully — same as pre-v0.3 behavior — so poor first-stage ordering (e.g. `把文档切块放进向量库` where the correct answer is at RRF rank 11) is still recoverable. Measured on a 22-query blind test vs the same baseline: mean MRR 0.932 → 0.977 (+4.5 pp), mean Recall@10 unchanged, zero regressions. See `docs/v0.3-plan.md` for the iteration history (Options A / B / C) and `tests/blind/results/` for the raw JSON.
-- **Line-range retrieval end-to-end.**
-  - `SearchResult` now carries `line_start` and `line_end` (1-indexed, inclusive), computed by mapping chunk `char_start` / `char_end` back through the frontmatter strip to on-disk line numbers.
-  - Daemon search responses include `line_start` / `line_end`.
-  - CLI `_print_search_results` displays `path:line_start title` so `path:LINE` can be piped straight into `seeklink get`.
-  - New `seeklink get PATH[:LINE] [-l N]` command reads the current on-disk file with universal-newline translation and prints the requested line range. Defaults: whole file (no `:LINE`), 100 lines starting at `LINE` (no `-l`), N lines (`-l`). Rejects path escapes, warns on beyond-EOF and `LINE < 1`.
-  - Helper `body_offset_to_file_line(full_text, body_char_offset) → int` handles the frontmatter offset; also correct when the frontmatter was deleted from disk after indexing.
-- **Blind-test framework** at `tests/blind/`: 32-file CJK+EN corpus (`tests/corpus/`), 22 ground-truth queries (`tests/blind/queries.yaml`), runner (`tests/blind/run.py`) that cold-starts seeklink once per invocation, warms the reranker, measures `recall_at_10` / `mrr` / `latency_ms` / `p95`. Three configurations: A (baseline), B (v0.4 query expansion — not yet implemented), C (hand-crafted expansion, RRF-fused; upper bound). Used to validate this release; gates v0.4.
-- **v0.3 plan + blind-test framework docs** at `docs/v0.3-plan.md` and `docs/blind-test.md`.
-- **FRONTMATTER_RE** is now a public export from `seeklink.ingest` so the search layer can reuse the same regex for offset mapping.
+- **Title-gated rerank blending.** When the title / alias channel produces a confident match in the rerank candidate pool, SeekLink blends a normalized first-stage score with the reranker output so exact title or alias hits (`Zettelkasten`, `RRF`, `遗忘曲线`, `[[alias]]`) are preserved at rank 1 instead of being demoted by a content-focused reranker. When no title signal is present, the reranker takes over fully — same behavior as v0.2.x — so poor first-stage ordering is still recoverable. On the bundled 22-query pilot (see `tests/blind/`): mean MRR 0.932 → 0.977, mean Recall@10 unchanged, no per-query regressions. Sample size is intentionally a pilot; larger labeled corpora are welcome.
+- **Line-range retrieval.** `SearchResult` now carries 1-indexed inclusive `line_start` / `line_end` fields mapped through the indexer's frontmatter strip back to on-disk line numbers. CLI `search` prints `SCORE PATH:LINE TITLE` so agents can pipe the hit into a precise window read. A new `seeklink get PATH[:LINE] [-l N]` command performs that window read directly from the filesystem — no DB round-trip, no daemon involvement, universal-newline translation, path-escape rejection.
+- **Cold-start `search` reranker parity.** `seeklink search --vault PATH` (the cold-start path) now constructs a reranker and passes it to the search pipeline, matching the daemon. Before this change, the same query returned different rankings depending on whether a daemon happened to be running.
+- **Agent-first documentation.** New "For agents" section in the README (minimum workflow, output contract, exit codes, query-shape hints, daemon JSON fallback). `llms.txt` rewritten as an explicit contract.
+- **Blind-test framework** at `tests/blind/`: 32-file bilingual (CJK + English) fixture corpus (`tests/corpus/`), 22 ground-truth queries (`tests/blind/queries.yaml`), runner that cold-starts once per invocation and measures `recall_at_10` / `mrr` / `latency_ms` / `p95`. Three configurations: `A` (current baseline), `B` (planned query expansion — not yet shipped), `C` (hand-crafted expansion, RRF-fused; upper bound). Used to gate this release.
 
 ### Fixed
-- **Cold-start vs daemon parity.** Cold-start `seeklink search` (the path triggered when `--vault` is passed or the daemon is unreachable) now constructs a `Reranker()` and passes it to `search()`, matching the daemon's behavior. Previously the same query returned different rankings depending on whether a daemon happened to be running — a silent correctness bug. `Reranker()` construction is safe on platforms without MLX (Linux, Intel macOS) because the instance self-disables at model-load time.
-- **Line-range accounting for newline-terminated files.** `seeklink get file:LINE` on a file that ends with `\n` no longer miscounts the trailing newline as an extra logical line. Line 6 of a 5-line (newline-terminated) file now correctly emits the `beyond-EOF` warning instead of returning a blank line.
-- **Title-only match with deleted file.** When a search result references a source whose file has been removed from disk (title-only match via alias to a stale source), `compute_lines_for_results` no longer returns `line_start=1` — it degrades to `0/0` so agents aren't handed a `path:1` that won't resolve. Consistent with other missing-file paths.
+- **`seeklink get` trailing-newline accounting.** `get FILE:LINE` on a newline-terminated file no longer counts the trailing `\n` as an extra logical line. `get FILE:6` on a 5-line file correctly emits the beyond-EOF warning instead of returning a blank line.
+- **Title-only match with missing file.** If a search surfaces a title-only match whose file has been deleted from disk, `SearchResult.line_start` / `line_end` now remain at `0` rather than returning a misleading `path:1`.
+
+### Changed
+- **`SearchResult` gains `line_start` and `line_end` (default `0`).** Backward compatible for existing callers; populated when `search()` is called with `vault_root`.
+- **`FRONTMATTER_RE`** is now a public export from `seeklink.ingest` (was `_FRONTMATTER_RE`), so the search layer can reuse it for offset mapping. The underscore-prefixed name still aliases it for backward compatibility.
 
```
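The `seeklink get` behavior described in these entries (a `PATH[:LINE]` target, a 100-line default window, universal-newline reads, path-escape rejection, and a beyond-EOF warning that does not treat a trailing `\n` as an extra line) can be illustrated with a minimal sketch. It is not seeklink's implementation: `parse_target`, `read_window`, the warning strings, the clamping of `LINE < 1` to line 1, and the handling of `-l` without `:LINE` are all assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path

DEFAULT_WINDOW = 100  # lines shown when LINE is given without -l


def parse_target(target: str) -> tuple[str, int | None]:
    """Split 'PATH[:LINE]' into (path, line); line is None when absent."""
    path, sep, tail = target.rpartition(":")
    if sep and tail.isdigit():
        return path, int(tail)
    return target, None


def read_window(
    vault_root: Path, target: str, count: int | None = None
) -> tuple[list[str], list[str]]:
    """Return (lines, notes) for a hypothetical `get PATH[:LINE] [-l N]`."""
    rel, line = parse_target(target)
    root = vault_root.resolve()
    path = (root / rel).resolve()
    if not path.is_relative_to(root):  # rejects '../' and absolute-path escapes
        raise ValueError(f"path escapes vault: {rel}")

    # Text mode with newline=None translates \r\n and \r to \n on read.
    with open(path, encoding="utf-8", newline=None) as fh:
        text = fh.read()

    lines = text.split("\n") if text else []
    if text.endswith("\n"):
        lines.pop()  # a final \n ends the last line; it does not start a new one

    notes: list[str] = []
    if line is None:
        # Whole file, or the first N lines if -l was given (assumed behavior).
        return (lines if count is None else lines[:count]), notes
    if line < 1:
        notes.append(f"LINE {line} < 1; starting at line 1")
        line = 1
    if line > len(lines):
        notes.append(f"beyond-EOF: {rel} has {len(lines)} lines")
        return [], notes
    n = DEFAULT_WINDOW if count is None else count
    return lines[line - 1 : line - 1 + n], notes
```

For a five-line file ending in `\n`, `read_window(root, "f.md:6")` returns no lines plus a beyond-EOF note, which is the case the trailing-newline fix targets. Splitting on `\n` rather than calling `str.splitlines()` keeps form feeds and other Unicode line separators inside a line, so the count matches what an editor shows for `\n`-terminated text.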

```diff
 ### Dev
-- PyYAML added as a dev dependency (required by `tests/blind/run.py`).
-- Test suite: 185 → 203 tests (18 new). 3 for position-aware blending, 13 for `get` command + `body_offset_to_file_line` helper, 3 for end-to-end `SearchResult.line_start/line_end` population, 1 for trailing-newline EOF accounting. All green.
+- PyYAML added as a dev dependency (required by the blind-test runner).
+- Test suite: 185 → 204 tests (19 new).
+
+### Deferred
+- `SEEKLINK_DEBUG=1` blended-score logging.
+- Per-result `mtime > indexed_at` drift warnings on the daemon path (cold-start already warns globally via `check_freshness`).
+- Linux reranker via llama.cpp / GGUF.
 
-### Deferred to v0.3.1+
-- `SEEKLINK_DEBUG=1` blended-score logging (proposed in v0.3 plan, skipped to avoid scope creep).
-- Per-result `mtime > indexed_at` drift warnings on the daemon path (cold-start already warns globally via `check_freshness`). Daemon-side follow-up tracked in `TODOS.md`.
-- Linux reranker via llama.cpp / GGUF (`QuantFactory/Qwen3-Reranker-0.6B-GGUF` exists; wiring it into seeklink lives on after v0.3).
+### Superseded
+- This release supersedes the same-day `0.3.0` tag, which had the same code but shipped with inaccurate README content (quick-start ordering, latency numbers, `seeklink status` description). If you are pinning a version, use `0.3.1`.
 
 ## [0.2.2] - 2026-04-19
 
```
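The deferred per-result drift warning amounts to comparing each hit's on-disk modification time with the time it was indexed. A minimal sketch, assuming `indexed_at` is stored per result as a POSIX timestamp; `Hit` and `drift_warnings` are hypothetical names, and seeklink's existing `check_freshness` logic is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class Hit:
    """Hypothetical stand-in for one search result."""
    path: str          # vault-relative path
    indexed_at: float  # POSIX timestamp recorded at index time (assumed)


def drift_warnings(vault_root: Path, hits: list[Hit]) -> list[str]:
    """Flag hits whose file changed, or vanished, after it was indexed."""
    out: list[str] = []
    for hit in hits:
        try:
            mtime = (vault_root / hit.path).stat().st_mtime
        except FileNotFoundError:
            out.append(f"{hit.path}: missing on disk since indexing")
            continue
        if mtime > hit.indexed_at:
            out.append(f"{hit.path}: modified after indexing; results may be stale")
    return out
```

Checking per hit, rather than once per vault, lets a client attach the warning to the specific `path:LINE` it is about to read.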

README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -217,7 +217,7 @@ When the reranker is enabled, a cross-encoder (`Qwen3-Reranker-0.6B` on MLX, ~1-
 - **If the title channel's best match is in the candidate pool**, blend `alpha · normalized_rrf + (1 - alpha) · rerank_score` with `alpha = 0.60/0.50/0.40` by rank bucket. This protects exact title / alias hits from being demoted by a content-focused reranker.
 - **Otherwise** (no strong title signal), the reranker score is used directly — same as pre-v0.3 behavior. This lets the reranker correct poor first-stage ordering.
 
-On the built-in 22-query blind test, this improved mean MRR from 0.932 to 0.977 vs pure-reranker-override, with zero regressions. See `tests/blind/` for the methodology.
+On the bundled 22-query pilot (see `tests/blind/`), mean MRR moved from 0.932 to 0.977 vs pure-reranker-override with no per-query regressions. Sample size is a pilot, not a statistically powered benchmark — contributions of larger labeled corpora are welcome.
 
 Disable reranking entirely with: `export SEEKLINK_RERANKER_MODEL=""`
 
```
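The gating rule in this hunk can be written out as a small function. This is a sketch, not seeklink's code: the README gives the formula and the alpha values (0.60/0.50/0.40 by rank bucket) but not the bucket boundaries or how RRF scores are normalized, so the buckets (ranks 1-3, 4-10, 11+) and the min-max normalization over the candidate pool are assumptions, as are the names `alpha_for_rank` and `blend`. Rerank scores are assumed to lie in [0, 1].

```python
from __future__ import annotations


def alpha_for_rank(rank: int) -> float:
    """Weight on the first-stage score; bucket boundaries are assumed."""
    if rank <= 3:
        return 0.60
    if rank <= 10:
        return 0.50
    return 0.40


def blend(
    first_stage: list[tuple[str, float]],  # (doc, rrf_score), best first
    rerank: dict[str, float],              # doc -> reranker score in [0, 1]
    title_best: str | None,                # title channel's top doc, if any
) -> list[str]:
    docs = [doc for doc, _ in first_stage]
    if title_best is None or title_best not in docs:
        # No title signal in the pool: the reranker alone decides the order.
        return sorted(docs, key=lambda d: rerank[d], reverse=True)

    scores = [s for _, s in first_stage]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores tie
    final: dict[str, float] = {}
    for rank, (doc, rrf) in enumerate(first_stage, start=1):
        alpha = alpha_for_rank(rank)
        normalized_rrf = (rrf - lo) / span
        final[doc] = alpha * normalized_rrf + (1 - alpha) * rerank[doc]
    return sorted(docs, key=lambda d: final[d], reverse=True)
```

With `first_stage = [("title", 0.05), ("b", 0.04), ("c", 0.01)]` and `rerank = {"title": 0.5, "b": 0.7, "c": 0.9}`, passing `title_best="title"` keeps the title hit first (`["title", "b", "c"]`), while `title_best=None` hands the order to the reranker (`["c", "b", "title"]`).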

```diff
@@ -265,7 +265,7 @@ Notes are chunked (~400 tokens), embedded with jina-embeddings-v2-base-zh, and i
 
 ## What changed in v0.3
 
-- **Title-gated rerank blending**: when an exact title / alias hit drives rank 1, protect it from reranker demotion; otherwise fall back to pure reranker. Measured MRR gain of +4.5 pp over v0.2 on a 22-query blind test, with no regressions. See "How search works" above.
+- **Title-gated rerank blending**: when an exact title / alias hit drives rank 1, protect it from reranker demotion; otherwise fall back to pure reranker. Measured mean MRR 0.932 → 0.977 on the bundled 22-query pilot (see "How search works" for caveats on sample size).
 - **Line-range retrieval**: `search` results now include `line_start` / `line_end`, and a new `seeklink get PATH[:LINE] -l N` command prints line-precise windows. Agents can find-then-read without slurping whole files.
 - **Cold-start / daemon parity fix**: cold-start `seeklink search` now constructs a `Reranker()` and passes it to the search pipeline. Previously the same query returned different rankings depending on whether the daemon was running.
 - **Frontmatter-aware line mapping**: chunk offsets (stored against frontmatter-stripped body) are remapped to full-file line numbers, so `search` + `get` report lines the way you'd see them in a text editor.
```

TODOS.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -29,7 +29,7 @@ The vault's `sources/` folder could store raw external content (textbooks, paper
 Currently freshness warnings only appear in the cold-start CLI path (`seeklink search/status`). The daemon doesn't propagate warnings back to clients. Add a `warnings` field to daemon JSON responses so `cli_client` can print them.
 
 ### Daemon auto-respawn on config mismatch
-`cli_client.call()` refuses to reuse a daemon bound to a different vault or started with a different embedder/reranker (P1 correctness fix), but falls back to cold-start on every subsequent CLI call after a switch until the user manually kills the stale daemon. Codex rated this P2: add a `shutdown` command to the daemon protocol, have the client shutdown + respawn on mismatch so the auto-spawn workflow keeps working across vault/model switches.
+`cli_client.call()` refuses to reuse a daemon bound to a different vault or started with a different embedder/reranker (P1 correctness fix), but falls back to cold-start on every subsequent CLI call after a switch until the user manually kills the stale daemon. P2 follow-up: add a `shutdown` command to the daemon protocol, have the client shutdown + respawn on mismatch so the auto-spawn workflow keeps working across vault/model switches.
 
 ## Infrastructure
 
```
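Both TODO items can be sketched on the client side. Everything here is hypothetical: `DaemonConfig`, `plan_daemon_action`, `warnings_from`, the action strings, and the JSON shape illustrate the proposals and are not seeklink's current protocol.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DaemonConfig:
    """Settings a daemon is bound to; all three must match to reuse it."""
    vault: str
    embedder: str
    reranker: str


def plan_daemon_action(running: DaemonConfig | None, wanted: DaemonConfig) -> str:
    if running is None:
        return "spawn"
    if running == wanted:
        return "reuse"
    # Today the client falls back to cold-start here; the TODO proposes
    # sending a `shutdown` command and spawning a correctly bound daemon.
    return "shutdown_then_respawn"


def warnings_from(response_text: str) -> list[str]:
    """Read an optional `warnings` list; older daemons simply omit it."""
    payload = json.loads(response_text)
    return [str(w) for w in payload.get("warnings", [])]
```

Treating `warnings` as optional keeps a newer client compatible with a daemon that predates the field.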

docs/blind-test.md

Lines changed: 14 additions & 14 deletions
```diff
@@ -58,7 +58,7 @@ Then `uv sync --dev`.
 expected_paths:
 - "notes/fsrs-algorithm.md"
 - "notes/spaced-repetition.md"
-- "logs/rhizome-dev/2026-W15.md"
+- "logs/2026-W15.md"
 tags: [cjk, common]
 expansion:
 - "间隔重复 遗忘曲线 FSRS"
```
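Given a query entry shaped like the YAML in this hunk, per-query scores can be computed with the standard definitions: recall@10 is the fraction of `expected_paths` found in the top 10, and reciprocal rank is 1/rank of the first expected hit. This is a sketch; whether `tests/blind/run.py` applies a top-10 cutoff to MRR is not stated here, so the uncut `reciprocal_rank` below is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def recall_at_k(results: list[str], expected: list[str], k: int = 10) -> float:
    """Fraction of must-hit paths that appear in the top k results."""
    top = set(results[:k])
    return sum(path in top for path in expected) / len(expected)


def reciprocal_rank(results: list[str], expected: list[str]) -> float:
    """1/rank of the first must-hit path, or 0.0 if none is returned."""
    wanted = set(expected)
    for rank, path in enumerate(results, start=1):
        if path in wanted:
            return 1.0 / rank
    return 0.0


# Mirrors the queries.yaml entry shown above (post-change path).
entry = {
    "expected_paths": [
        "notes/fsrs-algorithm.md",
        "notes/spaced-repetition.md",
        "logs/2026-W15.md",
    ],
    "tags": ["cjk", "common"],
}
```

For an illustrative result list `["notes/spaced-repetition.md", "notes/zettelkasten.md", "notes/fsrs-algorithm.md"]`, recall@10 is 2/3 (the weekly log is missing) and reciprocal rank is 1.0.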
```diff
@@ -80,7 +80,7 @@ Then `uv sync --dev`.
 **20-30 queries total.** Fewer than 15 and single-query noise dominates the
 averages.
 
-1. Real-user queries only. Pull from shell history, rhizome logs, or
+1. Real-user queries only. Pull from shell history, your own notes, or
 memory. No synthetic queries.
 2. For each, list 2-5 `expected_paths` you'd be annoyed if not in top 10.
 Hard must-hit semantics — not "would be nice".
@@ -94,8 +94,8 @@ averages.
 `short`, `ambiguous`, `technical`, `common`.
 
 **Ground-truth stability**: commit `queries.yaml` alongside a vault-state
-marker (e.g. the current `rhizome log` head SHA). If you re-run against an
-edited vault, note the drift.
+marker (e.g. the vault's git commit SHA if it's versioned). If you re-run
+against an edited vault, note the drift.
 
 ## Metrics
 
@@ -127,7 +127,7 @@ config, writes results JSON. Invocation:
 python tests/blind/run.py \
 --config A \
 --queries tests/blind/queries.yaml \
---vault ~/Rhizome \
+--vault /path/to/vault \
 --out tests/blind/results/A.json
 
 # Ship candidate — requires v0.4 expansion hook (runner raises until then)
```
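Per-query numbers are then averaged, and latency is summarized by a 95th percentile. A minimal sketch, assuming the nearest-rank percentile definition and row keys (`recall_at_10`, `rr`, `latency_ms`) chosen for illustration; the real results JSON under `tests/blind/results/` may use a different schema and percentile method.

```python
from __future__ import annotations

from statistics import mean


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer math for the rank."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-indexed
    return ordered[rank - 1]


def summarize(rows: list[dict[str, float]]) -> dict[str, float]:
    """Aggregate per-query rows into the headline numbers for one config."""
    return {
        "recall_at_10": mean(r["recall_at_10"] for r in rows),
        "mrr": mean(r["rr"] for r in rows),
        "latency_ms": mean(r["latency_ms"] for r in rows),
        "p95": p95([r["latency_ms"] for r in rows]),
    }
```

Computing the rank with integers avoids float rounding in `0.95 * n`; with 22 queries, the nearest-rank p95 is the 21st-fastest latency.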
```diff
@@ -194,10 +194,11 @@ Abandon v0.4 and look at the embedder (v0.5+) or retrieval channels.
 
 - **Ground truth scope.** Hard must-hit only, or "should appear" (weaker)?
 → Propose: hard must-hit only. Weaker signal = more subjective.
-- **Expansion prompt template.** qmd uses `/no_think Expand this search
-query: {query}` with GBNF output grammar, backed by a **fine-tuned**
-Qwen3-1.7B. Base Qwen3-0.6B has no such training; needs a richer
-few-shot prompt. Draft the prompt once; commit alongside queries.yaml.
+- **Expansion prompt template.** Base Qwen3-0.6B is not fine-tuned for
+query rewriting, so config B will need a few-shot prompt (variants
+produced, one per line, bounded length). Draft once and commit
+alongside `queries.yaml`. Optionally constrain output format via a
+GBNF grammar if using llama.cpp.
 - **Inference backend for B.** MLX (macOS) or llama.cpp (cross-platform)?
 → Run both, pick the one that hits the p95 budget. Record which.
 - **Randomness.** Qwen3 at temperature 0.7 is non-deterministic. Propose:
```
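Config C, the "hand-crafted expansion, RRF-fused" upper bound described in the changelog, combines one ranked result list per expansion variant (for example the `expansion:` entry `间隔重复 遗忘曲线 FSRS`) using reciprocal rank fusion. A minimal sketch of standard RRF follows; `k = 60` is the constant conventionally used with RRF and is assumed here, as is the alphabetical tie-break, and `rrf_fuse` is a hypothetical name.

```python
from __future__ import annotations


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by path for deterministic output.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

RRF uses only ranks, so lists from different query variants can be merged without calibrating their raw scores against each other; a document that several variants rank highly rises to the top.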
```diff
@@ -207,8 +208,7 @@ Abandon v0.4 and look at the embedder (v0.5+) or retrieval channels.
 
 ## Out of scope for this framework
 
-- Automated labeling (no — Simon labels ground truth by hand)
-- CI-integrated regression (no — this is a pre-release gate, not a
-continuous monitor)
-- Comparison against external tools (qmd, ripgrep, etc.) — different
-vaults, apples to oranges
+- Automated labeling (ground truth is labeled by hand).
+- CI-integrated regression (this is a pre-release gate, not a
+continuous monitor).
+- Cross-tool comparison (different corpora are incommensurable).
```
