Commit bae6b51

perf(search): default reranker budget to auto
1 parent ce6744b commit bae6b51

11 files changed

Lines changed: 96 additions & 18 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Changed
 - Full-vault indexing now embeds chunks in length-sorted batches instead of one file at a time, improving first-run indexing throughput on real Markdown vaults while preserving single-file indexing behavior and the existing SQLite schema.
 - The MLX reranker now caps each passage to the first 200 tokens before scoring, reducing warm-query latency on long chunks while preserving the full result preview and `seeklink get` output.
+- `seeklink search` now defaults to `--rerank-k auto`, using a smaller reranker budget for ordinary lookups while preserving deeper reranking for filtered and technical CJK queries.
 
 ### Fixed
 - `seeklink search --rerank-k N` now limits the number of candidates passed to the cross-encoder even when `N` is lower than `--top-k`; the remaining results keep first-stage RRF order.
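
The "Fixed" entry above describes reranking only the head of the candidate list and leaving the tail in first-stage order. A minimal sketch of that idea follows; `rerank_head` and its `score` callback are hypothetical names for illustration, not SeekLink's actual functions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def rerank_head(
    candidates: Sequence[T],
    rerank_k: int,
    score: Callable[[T], float],
) -> List[T]:
    """Re-sort only the first `rerank_k` candidates by `score`.

    `candidates` is assumed to already be in first-stage (RRF) order.
    Everything past the budget keeps that order untouched.
    """
    # Clamp the budget so a value above len(candidates) or below 0 is harmless.
    k = max(0, min(rerank_k, len(candidates)))
    head = sorted(candidates[:k], key=score, reverse=True)
    return head + list(candidates[k:])


if __name__ == "__main__":
    fake_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
    print(rerank_head(["a", "b", "c", "d"], 2, fake_scores.__getitem__))
```

With a budget of 2, only `a` and `b` are compared by the scorer; `c` and `d` stay where the first stage put them, even though `d` would have scored highest.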

README.md

Lines changed: 3 additions & 3 deletions
@@ -146,8 +146,8 @@ seeklink search "query" --vault PATH [options]
 
 Options:
   --top-k N            Number of results (default: 10)
-  --rerank-k N|auto    Candidates to rerank with the cross-encoder (default: 20).
-                       Use auto for query-sensitive 5/20 candidate routing.
+  --rerank-k N|auto    Candidates to rerank with the cross-encoder (default: auto).
+                       Auto uses query-sensitive 5/20 candidate routing.
   --no-rerank          Skip cross-encoder reranking for this query
   --tags TAG [TAG]     Filter by tags (AND semantics)
   --folder PREFIX      Filter by folder (e.g. "notes/")
@@ -217,7 +217,7 @@ Many personal knowledge bases contain a mix of **titled articles** (permanent no
 
 ### Title-gated rerank blending (v0.3+)
 
-When the reranker is enabled, a cross-encoder (`Qwen3-Reranker-0.6B` on MLX, ~1-2s per query) re-scores the top-20 RRF candidates for precision. Use `--rerank-k N` to trade precision for latency on a single query, `--rerank-k auto` to let SeekLink pick a 5- or 20-candidate budget from the query shape, or `--no-rerank` to return raw RRF results without cross-encoder scoring. SeekLink applies **title-gated position blending** on top of reranked results:
+When the reranker is enabled, a cross-encoder (`Qwen3-Reranker-0.6B` on MLX, ~1-2s per query) re-scores a query-sensitive candidate budget for precision: 5 candidates for ordinary title / alias / natural-language lookups and 20 candidates for filtered or technical CJK queries. Use `--rerank-k N` to force a fixed budget for one query, or `--no-rerank` to return raw RRF results without cross-encoder scoring. SeekLink applies **title-gated position blending** on top of reranked results:
 
 - **If the title channel's best match is in the candidate pool**, blend `alpha · normalized_rrf + (1 - alpha) · rerank_score` with `alpha = 0.60/0.50/0.40` by rank bucket. This protects exact title / alias hits from being demoted by a content-focused reranker.
 - **Otherwise** (no strong title signal), the reranker score is used directly — same as pre-v0.3 behavior. This lets the reranker correct poor first-stage ordering.
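
The blending rule in the two README bullets above can be sketched as a small function. The three `alpha` values come from the README text; the rank-bucket boundaries (top 3, top 10, rest) and all function names are assumptions made for illustration, since the diff does not show them.

```python
def blend_alpha(rank: int) -> float:
    """Return the RRF weight for a 1-based first-stage rank.

    ASSUMPTION: the bucket boundaries (<=3, <=10, otherwise) are invented
    for this sketch; the README only states the values 0.60/0.50/0.40.
    """
    if rank <= 3:
        return 0.60
    if rank <= 10:
        return 0.50
    return 0.40


def blended_score(
    rank: int,
    normalized_rrf: float,
    rerank_score: float,
    title_hit_in_pool: bool,
) -> float:
    """Title-gated blend: mix RRF and reranker scores only when gated on."""
    if not title_hit_in_pool:
        # No strong title signal: use the reranker score directly.
        return rerank_score
    alpha = blend_alpha(rank)
    return alpha * normalized_rrf + (1 - alpha) * rerank_score


if __name__ == "__main__":
    # A rank-1 title hit that the reranker dislikes still keeps a high score.
    print(blended_score(1, normalized_rrf=1.0, rerank_score=0.1, title_hit_in_pool=True))
    # Without the title gate the reranker score passes through unchanged.
    print(blended_score(1, normalized_rrf=1.0, rerank_score=0.1, title_hit_in_pool=False))
```

A higher `alpha` near the top of the list gives first-stage order more weight exactly where exact title or alias hits are expected to sit.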

docs/blind-test.md

Lines changed: 3 additions & 3 deletions
@@ -153,10 +153,10 @@ python tests/blind/run.py --config C ...
 python tests/blind/run.py --config A --no-rerank ...
 
 # Diagnostic: latency / quality sweep for reranker budget
+python tests/blind/run.py --config A --rerank-k auto --out .scratch/rerank-sweep/A_auto.json ...
 python tests/blind/run.py --config A --rerank-k 5 --out tests/blind/results/A_rerank5.json ...
 python tests/blind/run.py --config A --rerank-k 10 --out tests/blind/results/A_rerank10.json ...
 python tests/blind/run.py --config A --rerank-k 20 --out tests/blind/results/A_rerank20.json ...
-python tests/blind/run.py --config A --rerank-k auto --out .scratch/rerank-sweep/A_auto.json ...
 
 # Diagnostic: local metadata candidate-injection experiment
 python tests/blind/run.py --config A --rerank-k auto --metadata-expansion --out .scratch/rerank-sweep/A_metadata.json ...
@@ -168,8 +168,8 @@ Runner:
   query loop). Warms the embedder, FTS tokenizer, and when enabled the
   reranker with dummy calls so the first measured latency isn't
   model/cache/tokenizer startup.
-- Passes `--rerank-k` through to `search()`. Default `20` matches product
-  behavior; lower values and `auto` are diagnostic latency / quality probes.
+- Passes `--rerank-k` through to `search()`. Default `auto` matches product
+  behavior; fixed values are diagnostic latency / quality probes.
 - Records the per-query resolved reranker budget so `auto` sweeps can be
   audited without guessing which queries used 5 vs. 20 candidates.
 - Records first-stage channel diagnostics for config A so retrieval misses can

llms.txt

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ No other codes.
 - Short queries that match a note title or alias get title-gated position protection so the exact hit anchors at rank 1.
 - Filters: `--tags T1 T2` (AND), `--folder PREFIX`. Multi-word filter values not supported.
 - `--title-weight F` override (default 1.5; raise to 3.0 for "find the definitive article", lower to 0.5 for "surface raw log moments").
-- Reranking controls: `--rerank-k N` changes how many first-stage candidates the cross-encoder scores (default 20); `--rerank-k auto` chooses a 5- or 20-candidate budget from the query shape; `--no-rerank` skips cross-encoder scoring for one query.
+- Reranking controls: `--rerank-k auto` is the default and chooses a 5- or 20-candidate budget from the query shape; `--rerank-k N` forces a fixed cross-encoder budget; `--no-rerank` skips cross-encoder scoring for one query.
 
 ### Common failure modes
 

seeklink/__main__.py

Lines changed: 2 additions & 2 deletions
@@ -115,10 +115,10 @@ def main() -> None:
     search_p.add_argument(
         "--rerank-k",
         type=_parse_rerank_k,
-        default=20,
+        default="auto",
         help=(
             "Number of first-stage candidates to rerank with the cross-encoder "
-            "or 'auto' for query-sensitive routing (default: 20)"
+            "or 'auto' for query-sensitive routing (default: auto)"
         ),
     )
     search_p.add_argument(
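
The hunk above relies on `_parse_rerank_k` as the argparse `type=` callable, but its body is not part of this diff. One plausible shape for such a parser is sketched below; the function name `parse_rerank_k_sketch`, the error wording, and the choice to reject values below 1 are all assumptions.

```python
import argparse
from typing import Literal, Union

RerankKSketch = Union[int, Literal["auto"]]


def parse_rerank_k_sketch(value: str) -> RerankKSketch:
    """Accept the literal 'auto' or a positive integer (assumed lower bound)."""
    if value == "auto":
        return "auto"
    try:
        n = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected a positive integer or 'auto', got {value!r}"
        )
    if n < 1:
        raise argparse.ArgumentTypeError("rerank budget must be >= 1")
    return n


def build_sketch_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="seeklink-sketch")
    parser.add_argument("--rerank-k", type=parse_rerank_k_sketch, default="auto")
    return parser


if __name__ == "__main__":
    parser = build_sketch_parser()
    print(parser.parse_args([]).rerank_k)
    print(parser.parse_args(["--rerank-k", "5"]).rerank_k)
```

Because argparse runs a string default through the `type` callable, a default of `"auto"` takes the same code path as an explicit `--rerank-k auto`.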

seeklink/daemon.py

Lines changed: 1 addition & 1 deletion
@@ -208,7 +208,7 @@ def _handle_connection(
     folder=args.get("folder"),
     title_weight=args.get("title_weight", 1.5),
     reranker=None if args.get("no_rerank") else reranker,
-    rerank_k=args.get("rerank_k", 20),
+    rerank_k=args.get("rerank_k", "auto"),
     vault_root=vault_root,
 )
 response = {

seeklink/search.py

Lines changed: 5 additions & 6 deletions
@@ -99,11 +99,10 @@ def _resolve_rerank_k_with_reason(
 ) -> tuple[int, str]:
     """Resolve a numeric rerank budget for one query.
 
-    The default CLI path still passes an integer. The explicit "auto" mode is
-    a conservative policy from the 22-query pilot: English, title/alias, and
-    ordinary CJK lookups got most of the reranker benefit by reranking only the
-    top 5, while CJK / mixed technical queries needed deeper candidates to
-    recover recall.
+    The default CLI path uses "auto", a conservative policy from the 22-query
+    pilot: English, title/alias, and ordinary CJK lookups got most of the
+    reranker benefit by reranking only the top 5, while CJK / mixed technical
+    queries needed deeper candidates to recover recall.
     """
     if isinstance(rerank_k, int):
         return rerank_k, "fixed"
@@ -166,7 +165,7 @@ def search(
     tags: list[str] | None = None,
     folder: str | None = None,
     reranker: "Reranker | None" = None,
-    rerank_k: RerankK = 20,
+    rerank_k: RerankK = "auto",
     metadata_expansion: bool = False,
     metadata_weight: float = 1.0,
     metadata_max_sources: int = 8,
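
The docstring above describes the `auto` policy only in words, and the routing code itself is outside this diff. A rough sketch of how a 5/20 router could look is below. Only the `(budget, reason)` return shape and the `"fixed"` reason come from the diff; the CJK and "technical" heuristics, the other reason strings, and the name `resolve_rerank_k_sketch` are assumptions.

```python
import re
from typing import Literal, Tuple, Union

RerankK = Union[int, Literal["auto"]]

# ASSUMPTION: a query counts as "CJK" if it has any Han, kana, or Hangul
# character, and as "technical" if it also has an ASCII token of 2+ chars.
_CJK = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")
_ASCII_TERM = re.compile(r"[A-Za-z0-9_]{2,}")


def resolve_rerank_k_sketch(
    query: str,
    rerank_k: RerankK = "auto",
    *,
    filtered: bool = False,
) -> Tuple[int, str]:
    """Resolve a numeric budget plus a short reason label for auditing."""
    if isinstance(rerank_k, int):
        return rerank_k, "fixed"
    if filtered:
        # Tag or folder filters were supplied: keep the deeper budget.
        return 20, "auto:filtered"
    if _CJK.search(query) and _ASCII_TERM.search(query):
        # Mixed CJK + ASCII terms: treated here as a technical CJK query.
        return 20, "auto:cjk-technical"
    return 5, "auto:ordinary"


if __name__ == "__main__":
    for q in ["memory", "向量索引 SQLite", "读书笔记"]:
        print(q, resolve_rerank_k_sketch(q))
```

Returning the reason alongside the number is what lets a sweep record which queries used 5 versus 20 candidates, as the blind-test notes describe.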

tests/blind/run.py

Lines changed: 2 additions & 2 deletions
@@ -569,10 +569,10 @@ def build_parser() -> argparse.ArgumentParser:
     parser.add_argument(
         "--rerank-k",
         type=_parse_rerank_k,
-        default=20,
+        default="auto",
         help=(
             "Number of first-stage candidates passed to the reranker "
-            "or 'auto' for query-sensitive routing (default: 20). "
+            "or 'auto' for query-sensitive routing (default: auto). "
             "Use with config A/C latency sweeps."
         ),
     )

tests/test_blind_runner_aggregates.py

Lines changed: 18 additions & 0 deletions
@@ -385,6 +385,24 @@ def test_parser_supports_auto_rerank_k(self):
 
         assert args.rerank_k == "auto"
 
+    def test_parser_defaults_to_auto_rerank_k(self):
+        parser = blind_run.build_parser()
+
+        args = parser.parse_args(
+            [
+                "--config",
+                "A",
+                "--queries",
+                "queries.yaml",
+                "--vault",
+                "vault",
+                "--out",
+                "out.json",
+            ]
+        )
+
+        assert args.rerank_k == "auto"
+
     def test_legacy_no_reranker_alias_still_works(self):
         parser = blind_run.build_parser()
 

tests/test_cli_json.py

Lines changed: 13 additions & 0 deletions
@@ -84,6 +84,19 @@ def fake_try_daemon(cmd: str, daemon_args: dict) -> dict:
     ]
 
 
+def test_search_parser_defaults_to_auto_rerank_k(monkeypatch):
+    captured: dict = {}
+
+    def fake_cmd_search(args):
+        captured["rerank_k"] = args.rerank_k
+
+    monkeypatch.setattr(sys, "argv", ["seeklink", "search", "memory"])
+    monkeypatch.setattr(cli, "_cmd_search", fake_cmd_search)
+    cli.main()
+
+    assert captured == {"rerank_k": "auto"}
+
+
 def test_search_result_to_json_truncates_preview():
     result = SearchResult(
         source_id=1,
