perf(search): default reranker budget to auto

simonsysun · simonsysun · commit bae6b51ca046 · 2026-04-28T14:12:25.000-07:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -20,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Changed
 - Full-vault indexing now embeds chunks in length-sorted batches instead of one file at a time, improving first-run indexing throughput on real Markdown vaults while preserving single-file indexing behavior and the existing SQLite schema.
 - The MLX reranker now caps each passage to the first 200 tokens before scoring, reducing warm-query latency on long chunks while preserving the full result preview and `seeklink get` output.
+- `seeklink search` now defaults to `--rerank-k auto`, using a smaller reranker budget for ordinary lookups while preserving deeper reranking for filtered and technical CJK queries.
 
 ### Fixed
 - `seeklink search --rerank-k N` now limits the number of candidates passed to the cross-encoder even when `N` is lower than `--top-k`; the remaining results keep first-stage RRF order.
diff --git a/README.md b/README.md
@@ -146,8 +146,8 @@ seeklink search "query" --vault PATH [options]
 
 Options:
   --top-k N          Number of results (default: 10)
-  --rerank-k N|auto  Candidates to rerank with the cross-encoder (default: 20).
-                     Use auto for query-sensitive 5/20 candidate routing.
+  --rerank-k N|auto  Candidates to rerank with the cross-encoder (default: auto).
+                     Auto uses query-sensitive 5/20 candidate routing.
   --no-rerank        Skip cross-encoder reranking for this query
   --tags TAG [TAG]   Filter by tags (AND semantics)
   --folder PREFIX    Filter by folder (e.g. "notes/")
@@ -217,7 +217,7 @@ Many personal knowledge bases contain a mix of **titled articles** (permanent no
 
 ### Title-gated rerank blending (v0.3+)
 
-When the reranker is enabled, a cross-encoder (`Qwen3-Reranker-0.6B` on MLX, ~1-2s per query) re-scores the top-20 RRF candidates for precision. Use `--rerank-k N` to trade precision for latency on a single query, `--rerank-k auto` to let SeekLink pick a 5- or 20-candidate budget from the query shape, or `--no-rerank` to return raw RRF results without cross-encoder scoring. SeekLink applies **title-gated position blending** on top of reranked results:
+When the reranker is enabled, a cross-encoder (`Qwen3-Reranker-0.6B` on MLX, ~1-2s per query) re-scores a query-sensitive candidate budget for precision: 5 candidates for ordinary title / alias / natural-language lookups and 20 candidates for filtered or technical CJK queries. Use `--rerank-k N` to force a fixed budget for one query, or `--no-rerank` to return raw RRF results without cross-encoder scoring. SeekLink applies **title-gated position blending** on top of reranked results:
 
 - **If the title channel's best match is in the candidate pool**, blend `alpha · normalized_rrf + (1 - alpha) · rerank_score` with `alpha = 0.60/0.50/0.40` by rank bucket. This protects exact title / alias hits from being demoted by a content-focused reranker.
 - **Otherwise** (no strong title signal), the reranker score is used directly — same as pre-v0.3 behavior. This lets the reranker correct poor first-stage ordering.
diff --git a/docs/blind-test.md b/docs/blind-test.md
@@ -153,10 +153,10 @@ python tests/blind/run.py --config C ...
 python tests/blind/run.py --config A --no-rerank ...
 
 # Diagnostic: latency / quality sweep for reranker budget
+python tests/blind/run.py --config A --rerank-k auto --out .scratch/rerank-sweep/A_auto.json ...
 python tests/blind/run.py --config A --rerank-k 5  --out tests/blind/results/A_rerank5.json ...
 python tests/blind/run.py --config A --rerank-k 10 --out tests/blind/results/A_rerank10.json ...
 python tests/blind/run.py --config A --rerank-k 20 --out tests/blind/results/A_rerank20.json ...
-python tests/blind/run.py --config A --rerank-k auto --out .scratch/rerank-sweep/A_auto.json ...
 
 # Diagnostic: local metadata candidate-injection experiment
 python tests/blind/run.py --config A --rerank-k auto --metadata-expansion --out .scratch/rerank-sweep/A_metadata.json ...
@@ -168,8 +168,8 @@ Runner:
   query loop). Warms the embedder, FTS tokenizer, and when enabled the
   reranker with dummy calls so the first measured latency isn't
   model/cache/tokenizer startup.
-- Passes `--rerank-k` through to `search()`. Default `20` matches product
-  behavior; lower values and `auto` are diagnostic latency / quality probes.
+- Passes `--rerank-k` through to `search()`. Default `auto` matches product
+  behavior; fixed values are diagnostic latency / quality probes.
 - Records the per-query resolved reranker budget so `auto` sweeps can be
   audited without guessing which queries used 5 vs. 20 candidates.
 - Records first-stage channel diagnostics for config A so retrieval misses can
diff --git a/llms.txt b/llms.txt
@@ -75,7 +75,7 @@ No other codes.
 - Short queries that match a note title or alias get title-gated position protection so the exact hit anchors at rank 1.
 - Filters: `--tags T1 T2` (AND), `--folder PREFIX`. Multi-word filter values not supported.
 - `--title-weight F` override (default 1.5; raise to 3.0 for "find the definitive article", lower to 0.5 for "surface raw log moments").
-- Reranking controls: `--rerank-k N` changes how many first-stage candidates the cross-encoder scores (default 20); `--rerank-k auto` chooses a 5- or 20-candidate budget from the query shape; `--no-rerank` skips cross-encoder scoring for one query.
+- Reranking controls: `--rerank-k auto` is the default and chooses a 5- or 20-candidate budget from the query shape; `--rerank-k N` forces a fixed cross-encoder budget; `--no-rerank` skips cross-encoder scoring for one query.
 
 ### Common failure modes
 
diff --git a/seeklink/__main__.py b/seeklink/__main__.py
@@ -115,10 +115,10 @@ def main() -> None:
     search_p.add_argument(
         "--rerank-k",
         type=_parse_rerank_k,
-        default=20,
+        default="auto",
         help=(
             "Number of first-stage candidates to rerank with the cross-encoder "
-            "or 'auto' for query-sensitive routing (default: 20)"
+            "or 'auto' for query-sensitive routing (default: auto)"
         ),
     )
     search_p.add_argument(
diff --git a/seeklink/daemon.py b/seeklink/daemon.py
@@ -208,7 +208,7 @@ def _handle_connection(
                 folder=args.get("folder"),
                 title_weight=args.get("title_weight", 1.5),
                 reranker=None if args.get("no_rerank") else reranker,
-                rerank_k=args.get("rerank_k", 20),
+                rerank_k=args.get("rerank_k", "auto"),
                 vault_root=vault_root,
             )
             response = {
diff --git a/seeklink/search.py b/seeklink/search.py
@@ -99,11 +99,10 @@ def _resolve_rerank_k_with_reason(
 ) -> tuple[int, str]:
     """Resolve a numeric rerank budget for one query.
 
-    The default CLI path still passes an integer. The explicit "auto" mode is
-    a conservative policy from the 22-query pilot: English, title/alias, and
-    ordinary CJK lookups got most of the reranker benefit by reranking only the
-    top 5, while CJK / mixed technical queries needed deeper candidates to
-    recover recall.
+    The default CLI path uses "auto", a conservative policy from the 22-query
+    pilot: English, title/alias, and ordinary CJK lookups got most of the
+    reranker benefit by reranking only the top 5, while CJK / mixed technical
+    queries needed deeper candidates to recover recall.
     """
     if isinstance(rerank_k, int):
         return rerank_k, "fixed"
@@ -166,7 +165,7 @@ def search(
     tags: list[str] | None = None,
     folder: str | None = None,
     reranker: "Reranker | None" = None,
-    rerank_k: RerankK = 20,
+    rerank_k: RerankK = "auto",
     metadata_expansion: bool = False,
     metadata_weight: float = 1.0,
     metadata_max_sources: int = 8,
diff --git a/tests/blind/run.py b/tests/blind/run.py
@@ -569,10 +569,10 @@ def build_parser() -> argparse.ArgumentParser:
     parser.add_argument(
         "--rerank-k",
         type=_parse_rerank_k,
-        default=20,
+        default="auto",
         help=(
             "Number of first-stage candidates passed to the reranker "
-            "or 'auto' for query-sensitive routing (default: 20). "
+            "or 'auto' for query-sensitive routing (default: auto). "
             "Use with config A/C latency sweeps."
         ),
     )
diff --git a/tests/test_blind_runner_aggregates.py b/tests/test_blind_runner_aggregates.py
@@ -385,6 +385,24 @@ def test_parser_supports_auto_rerank_k(self):
 
         assert args.rerank_k == "auto"
 
+    def test_parser_defaults_to_auto_rerank_k(self):
+        parser = blind_run.build_parser()
+
+        args = parser.parse_args(
+            [
+                "--config",
+                "A",
+                "--queries",
+                "queries.yaml",
+                "--vault",
+                "vault",
+                "--out",
+                "out.json",
+            ]
+        )
+
+        assert args.rerank_k == "auto"
+
     def test_legacy_no_reranker_alias_still_works(self):
         parser = blind_run.build_parser()
 
diff --git a/tests/test_cli_json.py b/tests/test_cli_json.py
@@ -84,6 +84,19 @@ def fake_try_daemon(cmd: str, daemon_args: dict) -> dict:
     ]
 
 
+def test_search_parser_defaults_to_auto_rerank_k(monkeypatch):
+    captured: dict = {}
+
+    def fake_cmd_search(args):
+        captured["rerank_k"] = args.rerank_k
+
+    monkeypatch.setattr(sys, "argv", ["seeklink", "search", "memory"])
+    monkeypatch.setattr(cli, "_cmd_search", fake_cmd_search)
+    cli.main()
+
+    assert captured == {"rerank_k": "auto"}
+
+
 def test_search_result_to_json_truncates_preview():
     result = SearchResult(
         source_id=1,
diff --git a/tests/test_daemon_protocol.py b/tests/test_daemon_protocol.py
@@ -154,3 +154,50 @@ def fake_search(db, embedder, query, **kwargs):
         "reranker": fake_reranker,
         "rerank_k": "auto",
     }
+
+
+def test_search_defaults_to_auto_rerank_k(monkeypatch):
+    client, server = socket.socketpair()
+    captured: dict = {}
+
+    class FakeEmbedder:
+        MODEL_NAME = "test-embedder"
+
+    class FakeReranker:
+        disabled = False
+        MODEL_NAME = "test-reranker"
+
+    def fake_search(db, embedder, query, **kwargs):
+        captured["query"] = query
+        captured["rerank_k"] = kwargs["rerank_k"]
+        return []
+
+    search_module = importlib.import_module("seeklink.search")
+    monkeypatch.setattr(search_module, "search", fake_search)
+
+    try:
+        _send_request(
+            client,
+            {
+                "cmd": "search",
+                "args": {"query": "memory"},
+            },
+        )
+        _handle_connection(
+            server,
+            db=object(),
+            embedder=FakeEmbedder(),
+            reranker=FakeReranker(),
+            vault_root=Path("/tmp/vault"),
+        )
+        response = _recv_response(client)
+    finally:
+        client.close()
+        server.close()
+
+    assert response["ok"] is True
+    assert response["result"] == []
+    assert captured == {
+        "query": "memory",
+        "rerank_k": "auto",
+    }

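
Note on the routing policy: the docstring and README text describe `auto` as resolving to 5 candidates for ordinary title / alias / natural-language lookups and 20 for filtered or technical CJK queries, and the diff shows that integer budgets return `(rerank_k, "fixed")`. The actual detection rules are not visible in this commit. The sketch below only illustrates the shape of such a resolver: the function name, the regexes, the `tags` / `folder` parameters, and every reason label other than `"fixed"` are hypothetical.

```python
import re
from typing import List, Literal, Optional, Tuple, Union

RerankK = Union[int, Literal["auto"]]

# Hypothetical heuristics: any CJK character, plus an ASCII token of 2+ characters
# (identifier, path, acronym) standing in for "technical" content.
_CJK = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")
_ASCII_TOKEN = re.compile(r"[A-Za-z0-9_./:-]{2,}")


def resolve_rerank_k(
    query: str,
    rerank_k: RerankK = "auto",
    tags: Optional[List[str]] = None,
    folder: Optional[str] = None,
) -> Tuple[int, str]:
    """Return (budget, reason) for one query; a sketch, not SeekLink's rules."""
    if isinstance(rerank_k, int):
        # Matches the diff: explicit integers are passed through as "fixed".
        return rerank_k, "fixed"
    if tags or folder:
        # Assumed: filtered queries keep the deeper 20-candidate budget.
        return 20, "filtered"
    if _CJK.search(query) and _ASCII_TOKEN.search(query):
        # Assumed: mixed CJK + technical tokens need deeper candidates.
        return 20, "cjk-technical"
    return 5, "ordinary"


if __name__ == "__main__":
    for q in ["memory", "读书笔记", "向量数据库 SQLite 索引"]:
        print(q, resolve_rerank_k(q))
```

Returning the reason alongside the budget is what lets the blind-test runner record the per-query resolved budget, so an `auto` sweep can be audited without guessing which queries used 5 vs. 20 candidates.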