Commit a711e83

fix(search): narrow adaptive deep reranking
1 parent e05150f commit a711e83

4 files changed

Lines changed: 44 additions & 10 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `seeklink get PATH:LINE -C N` prints a grep-style context window around a search hit, returning `N` lines before and after the requested line while preserving direct filesystem reads and path-escape protection.
 - `seeklink search --json` and `seeklink status --json` emit stable machine-readable stdout for agents that should not scrape the human text format.
 - `seeklink search --rerank-k N` and `seeklink search --no-rerank` let callers trade precision for latency per query without changing the global reranker configuration.
-- `seeklink search --rerank-k auto` chooses a 5- or 20-candidate reranker budget from query shape, keeping exact title / alias and English queries fast while giving CJK and filtered queries deeper reranking.
+- `seeklink search --rerank-k auto` chooses a 5- or 20-candidate reranker budget from query shape, keeping exact title / alias, English, and ordinary CJK queries fast while giving filtered and CJK technical queries deeper reranking.
 - The blind-test runner now accepts `--rerank-k N`, `--rerank-k auto`, and `--no-rerank`, and records requested plus resolved reranking metadata in result JSON for latency / quality sweeps.

 ### Fixed

docs/blind-test.md

Lines changed: 2 additions & 2 deletions
@@ -100,8 +100,8 @@ For each `(query, config)` pair (recorded by the runner):
   when reranking is disabled, or `"auto"` for query-sensitive routing)
 - `resolved_rerank_k` — actual numeric candidate budget used for this query
   (`5` or `20` for `auto`, `0` when reranking is disabled)
-- `rerank_k_reason` — why `auto` chose that budget (`title`, `cjk`,
-  `filter`, `default`, or `fixed`)
+- `rerank_k_reason` — why `auto` chose that budget (`title`,
+  `cjk_technical`, `filter`, `default`, or `fixed`)
 - `recall_at_10` — fraction of `expected_paths` in top-10
 - `mrr` — reciprocal rank of first expected hit in top-10 (0 if none)
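One way to use the renamed `rerank_k_reason` value in a latency / quality sweep is to group result records by reason and compare quality per group. The sketch below is only an illustration: the function name `summarize_by_reason` and the sample records are hypothetical, the metric values are invented for the example, and only the field names `resolved_rerank_k`, `rerank_k_reason`, `recall_at_10`, and `mrr` come from the documentation above.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_reason(records):
    """Group per-query records by rerank_k_reason and average the metrics."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["rerank_k_reason"]].append(rec)
    return {
        reason: {
            "n": len(recs),
            "budgets": sorted({r["resolved_rerank_k"] for r in recs}),
            "mean_recall_at_10": mean(r["recall_at_10"] for r in recs),
            "mean_mrr": mean(r["mrr"] for r in recs),
        }
        for reason, recs in groups.items()
    }


# Hypothetical records; the numbers are made up for illustration only.
records = [
    {"resolved_rerank_k": 5, "rerank_k_reason": "default", "recall_at_10": 1.0, "mrr": 1.0},
    {"resolved_rerank_k": 5, "rerank_k_reason": "default", "recall_at_10": 0.5, "mrr": 0.5},
    {"resolved_rerank_k": 20, "rerank_k_reason": "cjk_technical", "recall_at_10": 1.0, "mrr": 0.5},
]

summary = summarize_by_reason(records)
for reason, stats in summary.items():
    print(reason, stats)
```

Any script that still filters on the old `cjk` reason would match nothing in results produced after this commit, so it would need to look for `cjk_technical` instead.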

seeklink/search.py

Lines changed: 32 additions & 6 deletions
@@ -22,6 +22,27 @@
 RerankK = int | Literal["auto"]
 AUTO_RERANK_FAST_K = 5
 AUTO_RERANK_DEEP_K = 20
+_CJK_TECHNICAL_RERANK_TERMS = (
+    "向量",
+    "嵌入",
+    "文档",
+    "切块",
+    "算法",
+    "搜索",
+    "检索",
+    "融合",
+    "模型",
+    "数据库",
+    "fsrs",
+    "bm25",
+    "rrf",
+    "rag",
+    "anki",
+    "hybrid",
+    "chunk",
+    "embedding",
+    "vector",
+)


 @dataclass(slots=True)
@@ -36,6 +57,11 @@ def _contains_cjk(text: str) -> bool:
     return any("\u3400" <= ch <= "\u9fff" for ch in text)


+def _contains_technical_rerank_term(text: str) -> bool:
+    folded = text.casefold()
+    return any(term in folded for term in _CJK_TECHNICAL_RERANK_TERMS)
+
+
 def _resolve_rerank_k_with_reason(
     query: str,
     rerank_k: RerankK,
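The new helper matches by case-folded substring, not by token. The short sketch below shows what that implies, using a deliberately reduced term tuple (`TERMS` is a hypothetical three-item subset of the list added in this commit, and `contains_term` is a stand-in name for the helper):

```python
# Hypothetical subset of the committed term list, kept short for illustration.
TERMS = ("向量", "rag", "bm25")


def contains_term(text: str) -> bool:
    """Case-folded substring test, mirroring the helper added in this hunk."""
    folded = text.casefold()
    return any(term in folded for term in TERMS)


print(contains_term("BM25 和向量检索"))   # upper-case "BM25" matches after casefold
print(contains_term("RAG 是什么"))        # "RAG" folds to "rag"
print(contains_term("storage 怎么选"))    # "rag" also occurs inside "storage"
print(contains_term("学完就忘怎么办"))    # no listed term present
```

The first three calls return `True` and the last returns `False`. The `storage` case shows that short ASCII terms can match inside longer words. Whether that matters in practice is a question for the pilot data.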
@@ -46,10 +72,10 @@ def _resolve_rerank_k_with_reason(
     """Resolve a numeric rerank budget for one query.

     The default CLI path still passes an integer. The explicit "auto" mode is
-    a conservative policy from the 22-query pilot: English and title/alias
-    lookups got most of the reranker benefit by reranking only the top 5, while
-    CJK / mixed queries without a title hit needed deeper candidates to recover
-    recall.
+    a conservative policy from the 22-query pilot: English, title/alias, and
+    ordinary CJK lookups got most of the reranker benefit by reranking only the
+    top 5, while CJK / mixed technical queries needed deeper candidates to
+    recover recall.
     """
     if isinstance(rerank_k, int):
         return rerank_k, "fixed"
@@ -61,8 +87,8 @@ def _resolve_rerank_k_with_reason(
         return AUTO_RERANK_DEEP_K, "filter"
     if title_ranks:
         return AUTO_RERANK_FAST_K, "title"
-    if _contains_cjk(query):
-        return AUTO_RERANK_DEEP_K, "cjk"
+    if _contains_cjk(query) and _contains_technical_rerank_term(query):
+        return AUTO_RERANK_DEEP_K, "cjk_technical"
     return AUTO_RERANK_FAST_K, "default"
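Taken together, the hunks above suggest the following routing order. This is a standalone sketch, not the module itself: the diff does not show the lines between the `"fixed"` return and the `"filter"` return, so the `if has_filter:` condition is an assumption inferred from the `has_filter=` keyword used in the tests, and the keyword-only signature and un-prefixed function names are likewise illustrative.

```python
from typing import Literal, Union

RerankK = Union[int, Literal["auto"]]
AUTO_RERANK_FAST_K = 5
AUTO_RERANK_DEEP_K = 20

# Term list copied from the diff above.
_CJK_TECHNICAL_RERANK_TERMS = (
    "向量", "嵌入", "文档", "切块", "算法", "搜索", "检索", "融合", "模型", "数据库",
    "fsrs", "bm25", "rrf", "rag", "anki", "hybrid", "chunk", "embedding", "vector",
)


def contains_cjk(text: str) -> bool:
    return any("\u3400" <= ch <= "\u9fff" for ch in text)


def contains_technical_rerank_term(text: str) -> bool:
    folded = text.casefold()
    return any(term in folded for term in _CJK_TECHNICAL_RERANK_TERMS)


def resolve_rerank_k_with_reason(
    query: str,
    rerank_k: RerankK,
    *,
    has_filter: bool,
    title_ranks: dict,
) -> tuple:
    if isinstance(rerank_k, int):
        return rerank_k, "fixed"
    if has_filter:  # assumed condition; this line is not visible in the diff
        return AUTO_RERANK_DEEP_K, "filter"
    if title_ranks:
        return AUTO_RERANK_FAST_K, "title"
    if contains_cjk(query) and contains_technical_rerank_term(query):
        return AUTO_RERANK_DEEP_K, "cjk_technical"
    return AUTO_RERANK_FAST_K, "default"


for q in ("学完就忘怎么办", "把文档切块放进向量库", "vector database"):
    print(q, resolve_rerank_k_with_reason(q, "auto", has_filter=False, title_ranks={}))
```

Under these assumptions the ordinary CJK query and the English query both resolve to `(5, "default")`, and only the CJK query containing listed terms resolves to `(20, "cjk_technical")`. The CJK test only covers U+3400 through U+9FFF, so a query written purely in kana or Hangul takes the default branch even if it contains a listed ASCII term.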

tests/test_search.py

Lines changed: 9 additions & 1 deletion
@@ -62,7 +62,15 @@ def test_title_match_uses_fast_budget(self):
             title_ranks={10: 1},
         ) == 5

-    def test_cjk_without_title_match_uses_deep_budget(self):
+    def test_cjk_without_title_match_uses_fast_budget(self):
+        assert _resolve_rerank_k(
+            "学完就忘怎么办",
+            "auto",
+            has_filter=False,
+            title_ranks={},
+        ) == 5
+
+    def test_cjk_technical_query_uses_deep_budget(self):
         assert _resolve_rerank_k(
             "把文档切块放进向量库",
             "auto",
