Commit 18bedaf

wehos, Hongzhi Wen, and claude authored
fix(memory): hybrid_recall log put the passed count in the thresh field, misleading tuning (#1413)
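`(>thresh %d)` looks like a threshold, but the value actually filled in is len(bm25_top) / len(cosine_top): the number of items left after the threshold is applied. Seeing `scored bm25=0 (>thresh 0)` in production gets read as "the threshold is 0, so everything passes", while the actual meaning is "0 items remain after scoring, and 0 of them pass the threshold". The threshold constant HYBRID_RECALL_BM25_THRESHOLD=0.1 never appears in the log.

Split the field into `passed` + `thresh` so that both the passed count and the threshold constant are stated explicitly:

    scored bm25=72 (passed 5, thresh=0.10) emb=43 (passed 4, thresh=4.00)

Co-authored-by: Hongzhi Wen <cartabio.coder1@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>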
1 parent 9fe4b2d commit 18bedaf

1 file changed

Lines changed: 8 additions & 4 deletions

memory/hybrid_recall.py

@@ -545,13 +545,17 @@ async def hybrid_recall(
 
     elapsed_ms = (time.time() - start) * 1000.0
     # union pool size for observability — bm25 pool is the superset.
+    # `passed` = items surviving the per-side threshold; `thresh` is the
+    # cutoff constant. This log used to put the `passed` count in `(>thresh %d)`,
+    # which read as "threshold=N" and misled tuning, so it is split into passed + thresh.
     logger.info(
-        "[hybrid_recall] %s: pool bm25=%d emb=%d | scored bm25=%d (>thresh %d) "
-        "emb=%d (>thresh %d) | fused=%d | %.0fms",
+        "[hybrid_recall] %s: pool bm25=%d emb=%d | "
+        "scored bm25=%d (passed %d, thresh=%.2f) "
+        "emb=%d (passed %d, thresh=%.2f) | fused=%d | %.0fms",
         lanlan_name,
         len(bm25_pool), len(embedding_pool),
-        len(bm25_scored), len(bm25_top),
-        len(cosine_scored), len(cosine_top),
+        len(bm25_scored), len(bm25_top), HYBRID_RECALL_BM25_THRESHOLD,
+        len(cosine_scored), len(cosine_top), HYBRID_RECALL_COSINE_THRESHOLD,
         len(results), elapsed_ms,
     )
     return {
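
To check what the new format string renders, here is a minimal standalone sketch. The format string is copied from the diff; `format_recall_log` and the sample counts are hypothetical, and the cosine threshold of 4.0 is only inferred from the sample line in the commit message (the commit states 0.1 for BM25 but does not give the cosine constant).

```python
# Thresholds: 0.1 for BM25 is stated in the commit message; 4.0 for the
# embedding side is an assumption taken from the sample log line.
HYBRID_RECALL_BM25_THRESHOLD = 0.1
HYBRID_RECALL_COSINE_THRESHOLD = 4.0  # assumed, not confirmed by the diff

# Format string copied from the new logger.info call.
LOG_FORMAT = (
    "[hybrid_recall] %s: pool bm25=%d emb=%d | "
    "scored bm25=%d (passed %d, thresh=%.2f) "
    "emb=%d (passed %d, thresh=%.2f) | fused=%d | %.0fms"
)


def format_recall_log(name, bm25_pool, emb_pool, bm25_scored, bm25_passed,
                      emb_scored, emb_passed, fused, elapsed_ms):
    """Hypothetical helper: render the log line from plain counts."""
    return LOG_FORMAT % (
        name,
        bm25_pool, emb_pool,
        bm25_scored, bm25_passed, HYBRID_RECALL_BM25_THRESHOLD,
        emb_scored, emb_passed, HYBRID_RECALL_COSINE_THRESHOLD,
        fused, elapsed_ms,
    )


# Made-up counts matching the sample in the commit message.
line = format_recall_log("demo", 120, 80, 72, 5, 43, 4, 7, 12.4)
print(line)

# The previously ambiguous case: nothing scored, nothing passed.
empty = format_recall_log("demo", 120, 80, 0, 0, 0, 0, 0, 3.0)
print(empty)
```

In the empty case the line now reads `scored bm25=0 (passed 0, thresh=0.10)`, so a zero count can no longer be mistaken for a zero threshold.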
