fix: improve memory search with token AND matching and rank-based scoring #10

Merged
yiwang merged 1 commit into main from claude/fix-memory-search-15HIB
Feb 8, 2026

Conversation

@yiwang yiwang commented Feb 8, 2026

Replace phrase-match FTS queries with tokenized AND-joined queries so
searches like "database concurrency locks" match chunks containing all
three words in any order, not just as an exact consecutive phrase.
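To make the difference between the two matching modes concrete, here is a minimal in-memory sketch. It does not touch SQLite or FTS5; `tokenize`, `matches_phrase`, and `matches_all_tokens` are hypothetical helpers written for illustration only, and the lowercasing tokenizer is only a rough stand-in for whatever tokenizer the index actually uses.

```rust
/// Lowercased alphanumeric/underscore tokens.
/// Assumption: a rough approximation of the index tokenizer, not the real one.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Phrase semantics: the query tokens must appear consecutively and in order.
fn matches_phrase(chunk: &str, query: &str) -> bool {
    let (c, q) = (tokenize(chunk), tokenize(query));
    // The emptiness check short-circuits, so `windows` never sees a size of 0.
    !q.is_empty() && c.windows(q.len()).any(|w| w == q.as_slice())
}

/// AND semantics: every query token must appear somewhere in the chunk.
fn matches_all_tokens(chunk: &str, query: &str) -> bool {
    let (c, q) = (tokenize(chunk), tokenize(query));
    !q.is_empty() && q.iter().all(|t| c.contains(t))
}
```

With a chunk such as "Locks protect the database under high concurrency", the query "database concurrency locks" fails under phrase semantics but succeeds under AND semantics, which is the recall gap this change targets.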

Replace max-normalization scoring in hybrid search with rank-based
scoring (1/(1+rank)) for both FTS and vector results, preventing
pathological cases where a single weak match dominates strong results.
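As a rough illustration of why rank-based scoring is less sensitive to raw score magnitudes, the sketch below compares the two schemes on plain slices. `max_normalized` and `rank_scores` are hypothetical names, and the positive, best-first scores are an assumption made for the example; this is not the code in src/memory/index.rs.

```rust
/// Max-normalization: divide each raw score by the largest one.
/// Assumption: scores are positive, with larger meaning better.
fn max_normalized(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::MIN, f32::max);
    if max <= 0.0 {
        // Nothing sensible to normalize against.
        return vec![0.0; scores.len()];
    }
    scores.iter().map(|s| s / max).collect()
}

/// Rank-based scoring: 1 / (1 + rank), assuming results are ordered best first.
/// Only the position matters; the raw score magnitude is ignored.
fn rank_scores(len: usize) -> Vec<f32> {
    (0..len).map(|rank| 1.0 / (1.0 + rank as f32)).collect()
}
```

For raw scores of 50.0, 2.0, and 1.9, max-normalization gives the second result 0.04, so one outlier compresses everything below it. Rank-based scoring gives the same three results 1.0, 0.5, and roughly 0.33 regardless of how the raw scores are distributed.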

https://claude.ai/code/session_0193Gh5J9fEu65cyKyzUp5vT

Copilot AI left a comment

Pull request overview

This PR updates the memory index search behavior to improve recall and make hybrid ranking more stable by switching FTS queries from phrase matching to tokenized AND matching, and by replacing score normalization with rank-based scoring.

Changes:

  • Replace phrase-quoted FTS queries with tokenized queries joined by AND (all terms required, any order).
  • Update hybrid search merge scoring to use rank-based scoring (1/(1+rank)) for both FTS and vector result sets.
  • Replace escape_fts_query() with build_fts_query() for constructing FTS5 MATCH expressions.

Comment thread src/memory/index.rs
Comment on lines +879 to 886
  // Add FTS results using rank-based scoring (OpenClaw-compatible)
  // BM25 results are already ordered by relevance (best first)
  for (rank, result) in fts_results.into_iter().enumerate() {
      let key = format!("{}:{}:{}", result.file, result.line_start, result.line_end);
-     let normalized_score = (result.score / max_fts_score) as f32;
-     let weighted_score = normalized_score * text_weight;
+     let rank_score = 1.0 / (1.0 + rank as f32); // rank 0 → 1.0, rank 1 → 0.5, rank 9 → 0.1
+     let weighted_score = rank_score * text_weight;
      merged.insert(key, (weighted_score, result));
  }

Copilot AI Feb 8, 2026

In search_hybrid, the MemoryChunk.score returned for FTS-only results remains the raw BM25-derived value from search(), while vector-only / merged results get the combined weighted score. This makes score inconsistent and user-facing outputs (CLI/HTTP/UI) misleading. Consider setting result.score to the computed weighted_score before inserting into merged (and using the same convention for all paths) so every returned chunk’s score reflects the final combined ranking value.
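One way to apply this suggestion is sketched below. It uses a simplified stand-in for MemoryChunk: the `Chunk` struct, its field types, and the `merge_fts` function are assumptions made for illustration, and only the loop body mirrors the hunk under review.

```rust
use std::collections::HashMap;

/// Simplified stand-in for MemoryChunk; field types are assumed.
#[derive(Debug, Clone)]
struct Chunk {
    file: String,
    line_start: usize,
    line_end: usize,
    score: f64,
}

/// Merge best-first FTS results, overwriting each chunk's raw BM25-derived
/// score with the weighted rank score so the returned value matches the
/// value used for ranking.
fn merge_fts(fts_results: Vec<Chunk>, text_weight: f32) -> HashMap<String, (f32, Chunk)> {
    let mut merged = HashMap::new();
    for (rank, mut result) in fts_results.into_iter().enumerate() {
        let key = format!("{}:{}:{}", result.file, result.line_start, result.line_end);
        let rank_score = 1.0 / (1.0 + rank as f32);
        let weighted_score = rank_score * text_weight;
        // Keep the user-facing score consistent with the merge score.
        result.score = f64::from(weighted_score);
        merged.insert(key, (weighted_score, result));
    }
    merged
}
```

The vector-only and combined paths would need the same convention for the score to be comparable across all returned chunks.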

Comment thread src/memory/index.rs
Comment on lines +947 to 967
/// Build FTS5 query from raw input (OpenClaw-compatible)
/// Tokenizes input and joins with AND so all terms must appear (in any order)
fn build_fts_query(raw: &str) -> Option<String> {
    let tokens: Vec<&str> = raw
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .map(|t| t.trim())
        .filter(|t| !t.is_empty())
        .collect();

    if tokens.is_empty() {
        return None;
    }

    // Quote each token individually, join with AND
    let quoted: Vec<String> = tokens
        .iter()
        .map(|t| format!("\"{}\"", t.replace('"', "")))
        .collect();

    Some(quoted.join(" AND "))
}

Copilot AI Feb 8, 2026

build_fts_query() introduces new tokenization/AND semantics (and the empty-query early return) but there are no unit tests covering these behaviors (e.g., token order independence, punctuation handling, and the empty-input case). Since this file already has tests for indexing/search, adding targeted tests here would help prevent regressions in query construction and matching behavior.
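A minimal sketch of what such tests could look like follows. The function body is copied from the hunk above so the example is self-contained; the test names are hypothetical, and in the real file the tests would presumably live in the existing test module. They cover query construction only; order-independent matching would need a test against an actual index.

```rust
fn build_fts_query(raw: &str) -> Option<String> {
    let tokens: Vec<&str> = raw
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .map(|t| t.trim())
        .filter(|t| !t.is_empty())
        .collect();

    if tokens.is_empty() {
        return None;
    }

    let quoted: Vec<String> = tokens
        .iter()
        .map(|t| format!("\"{}\"", t.replace('"', "")))
        .collect();

    Some(quoted.join(" AND "))
}

#[test]
fn joins_tokens_with_and() {
    assert_eq!(
        build_fts_query("database concurrency locks").as_deref(),
        Some("\"database\" AND \"concurrency\" AND \"locks\"")
    );
}

#[test]
fn punctuation_splits_tokens_but_underscore_does_not() {
    assert_eq!(
        build_fts_query("foo-bar, baz_qux").as_deref(),
        Some("\"foo\" AND \"bar\" AND \"baz_qux\"")
    );
}

#[test]
fn empty_or_punctuation_only_input_yields_none() {
    assert_eq!(build_fts_query(""), None);
    assert_eq!(build_fts_query("   "), None);
    assert_eq!(build_fts_query("!!! ... \"\""), None);
}
```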

@yiwang yiwang merged commit 01eb36d into main Feb 8, 2026
7 of 11 checks passed
@yiwang yiwang deleted the claude/fix-memory-search-15HIB branch February 8, 2026 18:03