fix: improve memory search with token AND matching and rank-based scoring #10
Conversation
Replace phrase-match FTS queries with tokenized AND-joined queries so searches like "database concurrency locks" match chunks containing all three words in any order, not just as an exact consecutive phrase. Replace max-normalization scoring in hybrid search with rank-based scoring (1/(1+rank)) for both FTS and vector results, preventing pathological cases where a single weak match dominates strong results.
https://claude.ai/code/session_0193Gh5J9fEu65cyKyzUp5vT
Pull request overview
This PR updates the memory index search behavior to improve recall and make hybrid ranking more stable by switching FTS queries from phrase matching to tokenized AND matching, and by replacing score normalization with rank-based scoring.
Changes:
- Replace phrase-quoted FTS queries with tokenized queries joined by AND (all terms required, any order).
- Update hybrid search merge scoring to use rank-based scoring (1/(1+rank)) for both FTS and vector result sets.
- Replace `escape_fts_query()` with `build_fts_query()` for constructing FTS5 MATCH expressions.
```diff
 // Add FTS results using rank-based scoring (OpenClaw-compatible)
 // BM25 results are already ordered by relevance (best first)
 for (rank, result) in fts_results.into_iter().enumerate() {
     let key = format!("{}:{}:{}", result.file, result.line_start, result.line_end);
-    let normalized_score = (result.score / max_fts_score) as f32;
-    let weighted_score = normalized_score * text_weight;
+    let rank_score = 1.0 / (1.0 + rank as f32); // rank 0 → 1.0, rank 1 → 0.5, rank 9 → 0.1
+    let weighted_score = rank_score * text_weight;
     merged.insert(key, (weighted_score, result));
 }
```
In `search_hybrid`, the `MemoryChunk.score` returned for FTS-only results remains the raw BM25-derived value from `search()`, while vector-only / merged results get the combined weighted score. This makes `score` inconsistent and makes user-facing outputs (CLI/HTTP/UI) misleading. Consider setting `result.score` to the computed `weighted_score` before inserting into `merged` (and using the same convention for all paths) so every returned chunk's `score` reflects the final combined ranking value.
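One way to apply that suggestion, sketched against stand-in types: the `MemoryChunk` field names follow the diff above, but the exact struct shape and the `merge_fts` helper are assumptions for illustration, not the PR's actual code.

```rust
use std::collections::HashMap;

// Stand-in for the PR's chunk type; field names follow the diff above,
// the exact shape is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct MemoryChunk {
    file: String,
    line_start: u32,
    line_end: u32,
    score: f32,
}

// Merge FTS results so every returned chunk's `score` is the final
// weighted ranking value, not the raw BM25 number.
fn merge_fts(
    fts_results: Vec<MemoryChunk>,
    text_weight: f32,
) -> HashMap<String, (f32, MemoryChunk)> {
    let mut merged = HashMap::new();
    for (rank, mut result) in fts_results.into_iter().enumerate() {
        let key = format!("{}:{}:{}", result.file, result.line_start, result.line_end);
        let weighted_score = (1.0 / (1.0 + rank as f32)) * text_weight;
        result.score = weighted_score; // keep the user-facing score consistent
        merged.insert(key, (weighted_score, result));
    }
    merged
}

fn main() {
    let hits = vec![
        MemoryChunk { file: "a.rs".into(), line_start: 1, line_end: 2, score: 12.3 },
        MemoryChunk { file: "b.rs".into(), line_start: 5, line_end: 9, score: 4.5 },
    ];
    let merged = merge_fts(hits, 0.7);
    // The map value and the chunk's own score now agree.
    println!("{:?}", merged["a.rs:1:2"]);
}
```

With this convention, every path (FTS-only, vector-only, merged) can report the same kind of score.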
```rust
/// Build FTS5 query from raw input (OpenClaw-compatible)
/// Tokenizes input and joins with AND so all terms must appear (in any order)
fn build_fts_query(raw: &str) -> Option<String> {
    let tokens: Vec<&str> = raw
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .map(|t| t.trim())
        .filter(|t| !t.is_empty())
        .collect();

    if tokens.is_empty() {
        return None;
    }

    // Quote each token individually, join with AND
    let quoted: Vec<String> = tokens
        .iter()
        .map(|t| format!("\"{}\"", t.replace('"', "")))
        .collect();

    Some(quoted.join(" AND "))
}
```
`build_fts_query()` introduces new tokenization/AND semantics (and the empty-query early return) but there are no unit tests covering these behaviors (e.g., token order independence, punctuation handling, and the empty-input case). Since this file already has tests for indexing/search, adding targeted tests here would help prevent regressions in query construction and matching behavior.
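Such tests might look like the sketch below. The function body is copied from the diff above so the example is self-contained; the test names and inputs are illustrative suggestions, not part of the PR.

```rust
/// Copy of the PR's build_fts_query (from the diff above), so the
/// example tests below compile on their own.
fn build_fts_query(raw: &str) -> Option<String> {
    let tokens: Vec<&str> = raw
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .map(|t| t.trim())
        .filter(|t| !t.is_empty())
        .collect();
    if tokens.is_empty() {
        return None;
    }
    let quoted: Vec<String> = tokens
        .iter()
        .map(|t| format!("\"{}\"", t.replace('"', "")))
        .collect();
    Some(quoted.join(" AND "))
}

#[test]
fn tokenizes_and_joins_with_and() {
    assert_eq!(
        build_fts_query("database concurrency locks").as_deref(),
        Some("\"database\" AND \"concurrency\" AND \"locks\"")
    );
}

#[test]
fn strips_punctuation_between_tokens() {
    assert_eq!(
        build_fts_query("locks, database; concurrency!").as_deref(),
        Some("\"locks\" AND \"database\" AND \"concurrency\"")
    );
}

#[test]
fn empty_or_punctuation_only_input_returns_none() {
    assert_eq!(build_fts_query("   ,,, "), None);
    assert_eq!(build_fts_query(""), None);
}
```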