fix(pgvector): convert cosine distance to similarity score by Kblack0610 · Pull Request #4994 · mem0ai/mem0

Kblack0610 · 2026-04-28T03:40:55Z

Summary

PGVector.search() returns the raw vector <=> %s::vector distance as OutputData.score, but score_and_rank() in mem0/utils/scoring.py (used by Memory.search() for hybrid retrieval) treats score as a similarity (higher = better) and sorts reverse=True. The result: semantically closest memories rank LAST while least-similar memories surface first.
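The inversion can be illustrated with a minimal sketch. `Hit` and `rank_by_similarity` are hypothetical stand-ins, not mem0's `OutputData` or `score_and_rank()`, and the middle distance value is invented for illustration; the sketch only assumes, as described above, that the ranker sorts by score with `reverse=True`.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for a search result (not mem0's OutputData)."""
    id: str
    score: float


def rank_by_similarity(hits):
    """Sort the way a similarity-based ranker would: highest score first."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Raw cosine distances placed in the score field: smaller means closer.
raw_hits = [
    Hit(id="exact-match", score=0.221),
    Hit(id="loosely-related", score=0.412),  # illustrative value
    Hit(id="unrelated", score=0.553),
]

ranked_ids = [h.id for h in rank_by_similarity(raw_hits)]
print(ranked_ids)  # the exact match ends up last
```

Because the sort direction assumes "higher is better", any backend that reports a distance in `score` will have its results reversed by this kind of ranker.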

This is the Python-side equivalent of the same bug fixed in the TypeScript SDK in #4944, which described the symptom identically:

While doing the semantic search, most similar documents have to be ranked highest. Using PGVector, the scores were inverted, surfacing the least relevant documents.

Reproduction

Set up mem0 with the pgvector backend, add a handful of memories, then query for the exact text of one of them. Without this fix, the exact-match memory ranks dead last with a low score; the highest-scoring result is whichever memory has the largest cosine distance from the query.

In a small self-hosted deployment (19 memories), before the fix:

query: "plan mode default for non-trivial tasks"
0.553 | User runs a homelab cluster called home-k3s on bare metal       ← top
...
0.221 | Kenneth defaults to plan mode for non-trivial tasks involving... ← LAST (exact match)

After the fix:

query: "plan mode default for non-trivial tasks"
0.779 | Kenneth defaults to plan mode for non-trivial tasks involving... ← top (exact match)
0.588 | Kenneth scopes his own next steps and does NOT want optional...
0.563 | Kenneth prefers automation over manual fixes...

The same pattern reproduces across queries — exact-text matches for kubectl context home cluster, shell neovim notes, and postgres fsGroup capabilities all moved from bottom-of-stack to top-of-stack after the fix.

Direct postgres confirmation that the embeddings themselves are fine:

-- ORDER BY vector <=> $query_emb (ASC = lowest distance first) returns
-- the exact-match row first with distance ~0.22, similarity ~0.78.
SELECT payload->>'data', vector <=> $1::vector AS dist FROM mem0_memories ORDER BY 2 LIMIT 5;

So the cause is purely the score convention mismatch between pgvector.py:251 and score_and_rank.
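As a sanity check on the convention itself, the sketch below computes cosine distance as `1 - cosine similarity`, which is how pgvector's `<=>` operator defines it. The helper names and the toy two-dimensional vectors are assumptions made for illustration and are not taken from the PR.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity, so 0 means same direction."""
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]
opposite = [-1.0, 0.0]

for name, vec in [("same", same_direction), ("orthogonal", orthogonal), ("opposite", opposite)]:
    d = cosine_distance(query, vec)
    # 1 - distance recovers the similarity that a "higher is better" ranker expects.
    print(name, "distance:", d, "similarity:", 1.0 - d)
```

Cosine distance ranges from 0 to 2, so the converted similarity ranges from 1 down to -1; ordering by the converted value descending is equivalent to ordering by distance ascending.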

Fix

Convert distance → similarity at the boundary in PGVector.search():

return [OutputData(id=str(r[0]), score=float(1.0 - r[1]), payload=r[2]) for r in results]

This matches the convention every other ranker call site assumes and aligns with the TS fix in #4944.
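A self-contained sketch of that boundary conversion follows. `Result` and `rows_to_results` are hypothetical names standing in for `OutputData` and the list comprehension inside `PGVector.search()`; the row layout `(id, distance, payload)` is inferred from the one-line fix above, and the two distances are the before-fix scores quoted in the reproduction.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical stand-in for OutputData (id, score, payload)."""
    id: str
    score: float
    payload: dict = field(default_factory=dict)


def rows_to_results(rows):
    """Map (id, cosine_distance, payload) rows to similarity-scored results."""
    return [Result(id=str(r[0]), score=float(1.0 - r[1]), payload=r[2]) for r in rows]


# Rows shaped like what a cursor might return for the distance query.
rows = [
    (1, 0.221, {"data": "exact match"}),
    (2, 0.553, {"data": "unrelated"}),
]

results = rows_to_results(rows)
for r in results:
    print(r.id, round(r.score, 3))
```

Doing the conversion once, where rows leave the vector store, keeps downstream rankers free of backend-specific knowledge about whether a score is a distance or a similarity.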

keyword_search() is unchanged: it already returns ts_rank_cd, a relevance rank where higher is better, so it already matches the ranker's convention.

Type of Change

Bug fix (non-breaking change that fixes an issue)

Tests

Updated the four mocked-cursor assertions in tests/vector_stores/test_pgvector.py to expect the converted score (mocked distance 0.1 → score 0.9, etc.) and switched to assertAlmostEqual since the value is now computed:

$ python -m pytest tests/vector_stores/test_pgvector.py
============================== 50 passed in 0.50s ==============================
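The shape of such an assertion might look like the sketch below. It is not the repository's test file: `search_scores` is a hypothetical helper that mirrors the conversion, and the mocked cursor rows are assumed to be `(id, distance, payload)` tuples.

```python
import unittest
from unittest.mock import MagicMock


def search_scores(cursor):
    """Hypothetical helper: convert (id, distance, payload) rows to (id, similarity)."""
    return [(str(r[0]), float(1.0 - r[1])) for r in cursor.fetchall()]


class TestScoreConversion(unittest.TestCase):
    def test_distance_becomes_similarity(self):
        # Mocked cursor returning two rows with distances 0.1 and 0.2.
        cursor = MagicMock()
        cursor.fetchall.return_value = [("a", 0.1, {}), ("b", 0.2, {})]

        scores = search_scores(cursor)

        # The score is now computed, so compare with a tolerance
        # rather than relying on exact float equality.
        self.assertAlmostEqual(scores[0][1], 0.9)
        self.assertAlmostEqual(scores[1][1], 0.8)
```

Saved to a file, the test case can be run with `python -m unittest` or collected by pytest.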

Linked Issue / PR

TypeScript equivalent: (TS) Fix PGVector implementation, where vector distance was inverted. #4944 (already merged)

`PGVector.search()` was returning the raw `vector <=> %s::vector` distance as `OutputData.score`, but `score_and_rank()` in the hybrid retrieval pipeline (mem0/utils/scoring.py) treats `score` as a similarity (higher = better) and sorts `reverse=True`. The result is that semantically closest memories rank LAST in `Memory.search()` output, while least-similar memories surface first. This is easily reproduced with any small set of memories where one matches the query exactly.

Convert distance → similarity at the boundary (`1.0 - r[1]`) so the score returned matches the convention every other vector store backend in this repo already uses (e.g., Chroma's distances field is also documented as such in OutputData but ranking-time inversion happens elsewhere; pgvector was the outlier that fed raw distance into the ranker).

Mirrors the equivalent TypeScript SDK fix in mem0ai#4944, which described the same symptom on the JS side ("most similar documents have to be ranked highest [...] the scores were inverted, surfacing the least relevant documents").

Updated the four mocked-cursor assertions in `tests/vector_stores/test_pgvector.py` to expect the converted score (mocked distance 0.1 → score 0.9, etc.) and switched to `assertAlmostEqual` since the value is now computed.

Verified end-to-end in a self-hosted deployment: before the fix, an exact-text-match memory for "plan mode default for non-trivial tasks" scored 0.221 and ranked dead last out of 19 candidates; after the fix, it ranks first at 0.779 with a meaningful gap to the next result.

CLAassistant · 2026-04-28T03:41:01Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Opened the Python equivalent of the merged TS fix (#4944) at mem0ai/mem0#4994 — update both the README patches table and the deployment.yaml comment so the stopgap can be dropped without re-research once #4994 merges and we bump the image.

kartik-mem0 · 2026-04-28T04:27:25Z

Please sign the CLA and remove the comments for consistency.

Kblack0610 wants to merge 1 commit into mem0ai:main from Kblack0610:fix/pgvector-distance-to-similarity
