@arpon-kapuria

Description:

This PR fixes an inconsistency in similarity_search_with_relevance_scores when the metric is MAX_INNER_PRODUCT.
Currently, _max_inner_product_relevance_score_fn inverts raw MAX_INNER_PRODUCT scores using 1.0 - distance. That inversion is correct for distance-based metrics (such as Euclidean or cosine distance), where lower values are better. MAX_INNER_PRODUCT is a similarity metric, so higher values are already better. Applying 1.0 - similarity flips the ordering, so filtering with score_threshold in similarity_search_with_relevance_scores can drop strong matches and keep weak ones.

Changes in this PR:

  1. Returns the raw similarity for MAX_INNER_PRODUCT metrics (clamped to 0–1 if necessary).
  2. Ensures similarity >= score_threshold works naturally for similarity metrics.
  3. Maintains the existing logic for distance-based metrics.
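
As a rough illustration of changes 1 and 3, the sketch below contrasts the two kinds of conversion. The function names `similarity_to_relevance` and `distance_to_relevance` are hypothetical, chosen for this example only; they are not the names used in langchain-core, and the distance case assumes the distance is already scaled to 0–1.

```python
def similarity_to_relevance(similarity: float) -> float:
    """Similarity metric: higher is already better, so only clamp to [0, 1]."""
    return max(0.0, min(1.0, similarity))


def distance_to_relevance(distance: float) -> float:
    """Distance metric: lower is better, so invert.

    Assumes the distance is already scaled to [0, 1].
    """
    return 1.0 - distance
```

With this split, a larger inner product always maps to a larger (or equal, once clamped) relevance score, while distances keep their existing inversion.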

Behavior before fix:

  • MAX_INNER_PRODUCT raw scores were inverted → higher similarity could become lower relevance → documents could be wrongly filtered.

Behavior after fix:

  • MAX_INNER_PRODUCT raw scores are returned as-is → score_threshold filtering behaves consistently.
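
The difference can be seen with a small, self-contained sketch of threshold filtering. Everything here is illustrative: `filter_by_threshold`, the document names, and the scores are made up for the example and do not come from langchain-core or from the linked issue.

```python
from typing import Callable, List, Tuple


def filter_by_threshold(
    scored_docs: List[Tuple[str, float]],
    relevance_fn: Callable[[float], float],
    score_threshold: float,
) -> List[str]:
    """Keep documents whose converted relevance meets the threshold."""
    return [
        doc for doc, raw in scored_docs if relevance_fn(raw) >= score_threshold
    ]


# Hypothetical raw inner-product scores (higher = better match).
scored_docs = [("doc_a", 0.9), ("doc_b", 0.5), ("doc_c", 0.1)]

# Before: the raw similarity is inverted, so the weakest match scores highest.
kept_before = filter_by_threshold(scored_docs, lambda s: 1.0 - s, 0.8)

# After: the raw similarity is used as-is, so the strongest match is kept.
kept_after = filter_by_threshold(scored_docs, lambda s: s, 0.8)
```

In this example `kept_before` contains only `doc_c` (the weakest match) and `kept_after` contains only `doc_a` (the strongest).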

Issue:
Fixes #32045

Dependencies:
No additional dependencies.

@github-actions github-actions bot added the core (Related to the package `langchain-core`) and fix labels Nov 1, 2025
@codspeed-hq

codspeed-hq bot commented Nov 1, 2025

CodSpeed Performance Report

Merging #33776 will not alter performance

Comparing arpon-kapuria:32045 (b873c53) with master (81c4f21)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched
⏩ 21 skipped [1]

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@arpon-kapuria arpon-kapuria changed the title from "fix(vectorstores): preserve raw similarity in _max_inner_product_relevance_score_fn #32045" to "fix(vectorstores): preserve raw similarity in _max_inner_product_relevance_score_fn" Nov 1, 2025
@github-actions github-actions bot added fix and removed fix labels Nov 1, 2025
@arpon-kapuria arpon-kapuria changed the title from "fix(vectorstores): preserve raw similarity in _max_inner_product_relevance_score_fn" to "fix(core): preserve raw similarity in _max_inner_product_relevance_score_fn" Nov 1, 2025
@github-actions github-actions bot added fix and removed fix labels Nov 1, 2025

Development

Successfully merging this pull request may close these issues.

choosing score_threshold in similarity_search_with_relevance_scores and faiss storage with distance==DistanceStrategy.MAX_INNER_PRODUCT