bug: [chroma] similarity_search_by_vector_with_relevance_scores returns raw distances instead of normalized scores #38506

Description

Submission checklist

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Related Issues / PRs

No response

Reproduction Steps / Example Code (Python)

from langchain_chroma import Chroma
from langchain_core.embeddings import DeterministicFakeEmbedding

# FakeEmbeddings requires a size and returns random vectors, so a deterministic
# embedding is used to make the query an exact match for the stored "foo".
embeddings = DeterministicFakeEmbedding(size=10)
texts = ["foo", "bar", "baz"]

# Under L2 distance (the default space in Chroma), an exact match has distance 0.0,
# so its relevance score should normalize to 1.0 (1.0 - distance / sqrt(2)).
docsearch = Chroma.from_texts(
    texts,
    embeddings,
    collection_name="test_collection",
)

embedded_query = embeddings.embed_query("foo")
results = docsearch.similarity_search_by_vector_with_relevance_scores(embedded_query, k=1)
print("Returned relevance scores:")
for doc, score in results:
    print(f"Content: '{doc.page_content}', Relevance Score: {score}")

Error Message and Stack Trace (if applicable)

Description

Problem

When calling similarity_search_by_vector_with_relevance_scores (or any method that relies on it, such as similarity_search_by_image_with_relevance_score) on the Chroma vector store, the returned scores are raw distances rather than normalized relevance scores in the [0, 1] range. For example, an exact match in the default L2 distance space returns 0.0 (the raw distance) instead of 1.0 (the normalized relevance score).
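
To make the expected value concrete, here is a minimal standalone sketch of the L2 normalization quoted in the reproduction comment (1.0 - distance / sqrt(2)). The function name euclidean_relevance_score is hypothetical and is not the library's own helper; the sketch only illustrates the arithmetic.

```python
import math


def euclidean_relevance_score(distance: float) -> float:
    """Map an L2 distance to a relevance score where higher is more similar.

    Hypothetical helper for illustration; it mirrors the formula quoted in
    this issue, not the library source.
    """
    return 1.0 - distance / math.sqrt(2)


# The raw distance reported for an exact match.
raw_distance = 0.0

# The relevance score this issue expects to see instead.
expected_score = euclidean_relevance_score(raw_distance)

print(f"raw distance: {raw_distance}, expected relevance score: {expected_score}")
```

With this mapping, a distance of 0.0 becomes the top score and larger distances produce smaller scores, which is the ordering callers of a relevance-score method would expect.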

Root cause

  1. In libs/partners/chroma/langchain_chroma/vectorstores.py, the similarity_search_by_vector_with_relevance_scores method directly returns the output of _results_to_docs_and_scores(results). This bypasses the relevance normalization function returned by self._select_relevance_score_fn().
  2. Additionally, the base class's _max_inner_product_relevance_score_fn assumes that distance is defined as -inner_product. Chroma defines IP-space distance as 1.0 - inner_product, so a perfect-match distance of 0.0 falls into the distance <= 0 branch and yields a relevance score of 0.0 instead of 1.0.
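
The second point can be illustrated with a standalone sketch. The function base_max_inner_product_score below is a re-creation of the branching described above, written for this example and assumed rather than copied from the library; chroma_ip_distance encodes the 1.0 - inner_product convention stated in this issue. The inner product of 1.0 assumes identical, unit-normalized vectors.

```python
def base_max_inner_product_score(distance: float) -> float:
    """Assumed re-creation of the base-class behaviour described above.

    Positive distances map to 1.0 - distance; anything else is treated as a
    negated inner product and flipped back.
    """
    if distance > 0:
        return 1.0 - distance
    return -1.0 * distance


def chroma_ip_distance(inner_product: float) -> float:
    """Chroma's IP-space distance convention as stated in this issue."""
    return 1.0 - inner_product


# Identical unit vectors (assumption): inner product 1.0, so distance 0.0.
perfect_distance = chroma_ip_distance(1.0)

# The distance <= 0 branch is taken, so the perfect match scores zero.
score = base_max_inner_product_score(perfect_distance)

print(f"distance: {perfect_distance}, score from base-class logic: {score}")
```

Under these assumptions the best possible match receives the lowest score, which is the mismatch between the two distance conventions.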

Suggested fix

  1. Update similarity_search_by_vector_with_relevance_scores to apply the normalization function retrieved by _select_relevance_score_fn().
  2. Override _max_inner_product_relevance_score_fn in the Chroma class to return 1.0 - distance, matching Chroma's IP distance space convention.
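
The two steps above can be sketched without any LangChain imports. Both chroma_ip_relevance_score and apply_relevance_fn are hypothetical names introduced for illustration, and plain strings stand in for Document objects; the actual patch would live inside the Chroma class and may look different.

```python
from typing import Callable


def chroma_ip_relevance_score(distance: float) -> float:
    """Hypothetical override body: Chroma's IP distance is 1.0 - inner_product,
    so the relevance score is recovered as 1.0 - distance."""
    return 1.0 - distance


def apply_relevance_fn(
    docs_and_distances: list[tuple[str, float]],
    relevance_fn: Callable[[float], float],
) -> list[tuple[str, float]]:
    """Convert (doc, raw distance) pairs into (doc, relevance score) pairs."""
    return [(doc, relevance_fn(distance)) for doc, distance in docs_and_distances]


# Stand-in for the (doc, distance) pairs that are currently returned as-is.
raw_results = [("foo", 0.0), ("bar", 0.25)]

# Step 1: pass the raw distances through the selected relevance function.
normalized_results = apply_relevance_fn(raw_results, chroma_ip_relevance_score)

for doc, score in normalized_results:
    print(f"Content: '{doc}', Relevance Score: {score}")
```

Passing the relevance function in as a parameter mirrors the idea of selecting it per distance space, so the same conversion step would apply to L2, cosine, and IP collections.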

(Note: With this fix, downstream consumers that expect raw distances, where lower is more similar, will see inverted semantics, because the method will return normalized relevance scores, where higher is more similar.)

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Thu May 21 18:06:59 UTC 2026
Python Version: 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:09:02) [GCC 11.2.0]

Package Information

langchain_core: 1.4.8
langchain_chroma: 1.1.0
chromadb: 1.5.9
langsmith: 0.8.18
langchain_protocol: 0.0.17
langchain_tests: 1.1.9

Other Dependencies

httpx: 0.28.1
jsonpatch: 1.33
numpy: 2.3.3
orjson: 3.11.6
packaging: 25.0
pydantic: 2.12.5
pytest: 9.1.0
pytest-asyncio: 1.3.0
pytest-benchmark: 5.0.1
pytest-codspeed: 4.0.0
pytest-recording: 0.13.4
pytest-socket: 0.7.0
pyyaml: 6.0.3
requests: 2.33.0
requests-toolbelt: 1.0.0
rich: 14.1.0
syrupy: 5.1.0
tenacity: 9.1.2
typing-extensions: 4.15.0
uuid-utils: 0.12.0
vcrpy: 8.2.1
websockets: 15.0.1
wrapt: 1.17.3
xxhash: 3.6.0
zstandard: 0.25.0

Metadata

Assignees

No one assigned

Labels

bug, chroma, external
