Submission checklist
Package (Required)
Related Issues / PRs
No response
Reproduction Steps / Example Code (Python)
from langchain_core.embeddings import DeterministicFakeEmbedding
from langchain_chroma import Chroma

# FakeEmbeddings requires a size and returns random vectors on every call.
# DeterministicFakeEmbedding maps identical text to identical vectors,
# so the query for "foo" is an exact match for the stored "foo".
embeddings = DeterministicFakeEmbedding(size=10)
texts = ["foo", "bar", "baz"]

# Under L2 distance (the default space in Chroma), a perfect match has
# distance 0.0, which should normalize to a relevance score of 1.0
# (1.0 - distance / sqrt(2)).
docsearch = Chroma.from_texts(
    texts,
    embeddings,
    collection_name="test_collection",
)
embedded_query = embeddings.embed_query("foo")
results = docsearch.similarity_search_by_vector_with_relevance_scores(embedded_query, k=1)

print("Returned relevance scores:")
for doc, score in results:
    print(f"Content: '{doc.page_content}', Relevance Score: {score}")
Error Message and Stack Trace (if applicable)
Description
Problem
In the Chroma vector store, similarity_search_by_vector_with_relevance_scores (and any method that relies on it, such as similarity_search_by_image_with_relevance_score) returns raw distances rather than normalized relevance scores in the [0, 1] range. For instance, an exact match under the default L2 distance space returns 0.0 (the raw distance) instead of 1.0 (the normalized relevance score).
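As a worked illustration of the expected behavior, the sketch below applies the L2 normalization formula quoted in the reproduction comment (1.0 - distance / sqrt(2)). The function name euclidean_relevance is hypothetical and chosen for this example only; it is not the library's implementation.

```python
import math


def euclidean_relevance(distance: float) -> float:
    """Map an L2 distance to a relevance score using 1.0 - distance / sqrt(2).

    Hypothetical helper for illustration; the formula comes from the
    reproduction comment in this issue.
    """
    return 1.0 - distance / math.sqrt(2)


# Raw distance reported for an exact match.
exact_match_distance = 0.0

# Expected relevance score for that match after normalization.
exact_match_relevance = euclidean_relevance(exact_match_distance)
print(f"raw distance: {exact_match_distance}, relevance: {exact_match_relevance}")
```

The method currently returns the first number; the second is what a caller asking for relevance scores would expect.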
Root cause
- In libs/partners/chroma/langchain_chroma/vectorstores.py, the similarity_search_by_vector_with_relevance_scores method directly returns the output of _results_to_docs_and_scores(results). This bypasses the relevance normalization function returned by self._select_relevance_score_fn().
- Additionally, the base class's _max_inner_product_relevance_score_fn assumes that distance is defined as -inner_product. Chroma defines IP space distance as 1.0 - inner_product, so a perfect-match distance of 0.0 falls into the distance <= 0 branch and incorrectly yields a relevance score of 0.0 instead of 1.0.
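The inner-product mismatch can be seen in isolation with the sketch below. The first function is a paraphrase of the branch logic this issue attributes to the base class, not a copy of the library source, and both function names are hypothetical.

```python
def base_style_ip_relevance(distance: float) -> float:
    """Paraphrase of the base-class logic, which assumes distance = -inner_product."""
    if distance > 0:
        return 1.0 - distance
    # distance <= 0 branch: treats -distance as the inner product.
    return -1.0 * distance


def chroma_style_ip_relevance(distance: float) -> float:
    """Conversion matching Chroma's convention, distance = 1.0 - inner_product."""
    return 1.0 - distance


# A perfect match under Chroma's IP convention has distance 0.0.
perfect_match_distance = 0.0

base_score = base_style_ip_relevance(perfect_match_distance)
chroma_score = chroma_style_ip_relevance(perfect_match_distance)
print(f"base-style score: {base_score}, Chroma-style score: {chroma_score}")
```

Under these assumptions the base-style function gives a score equal to zero for a perfect match, while the Chroma-style conversion gives 1.0.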
Suggested fix
- Update similarity_search_by_vector_with_relevance_scores to apply the normalization function retrieved by _select_relevance_score_fn().
- Override _max_inner_product_relevance_score_fn in the Chroma class to return 1.0 - distance, matching Chroma's IP distance space convention.
(Note: This is a behavior change. Downstream consumers that expect raw distances, where lower means more similar, will see inverted semantics once the method returns normalized relevance scores, where higher means more similar.)
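A minimal sketch of the first part of the fix, assuming the search results are already available as (document, distance) pairs. The helper apply_relevance_fn is hypothetical, plain strings stand in for Document objects, and the lambda stands in for whatever function _select_relevance_score_fn() returns.

```python
from typing import Callable


def apply_relevance_fn(
    docs_and_distances: list[tuple[str, float]],
    relevance_fn: Callable[[float], float],
) -> list[tuple[str, float]]:
    """Convert each raw distance into a relevance score, keeping result order."""
    return [(doc, relevance_fn(distance)) for doc, distance in docs_and_distances]


# Stand-in results: (document, raw distance), lower distance is more similar.
raw_results = [("foo", 0.0), ("bar", 0.5)]

# Stand-in normalization function, using the 1.0 - distance conversion.
scored_results = apply_relevance_fn(raw_results, lambda distance: 1.0 - distance)

for doc, score in scored_results:
    print(f"Content: '{doc}', Relevance Score: {score}")
```

The result order is unchanged; only the meaning of the second tuple element flips from "lower is better" to "higher is better", which is the behavior change flagged in the note.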
System Info
System Information
OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Thu May 21 18:06:59 UTC 2026
Python Version: 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:09:02) [GCC 11.2.0]
Package Information
langchain_core: 1.4.8
langchain_chroma: 1.1.0
chromadb: 1.5.9
langsmith: 0.8.18
langchain_protocol: 0.0.17
langchain_tests: 1.1.9
Other Dependencies
httpx: 0.28.1
jsonpatch: 1.33
numpy: 2.3.3
orjson: 3.11.6
packaging: 25.0
pydantic: 2.12.5
pytest: 9.1.0
pytest-asyncio: 1.3.0
pytest-benchmark: 5.0.1
pytest-codspeed: 4.0.0
pytest-recording: 0.13.4
pytest-socket: 0.7.0
pyyaml: 6.0.3
requests: 2.33.0
requests-toolbelt: 1.0.0
rich: 14.1.0
syrupy: 5.1.0
tenacity: 9.1.2
typing-extensions: 4.15.0
uuid-utils: 0.12.0
vcrpy: 8.2.1
websockets: 15.0.1
wrapt: 1.17.3
xxhash: 3.6.0
zstandard: 0.25.0