`prepare_response()` returns incorrect top result from `similarity_search_by_vector`
Bug Description
When running `similarity_search_by_vector` with the relevant data, the response contains all rows with their associated similarity scores.
After reviewing the returned objects:
- In this case, the highest-scoring match is Image 3, with a score of 0.439.
- Sorting manually with:
  ```python
  response.similar_objects.sort('score', False).head(1)
  ```
  correctly shows Image 3 at the top.
However, when calling `prepare_response()`, the method returns Image 4, which has the second-highest score (0.3547).
This suggests that the LLM may not be selecting the top-scoring object.
Expected Behavior
`prepare_response()` should return the object with the highest similarity score, which in this case is Image 3.
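The expected selection rule is a deterministic argmax over the returned scores, the same result as `sort('score', False).head(1)`. The sketch below is a minimal plain-Python illustration, not the teradataml API: each result is modeled as a hypothetical `(label, score)` tuple, only the two scores reported in this issue are included, and a higher score is assumed to mean a closer match.

```python
from typing import Sequence, Tuple

# Hypothetical stand-in for rows of `response.similar_objects`:
# (object label, similarity score). Only the two scores reported in
# this issue are included; the other returned rows are omitted.
Result = Tuple[str, float]


def expected_top(results: Sequence[Result]) -> Result:
    """Return the result with the highest similarity score.

    Same intent as sorting by score descending and taking the first row.
    """
    if not results:
        raise ValueError("no results to rank")
    return max(results, key=lambda r: r[1])


results = [("Image 4", 0.3547), ("Image 3", 0.439)]
top = expected_top(results)
print(top)  # ('Image 3', 0.439)

observed = "Image 4"  # what prepare_response() returned in the demo
print(observed == top[0])  # False: the discrepancy reported here
```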
Steps to Reproduce
The relevant demo for reproduction is at:
https://github.com/Teradata/jupyter-demos/tree/multimodal_evs_unstructured_embeddings/VantageCloud_Lake/UseCases/Multimodal_Agentic_Semantic_Search
The issue occurs in this demo, but we believe the behavior is general.
- Run `similarity_search_by_vector` using the first query image row.
- Inspect the returned `response.similar_objects` and note that Image 3 has the highest score.
- Sort manually:
  ```python
  response.similar_objects.sort('score', False)
  ```
  and confirm that Image 3 appears first.
- Call:
  ```python
  response.prepare_response()
  ```
  and observe that Image 4 is returned instead of Image 3.
Questions / Hypotheses
- Is `prepare_response()` using the similarity score ranking correctly?
- Is the LLM receiving the entire ranked list, or could results be truncated before selection?
- Is the model basing its selection on semantic content instead of the numeric score?
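The truncation hypothesis can be illustrated with a short sketch. It assumes, hypothetically, that some step keeps only the first `k` rows in their returned order before the LLM chooses among them; in that case the top-scoring row can be dropped whenever the rows are not already sorted by score. The function names, `k`, and the row order below are illustrative assumptions and say nothing about how `prepare_response()` is actually implemented.

```python
from typing import List, Tuple

# (object label, similarity score); scores are the two reported in this issue.
Result = Tuple[str, float]


def truncate_in_returned_order(results: List[Result], k: int) -> List[Result]:
    # Keeps the first k rows as returned, regardless of score.
    return results[:k]


def truncate_by_score(results: List[Result], k: int) -> List[Result]:
    # Sorts by score (descending) first, so the best match always survives.
    return sorted(results, key=lambda r: r[1], reverse=True)[:k]


# Hypothetical unsorted order in which Image 4 happens to come first.
results = [("Image 4", 0.3547), ("Image 3", 0.439)]

print(truncate_in_returned_order(results, k=1))  # [('Image 4', 0.3547)]
print(truncate_by_score(results, k=1))           # [('Image 3', 0.439)]
```

If the candidates are ranked by score before any truncation, the best match cannot be lost at that step, which would help separate the truncation hypothesis from the model ignoring the numeric score.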
Additional Context
- Image 4’s similarity score of 0.3547 is lower than Image 3’s score of 0.439.
- Manual sorting confirms the vector search results are correct.
- The discrepancy appears isolated to the `prepare_response()` step.
- The behavior has so far been observed in the demo linked above.