fix: stop Refine loop early on satisfied structured answer; apply retriever_weights in RRF #21715
…s query; apply retriever_weights in RRF fusion

Two retrieval-correctness bugs in the same PR:

1. refine.py (fixes run-llama#21397): when structured_answer_filtering=True and the LLM returns a satisfactory answer, the while-chunks loop kept running and making redundant LLM calls. Add a break in both sync and async paths so iteration stops as soon as the structured program signals satisfaction.

2. fusion_retriever.py (fixes run-llama#21444): _reciprocal_rerank_fusion iterated results.values(), discarding the retriever index stored in the dict key, so self._retriever_weights was always ignored. Switch to results.items(), unpack the retriever index from the key, and multiply the RRF score by the corresponding weight, mirroring _relative_score_fusion exactly.
Looks like this breaks a test in core, please take a look.
Hello @logan-markewich, I just wanted to mention that this seems to overlap with the two PRs I had opened earlier (#21445 and #21398). I noticed a couple of small differences that might be worth considering:
Please let me know what you think would be best here. I’m very happy to close mine, update them, or help consolidate in whatever way is most helpful.
Thanks for the review @logan-markewich, and thanks @PanasonicConnect for flagging the overlap. Looking at the failing test, the assertion
Two retrieval-correctness fixes in core library code.
Fix 1: `Refine` ignores `query_satisfied=True` (fixes #21397)

File: `llama-index-core/llama_index/core/response_synthesizers/refine.py`

When `structured_answer_filtering=True` and the LLM returns a satisfactory answer, `_run_refine_loop` / `_arun_refine_loop` continued iterating over every remaining chunk, making redundant LLM calls and potentially overwriting the already-satisfied answer with a worse one.

Add `break` after updating `response` when `self._structured_answer_filtering` is set, in both the sync and async loops.

Non-filtering mode is unchanged: the loop still processes all chunks as designed.
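The early exit described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the actual body of `_run_refine_loop`: `StructuredAnswer`, `refine_loop`, and `fake_program` are hypothetical stand-ins, and the real loop's control flow may differ in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StructuredAnswer:
    """Hypothetical stand-in for the structured program's output."""

    answer: str
    query_satisfied: bool


def refine_loop(
    chunks: List[str],
    program: Callable[[str, Optional[str]], StructuredAnswer],
    structured_answer_filtering: bool,
) -> Optional[str]:
    """Walk the chunks in order; stop early once satisfied if filtering is on."""
    response: Optional[str] = None
    for chunk in chunks:
        result = program(chunk, response)  # one (simulated) LLM call per chunk
        if structured_answer_filtering:
            if result.query_satisfied:
                response = result.answer
                break  # the fix: do not spend calls on the remaining chunks
        else:
            # Non-filtering mode: every chunk refines the response.
            response = result.answer
    return response


# Record which chunks triggered a (simulated) LLM call.
calls: List[str] = []


def fake_program(chunk: str, existing: Optional[str]) -> StructuredAnswer:
    """Hypothetical program: every chunk except "a" satisfies the query."""
    calls.append(chunk)
    return StructuredAnswer(
        answer=f"answer from {chunk}",
        query_satisfied=(chunk != "a"),
    )


result = refine_loop(["a", "b", "c"], fake_program, structured_answer_filtering=True)
print(result, calls)
```

In this sketch, chunk `"c"` is never sent to the program when filtering is enabled, because chunk `"b"` already produced a satisfied answer. With filtering disabled, all three chunks are processed.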
Fix 2: `QueryFusionRetriever` ignores `retriever_weights` in `reciprocal_rerank` mode (fixes #21444)

File: `llama-index-core/llama_index/core/retrievers/fusion_retriever.py`

`_reciprocal_rerank_fusion` called `results.values()`, discarding the `(query_str, retriever_idx)` dict key, so `self._retriever_weights` was silently ignored. Every retriever contributed equally regardless of the weights passed by the user.

Switch to `results.items()`, unpack `retriever_idx` from the key, and multiply the RRF score by the corresponding weight, exactly mirroring the pattern already used in `_relative_score_fusion`.

The default weight of `1.0 / len(retrievers)` means existing code with no explicit weights ranks results identically (equal contribution from each retriever), since every fused score is scaled by the same constant.
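A minimal sketch of weighted reciprocal rank fusion, under stated assumptions: results are keyed by `(query_str, retriever_idx)` as described above, each hit is a plain `(doc_id, score)` tuple rather than a node object, and the smoothing constant `k = 60.0` is the conventional RRF choice rather than something confirmed by this PR. The function name `weighted_rrf` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (query_str, retriever_idx) -> list of (doc_id, retriever_score)
Results = Dict[Tuple[str, int], List[Tuple[str, float]]]


def weighted_rrf(
    results: Results,
    retriever_weights: List[float],
    k: float = 60.0,  # assumed smoothing constant
) -> List[Tuple[str, float]]:
    """Fuse ranked lists, scaling each retriever's RRF term by its weight."""
    fused: Dict[str, float] = defaultdict(float)
    # Iterate items(), not values(), so the retriever index in the key survives.
    for (_query, retriever_idx), hits in results.items():
        weight = retriever_weights[retriever_idx]
        ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
        for rank, (doc_id, _score) in enumerate(ranked):
            fused[doc_id] += weight * (1.0 / (rank + k))
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


results: Results = {
    ("q", 0): [("doc_a", 0.9), ("doc_b", 0.5)],  # retriever 0 prefers doc_a
    ("q", 1): [("doc_b", 0.8), ("doc_a", 0.1)],  # retriever 1 prefers doc_b
}

# Favoring retriever 1 puts doc_b on top; favoring retriever 0 puts doc_a on top.
print(weighted_rrf(results, [0.2, 0.8])[0][0])
print(weighted_rrf(results, [0.8, 0.2])[0][0])
```

With equal weights the two documents in this example receive the same fused score, which is the behavior the unweighted code produced for every weight setting.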