fix: stop Refine loop early on satisfied structured answer; apply retriever_weights in RRF by rmotgi1227 · Pull Request #21715 · run-llama/llama_index

rmotgi1227 · 2026-05-19T17:59:27Z

Two retrieval-correctness fixes in core library code.

Fix 1 — `Refine` ignores `query_satisfied=True` (fixes #21397)

File: llama-index-core/llama_index/core/response_synthesizers/refine.py

When structured_answer_filtering=True and the LLM returns a satisfactory answer, _run_refine_loop / _arun_refine_loop continued iterating over every remaining chunk — making redundant LLM calls and potentially overwriting the already-satisfied answer with a worse one.

Add break after updating response when self._structured_answer_filtering is set, in both the sync and async loops:

if resp := self._update_response(program, prompt_kwargs, response_kwargs):
    response = resp
    if self._structured_answer_filtering:
        break   # ← added

Non-filtering mode is unchanged — the loop still processes all chunks as designed.

Fix 2 — `QueryFusionRetriever` ignores `retriever_weights` in `reciprocal_rerank` mode (fixes #21444)

File: llama-index-core/llama_index/core/retrievers/fusion_retriever.py

_reciprocal_rerank_fusion called results.values(), discarding the (query_str, retriever_idx) dict key, so self._retriever_weights was silently ignored. Every retriever contributed equally regardless of the weights passed by the user.

Switch to results.items(), unpack retriever_idx from the key, and multiply the RRF score by the corresponding weight — exactly mirroring the pattern already used in _relative_score_fusion:

# BEFORE
for nodes_with_scores in results.values():
    ...
    fused_scores[hash] += 1.0 / (rank + k)

# AFTER
for query_tuple, nodes_with_scores in results.items():
    retriever_idx = query_tuple[1]
    weight = self._retriever_weights[retriever_idx]
    ...
    fused_scores[hash] += weight / (rank + k)

The default weight of 1.0 / len(retrievers) means existing code with no explicit weights behaves identically (equal contribution from each retriever).

…s query; apply retriever_weights in RRF fusion Two retrieval-correctness bugs in the same PR: 1. refine.py (fixes run-llama#21397): when structured_answer_filtering=True and the LLM returns a satisfactory answer, the while-chunks loop kept running and making redundant LLM calls. Add a break in both sync and async paths so iteration stops as soon as the structured program signals satisfaction. 2. fusion_retriever.py (fixes run-llama#21444): _reciprocal_rerank_fusion iterated results.values(), discarding the retriever index stored in the dict key, so self._retriever_weights was always ignored. Switch to results.items(), unpack the retriever index from the key, and multiply the RRF score by the corresponding weight — mirroring _relative_score_fusion exactly.

logan-markewich · 2026-05-20T21:19:47Z

Looks like this breaks a test in core, please take a look

gautamvarmadatla · 2026-05-21T13:36:37Z

hello @logan-markewich,

I just wanted to mention that this seems to overlap with the two PRs I had opened earlier (#21445 and #21398). I noticed a couple of small differences that might be worth considering:

For the RRF change, this PR updates the scoring formula, while fix(core): retriever_weights not applied in QueryFusionRetriever reciprocal_rerank mode #21445 also avoids the score mutation side effect by returning a fresh NodeWithScore with the fused score.
For the refine change, fix(core): short-circuit Refine on query_satisfied when structured_answer_filtering is enabled #21398 also handles the streaming case before short-circuiting, which helps avoid returning a Generator / AsyncGenerator where response text may be expected downstream.

Please let me know what you think would be best here. I’m very happy to close mine, update them, or help consolidate in whatever way is most helpful.

rmotgi1227 · 2026-05-21T19:38:28Z

Thanks for the review @logan-markewich, and thanks @PanasonicConnect for flagging the overlap.

Looking at the failing test, the assertion count == len(nodes) reveals that the correct fix needs to handle the streaming/generator case and avoid mutating NodeWithScore scores in-place — both covered better in #21398 and #21445. Closing this in favour of those PRs.

dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label May 19, 2026

logan-markewich approved these changes May 20, 2026

View reviewed changes

dosubot Bot added the lgtm This PR has been approved by a maintainer label May 20, 2026

rmotgi1227 closed this May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: stop Refine loop early on satisfied structured answer; apply retriever_weights in RRF#21715

fix: stop Refine loop early on satisfied structured answer; apply retriever_weights in RRF#21715
rmotgi1227 wants to merge 1 commit into
run-llama:mainfrom
rmotgi1227:fix/refine-structured-answer-early-break

rmotgi1227 commented May 19, 2026

Uh oh!

logan-markewich commented May 20, 2026

Uh oh!

gautamvarmadatla commented May 21, 2026

Uh oh!

rmotgi1227 commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

rmotgi1227 commented May 19, 2026

Fix 1 — Refine ignores query_satisfied=True (fixes #21397)

Fix 2 — QueryFusionRetriever ignores retriever_weights in reciprocal_rerank mode (fixes #21444)

Uh oh!

logan-markewich commented May 20, 2026

Uh oh!

gautamvarmadatla commented May 21, 2026

Uh oh!

rmotgi1227 commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix 1 — `Refine` ignores `query_satisfied=True` (fixes #21397)

Fix 2 — `QueryFusionRetriever` ignores `retriever_weights` in `reciprocal_rerank` mode (fixes #21444)