Skip to content

fix: stop Refine loop early on satisfied structured answer; apply retriever_weights in RRF#21715

Closed
rmotgi1227 wants to merge 1 commit into
run-llama:mainfrom
rmotgi1227:fix/refine-structured-answer-early-break
Closed

fix: stop Refine loop early on satisfied structured answer; apply retriever_weights in RRF#21715
rmotgi1227 wants to merge 1 commit into
run-llama:mainfrom
rmotgi1227:fix/refine-structured-answer-early-break

Conversation

@rmotgi1227

Copy link
Copy Markdown

Two retrieval-correctness fixes in core library code.


Fix 1 — Refine ignores query_satisfied=True (fixes #21397)

File: llama-index-core/llama_index/core/response_synthesizers/refine.py

When structured_answer_filtering=True and the LLM returns a satisfactory answer, _run_refine_loop / _arun_refine_loop continued iterating over every remaining chunk — making redundant LLM calls and potentially overwriting the already-satisfied answer with a worse one.

Add break after updating response when self._structured_answer_filtering is set, in both the sync and async loops:

if resp := self._update_response(program, prompt_kwargs, response_kwargs):
    response = resp
    if self._structured_answer_filtering:
        break   # ← added

Non-filtering mode is unchanged — the loop still processes all chunks as designed.


Fix 2 — QueryFusionRetriever ignores retriever_weights in reciprocal_rerank mode (fixes #21444)

File: llama-index-core/llama_index/core/retrievers/fusion_retriever.py

_reciprocal_rerank_fusion called results.values(), discarding the (query_str, retriever_idx) dict key, so self._retriever_weights was silently ignored. Every retriever contributed equally regardless of the weights passed by the user.

Switch to results.items(), unpack retriever_idx from the key, and multiply the RRF score by the corresponding weight — exactly mirroring the pattern already used in _relative_score_fusion:

# BEFORE
for nodes_with_scores in results.values():
    ...
    fused_scores[hash] += 1.0 / (rank + k)

# AFTER
for query_tuple, nodes_with_scores in results.items():
    retriever_idx = query_tuple[1]
    weight = self._retriever_weights[retriever_idx]
    ...
    fused_scores[hash] += weight / (rank + k)

The default weight of 1.0 / len(retrievers) means existing code with no explicit weights behaves identically (equal contribution from each retriever).

…s query; apply retriever_weights in RRF fusion

Two retrieval-correctness bugs in the same PR:

1. refine.py (fixes run-llama#21397): when structured_answer_filtering=True and the
   LLM returns a satisfactory answer, the while-chunks loop kept running and
   making redundant LLM calls. Add a break in both sync and async paths so
   iteration stops as soon as the structured program signals satisfaction.

2. fusion_retriever.py (fixes run-llama#21444): _reciprocal_rerank_fusion iterated
   results.values(), discarding the retriever index stored in the dict key,
   so self._retriever_weights was always ignored. Switch to results.items(),
   unpack the retriever index from the key, and multiply the RRF score by
   the corresponding weight — mirroring _relative_score_fusion exactly.
@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label May 19, 2026
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 20, 2026
@logan-markewich

Copy link
Copy Markdown
Collaborator

Looks like this breaks a test in core, please take a look

@gautamvarmadatla

Copy link
Copy Markdown
Contributor

hello @logan-markewich,

I just wanted to mention that this seems to overlap with the two PRs I had opened earlier (#21445 and #21398). I noticed a couple of small differences that might be worth considering:

  1. For the RRF change, this PR updates the scoring formula, while fix(core): retriever_weights not applied in QueryFusionRetriever reciprocal_rerank mode #21445 also avoids the score mutation side effect by returning a fresh NodeWithScore with the fused score.
  2. For the refine change, fix(core): short-circuit Refine on query_satisfied when structured_answer_filtering is enabled #21398 also handles the streaming case before short-circuiting, which helps avoid returning a Generator / AsyncGenerator where response text may be expected downstream.

Please let me know what you think would be best here. I’m very happy to close mine, update them, or help consolidate in whatever way is most helpful.

@rmotgi1227

Copy link
Copy Markdown
Author

Thanks for the review @logan-markewich, and thanks @PanasonicConnect for flagging the overlap.

Looking at the failing test, the assertion count == len(nodes) reveals that the correct fix needs to handle the streaming/generator case and avoid mutating NodeWithScore scores in-place — both covered better in #21398 and #21445. Closing this in favour of those PRs.

@rmotgi1227 rmotgi1227 closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

3 participants