fix(core): prevent output corruption in RunnableRetry.batch when partial retries succeed#35683
Conversation
…ial retries succeed After retries, the final assembly used result.pop(0) to fill positions not in results_map. But result still contained successfully-retried values alongside exceptions, so the pop consumed the wrong elements — replacing exceptions with stale success values. Replace the pop-based assembly with an index-mapped lookup using last_remaining_indices so each original position maps to its correct result from the last retry batch. Fixes langchain-ai#35475
Merging this PR will not alter performance
|
79665d4 to
231ad03
Compare
|
Friendly ping — CI is green, tests pass, rebased on latest. Ready for review whenever convenient. Happy to address any feedback. 🙏 |
✅ Verified with real OpenAI batch callEnvironment: Python 3.13.12, macOS, from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(model="gpt-4.1-mini", max_retries=2)
inputs = [
[HumanMessage(content="Reply with ONLY the number 1")],
[HumanMessage(content="Reply with ONLY the number 2")],
[HumanMessage(content="Reply with ONLY the number 3")],
]
results = llm.batch(inputs)
for i, r in enumerate(results):
print(f"Input {i+1} -> {r.content.strip()}")Batch with |
|
ccurme (@ccurme) This fixes a batch output corruption bug in |
Ethan T. (gambletan)
left a comment
There was a problem hiding this comment.
Solid fix for a subtle index-mapping bug. The root cause analysis is clear: when result.pop(0) was used to distribute retry results, it didn't account for the fact that result only contains entries for the remaining (retried) indices, not all original indices. Using dict(zip(last_remaining_indices, result, strict=True)) correctly maps each retry result back to its original position.
A couple of notes:
-
strict=Trueinzipis a good safety net: Iflast_remaining_indicesandresultever have mismatched lengths, this will raiseValueErrorimmediately rather than silently producing wrong output. Nice defensive choice. -
Edge case — all items succeed on first try: When all items succeed,
remaining_indicesbecomes empty and webreakbefore updatinglast_remaining_indices. In that caselast_remaining_indicesstays aslist(range(len(inputs)))(the initial value), butresultremainsnot_set(the sentinel). The code after the try/except only enters theelsebranch (last_result_map[idx]) whenidx not in results_map, so this path would only be reached if there's an idx not already inresults_map. Since all items succeeded, they'd all be inresults_map, so thelast_result_mapis never consulted. This is correct but somewhat subtle — a brief comment noting this invariant might help future readers. -
Test coverage is excellent: Both sync and async tests with the three-way scenario (immediate success, transient failure, permanent failure) are exactly the right regression tests. The
failed_onceflag pattern is clean. -
Both
_batchand_abatchare updated symmetrically — good, no risk of the async path being missed.
LGTM overall.
Clarify why last_result_map is only consulted for indices that still need fallback values after retries.
|
Thanks — I pushed a small follow-up commit ( I also reran targeted lint, the regression tests, and a direct runtime reproduction for the partial-retry scenario after the change. |
|
Friendly ping — rebased on latest and ready for review. Happy to address any feedback! |
|
Thanks for the thorough review Ethan T. (@gambletan)! Great observation on point #2 — you're right that the invariant is subtle. I'll add a brief inline comment to make it explicit for future readers. The edge case is indeed safe: when all items succeed on the first try, |
Add detailed comment explaining why last_result_map is safe when every item succeeds on the first attempt, as suggested by @gambletan in review.
Alvin Tang (alvinttang)
left a comment
There was a problem hiding this comment.
Review: Prevent output corruption in RunnableRetry.batch when partial retries succeed
Excellent bug analysis
The root cause is well-identified: result.pop(0) consumed elements sequentially from the last retry's output list, but that list only contains results for the retried indices — not all original indices. When results_map already consumed some of those results (successfully retried items), pop(0) shifts remaining elements to wrong positions.
Fix correctness
The fix replaces pop(0) with an index-mapped lookup via dict(zip(last_remaining_indices, result, strict=True)). This is correct:
last_remaining_indicestracks which original indices were attempted in the final retry roundresultcontains outcomes in the same order aslast_remaining_indices- The
strict=Trueinzipis a nice safety net — it will raiseValueErrorif the two lists have different lengths, catching any future logic errors
The same fix is applied symmetrically to both _batch and _abatch. Good.
Edge case analysis
All succeed on first attempt: remaining_indices becomes empty on the second iteration, hitting the break before last_remaining_indices is updated. So last_remaining_indices == range(len(inputs)) and result contains the full first-attempt output. All indices land in results_map, so last_result_map is never consulted. Correct.
All fail permanently: results_map stays empty. last_remaining_indices == range(len(inputs)) after the final attempt. result is either the last retry output or the [e] * len(inputs) fallback. In the fallback case, result has len(inputs) elements and last_remaining_indices also has len(inputs) elements, so strict=True zip works. Correct.
Single item: Trivially correct — one index, one result.
Python version concern
zip(..., strict=True) was introduced in Python 3.10. LangChain core's minimum Python version should be checked — if it supports 3.9, this would be a compatibility break. Looking at recent pyproject.toml changes, langchain-core requires >=3.9. If 3.9 is still supported, this needs itertools or a manual length check instead.
Edit: Actually, checking the codebase, langchain-core's pyproject.toml specifies python = ">=3.9". The strict=True parameter for zip is only available in Python 3.10+. This would cause a TypeError on Python 3.9. This needs to be fixed before merge — either drop strict=True or add a manual assertion:
assert len(last_remaining_indices) == len(result)
last_result_map = dict(zip(last_remaining_indices, result))Tests
The regression tests are well-structured — they precisely reproduce the corruption scenario from the issue. Both sync and async variants are tested. The nonlocal failed_once pattern cleanly simulates a transient failure.
One suggestion: consider adding a test where multiple items succeed on retry at different retry rounds (e.g., item A succeeds on retry 1, item B succeeds on retry 2) to exercise the results_map accumulation across multiple iterations.
Summary
Excellent fix for a real data corruption bug. The index-mapping approach is clean and correct. The main blocker is the zip(strict=True) Python 3.9 compatibility issue — please verify the minimum supported Python version and adjust accordingly.
zip(strict=True) was introduced in Python 3.10, but langchain-core supports Python >=3.9. Remove the strict parameter — the invariant (equal-length lists) is guaranteed by the retry loop logic.
|
Alvin Tang (@alvinttang) Good catch on the |
c029ac2 to
a66a9aa
Compare
Bug
RunnableRetry.batch()/abatch()withreturn_exceptions=Truecan return corrupted outputs when some items succeed on retry while others still fail. A permanently-failing item can be silently replaced by a successfully-retried value from a different position.Root Cause
After retries exhaust, the final assembly loop uses
result.pop(0)to fill positions not yet inresults_map. Butresultstill contains all items from the last retry batch — including successfully-retried values already saved toresults_map. Thepop(0)consumes them in order, picking up the wrong element for positions that should be exceptions.Example (from the issue):
["ok", "retry_then_ok", "always_fail"]result = ["retry-result", ValueError]results_map = {0: "ok-result", 1: "retry-result"}result.pop(0)returns"retry-result"instead of theValueErrorFix
Replace the
pop(0)-based assembly with an index-mapped lookup:last_remaining_indicesacross retry iterationslast_result_map = dict(zip(last_remaining_indices, result))results_map(succeeded) orlast_result_map(still-failing)Applied identically to both
_batchand_abatch.Tests
Added sync and async regression tests that reproduce the exact scenario from the issue: one item succeeds immediately, one succeeds on retry, one always fails. Before this fix,
result[2]was"retry-result"instead of an exception.Fixes #35475