fix(core): prevent output corruption in RunnableRetry.batch when partial retries succeed by giulio-leone · Pull Request #35683 · langchain-ai/langchain

Giulio Leone (giulio-leone) · 2026-03-09T10:33:23Z

Bug

RunnableRetry.batch() / abatch() with return_exceptions=True can return corrupted outputs when some items succeed on retry while others still fail. A permanently-failing item can be silently replaced by a successfully-retried value from a different position.

Root Cause

After retries exhaust, the final assembly loop uses result.pop(0) to fill positions not yet in results_map. But result still contains all items from the last retry batch — including successfully-retried values already saved to results_map. The pop(0) consumes them in order, picking up the wrong element for positions that should be exceptions.

Example (from the issue):

Inputs: ["ok", "retry_then_ok", "always_fail"]
After attempt 2, result = ["retry-result", ValueError]
results_map = {0: "ok-result", 1: "retry-result"}
For index 2 (not in map): result.pop(0) returns "retry-result" instead of the ValueError

Fix

Replace the pop(0)-based assembly with an index-mapped lookup:

Track last_remaining_indices across retry iterations
After the loop, build last_result_map = dict(zip(last_remaining_indices, result))
Look up each original index in either results_map (succeeded) or last_result_map (still-failing)

Applied identically to both _batch and _abatch.

Tests

Added sync and async regression tests that reproduce the exact scenario from the issue: one item succeeds immediately, one succeeds on retry, one always fails. Before this fix, result[2] was "retry-result" instead of an exception.

Fixes #35475

…ial retries succeed After retries, the final assembly used result.pop(0) to fill positions not in results_map. But result still contained successfully-retried values alongside exceptions, so the pop consumed the wrong elements — replacing exceptions with stale success values. Replace the pop-based assembly with an index-mapped lookup using last_remaining_indices so each original position maps to its correct result from the last retry batch. Fixes langchain-ai#35475

codspeed-hq · 2026-03-09T10:37:04Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 13 untouched benchmarks
⏩ 23 skipped benchmarks¹

_{Comparing giulio-leone:fix/runnable-retry-batch-corruption-35475-v2 (a66a9aa) with master (6f27c2b)}

23 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

Giulio Leone (giulio-leone) · 2026-03-09T16:29:05Z

Friendly ping — CI is green, tests pass, rebased on latest. Ready for review whenever convenient. Happy to address any feedback. 🙏

Giulio Leone (giulio-leone) · 2026-03-09T17:24:15Z

✅ Verified with real OpenAI batch call

Environment: Python 3.13.12, macOS, langchain-core + langchain-openai from this branch

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4.1-mini", max_retries=2)
inputs = [
    [HumanMessage(content="Reply with ONLY the number 1")],
    [HumanMessage(content="Reply with ONLY the number 2")],
    [HumanMessage(content="Reply with ONLY the number 3")],
]
results = llm.batch(inputs)
for i, r in enumerate(results):
    print(f"Input {i+1} -> {r.content.strip()}")

Input 1 -> 1  ✅
Input 2 -> 2  ✅
Input 3 -> 3  ✅

Batch with max_retries=2 correctly preserves input↔output order.

Giulio Leone (giulio-leone) · 2026-03-09T17:27:35Z

ccurme (@ccurme) This fixes a batch output corruption bug in RunnableRetry where failed-then-retried items could shift output positions. Real API test shows correct 1:1 input→output ordering with max_retries=2. Tests included for both sync and async paths.

Ethan T. (gambletan)

Solid fix for a subtle index-mapping bug. The root cause analysis is clear: when result.pop(0) was used to distribute retry results, it didn't account for the fact that result only contains entries for the remaining (retried) indices, not all original indices. Using dict(zip(last_remaining_indices, result, strict=True)) correctly maps each retry result back to its original position.

A couple of notes:

strict=True in zip is a good safety net: If last_remaining_indices and result ever have mismatched lengths, this will raise ValueError immediately rather than silently producing wrong output. Nice defensive choice.
Edge case — all items succeed on first try: When all items succeed, remaining_indices becomes empty and we break before updating last_remaining_indices. In that case last_remaining_indices stays as list(range(len(inputs))) (the initial value), but result remains not_set (the sentinel). The code after the try/except only enters the else branch (last_result_map[idx]) when idx not in results_map, so this path would only be reached if there's an idx not already in results_map. Since all items succeeded, they'd all be in results_map, so the last_result_map is never consulted. This is correct but somewhat subtle — a brief comment noting this invariant might help future readers.
Test coverage is excellent: Both sync and async tests with the three-way scenario (immediate success, transient failure, permanent failure) are exactly the right regression tests. The failed_once flag pattern is clean.
Both _batch and _abatch are updated symmetrically — good, no risk of the async path being missed.

LGTM overall.

Clarify why last_result_map is only consulted for indices that still need fallback values after retries.

Giulio Leone (giulio-leone) · 2026-03-11T15:50:33Z

Thanks — I pushed a small follow-up commit (37f30ad) to clarify that invariant in both the sync and async paths.

I also reran targeted lint, the regression tests, and a direct runtime reproduction for the partial-retry scenario after the change.

Giulio Leone (giulio-leone) · 2026-03-12T22:08:21Z

Friendly ping — rebased on latest and ready for review. Happy to address any feedback!

Giulio Leone (giulio-leone) · 2026-03-13T01:23:35Z

Thanks for the thorough review Ethan T. (@gambletan)!

Great observation on point #2 — you're right that the invariant is subtle. I'll add a brief inline comment to make it explicit for future readers.

The edge case is indeed safe: when all items succeed on the first try, remaining_indices becomes empty → break → last_result_map is never consulted since every idx is already in results_map. But documenting this explicitly is worthwhile.

@gambletan

Add detailed comment explaining why last_result_map is safe when every item succeeds on the first attempt, as suggested by @gambletan in review.

Alvin Tang (alvinttang)

Review: Prevent output corruption in RunnableRetry.batch when partial retries succeed

Excellent bug analysis

The root cause is well-identified: result.pop(0) consumed elements sequentially from the last retry's output list, but that list only contains results for the retried indices — not all original indices. When results_map already consumed some of those results (successfully retried items), pop(0) shifts remaining elements to wrong positions.

Fix correctness

The fix replaces pop(0) with an index-mapped lookup via dict(zip(last_remaining_indices, result, strict=True)). This is correct:

last_remaining_indices tracks which original indices were attempted in the final retry round
result contains outcomes in the same order as last_remaining_indices
The strict=True in zip is a nice safety net — it will raise ValueError if the two lists have different lengths, catching any future logic errors

The same fix is applied symmetrically to both _batch and _abatch. Good.

Edge case analysis

All succeed on first attempt: remaining_indices becomes empty on the second iteration, hitting the break before last_remaining_indices is updated. So last_remaining_indices == range(len(inputs)) and result contains the full first-attempt output. All indices land in results_map, so last_result_map is never consulted. Correct.

All fail permanently: results_map stays empty. last_remaining_indices == range(len(inputs)) after the final attempt. result is either the last retry output or the [e] * len(inputs) fallback. In the fallback case, result has len(inputs) elements and last_remaining_indices also has len(inputs) elements, so strict=True zip works. Correct.

Single item: Trivially correct — one index, one result.

Python version concern

zip(..., strict=True) was introduced in Python 3.10. LangChain core's minimum Python version should be checked — if it supports 3.9, this would be a compatibility break. Looking at recent pyproject.toml changes, langchain-core requires >=3.9. If 3.9 is still supported, this needs itertools or a manual length check instead.

Edit: Actually, checking the codebase, langchain-core's pyproject.toml specifies python = ">=3.9". The strict=True parameter for zip is only available in Python 3.10+. This would cause a TypeError on Python 3.9. This needs to be fixed before merge — either drop strict=True or add a manual assertion:

assert len(last_remaining_indices) == len(result)
last_result_map = dict(zip(last_remaining_indices, result))

Tests

The regression tests are well-structured — they precisely reproduce the corruption scenario from the issue. Both sync and async variants are tested. The nonlocal failed_once pattern cleanly simulates a transient failure.

One suggestion: consider adding a test where multiple items succeed on retry at different retry rounds (e.g., item A succeeds on retry 1, item B succeeds on retry 2) to exercise the results_map accumulation across multiple iterations.

Summary

Excellent fix for a real data corruption bug. The index-mapping approach is clean and correct. The main blocker is the zip(strict=True) Python 3.9 compatibility issue — please verify the minimum supported Python version and adjust accordingly.

zip(strict=True) was introduced in Python 3.10, but langchain-core supports Python >=3.9. Remove the strict parameter — the invariant (equal-length lists) is guaranteed by the retry loop logic.

Giulio Leone (giulio-leone) · 2026-03-14T19:10:55Z

Alvin Tang (@alvinttang) Good catch on the zip(strict=True) Python 3.9 compatibility issue — fixed in c029ac2. Dropped strict=True since the invariant (equal-length lists) is guaranteed by the retry loop's own logic (last_remaining_indices is always set from remaining_indices which drives the retry batch() call producing result).

Giulio Leone (giulio-leone) requested a review from Eugene Yurtsev (eyurtsev) as a code owner March 9, 2026 10:33

github-actions bot added external core `langchain-core` package issues & PRs fix For PRs that implement a fix and removed external labels Mar 9, 2026

org-membership-reviewer bot added external size: S 50-199 LOC labels Mar 9, 2026

fix: add strict=True to zip() calls to satisfy ruff B905

231ad03

Giulio Leone (giulio-leone) force-pushed the fix/runnable-retry-batch-corruption-35475-v2 branch from 79665d4 to 231ad03 Compare March 9, 2026 15:21

Ethan T. (gambletan) reviewed Mar 10, 2026

View reviewed changes

docs(core): clarify retry result mapping invariant

2638430

Clarify why last_result_map is only consulted for indices that still need fallback values after retries.

docs: clarify all-succeed-first-try edge case invariant

5354bcc

Add detailed comment explaining why last_result_map is safe when every item succeeds on the first attempt, as suggested by @gambletan in review.

Alvin Tang (alvinttang) reviewed Mar 14, 2026

View reviewed changes

fix: drop zip(strict=True) for Python 3.9 compatibility

a66a9aa

zip(strict=True) was introduced in Python 3.10, but langchain-core supports Python >=3.9. Remove the strict parameter — the invariant (equal-length lists) is guaranteed by the retry loop logic.

Giulio Leone (giulio-leone) force-pushed the fix/runnable-retry-batch-corruption-35475-v2 branch from c029ac2 to a66a9aa Compare March 15, 2026 16:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): prevent output corruption in RunnableRetry.batch when partial retries succeed#35683

fix(core): prevent output corruption in RunnableRetry.batch when partial retries succeed#35683
Giulio Leone (giulio-leone) wants to merge 5 commits intolangchain-ai:masterfrom
giulio-leone:fix/runnable-retry-batch-corruption-35475-v2

Giulio Leone (giulio-leone) commented Mar 9, 2026

Uh oh!

codspeed-hq bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

Giulio Leone (giulio-leone) commented Mar 9, 2026

Uh oh!

Giulio Leone (giulio-leone) commented Mar 9, 2026

Uh oh!

Giulio Leone (giulio-leone) commented Mar 9, 2026

Uh oh!

Ethan T. (gambletan) left a comment

Uh oh!

Giulio Leone (giulio-leone) commented Mar 11, 2026

Uh oh!

Giulio Leone (giulio-leone) commented Mar 12, 2026

Uh oh!

Giulio Leone (giulio-leone) commented Mar 13, 2026

Uh oh!

Alvin Tang (alvinttang) left a comment

Uh oh!

Giulio Leone (giulio-leone) commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Giulio Leone (giulio-leone) commented Mar 9, 2026

Bug

Root Cause

Fix

Tests

Uh oh!

codspeed-hq bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

Giulio Leone (giulio-leone) commented Mar 9, 2026

Uh oh!

Giulio Leone (giulio-leone) commented Mar 9, 2026

✅ Verified with real OpenAI batch call

Uh oh!

Giulio Leone (giulio-leone) commented Mar 9, 2026

Uh oh!

Ethan T. (gambletan) left a comment

Choose a reason for hiding this comment

Uh oh!

Giulio Leone (giulio-leone) commented Mar 11, 2026

Uh oh!

Giulio Leone (giulio-leone) commented Mar 12, 2026

Uh oh!

Giulio Leone (giulio-leone) commented Mar 13, 2026

Uh oh!

Alvin Tang (alvinttang) left a comment

Choose a reason for hiding this comment

Review: Prevent output corruption in RunnableRetry.batch when partial retries succeed

Excellent bug analysis

Fix correctness

Edge case analysis

Python version concern

Tests

Summary

Uh oh!

Giulio Leone (giulio-leone) commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codspeed-hq bot commented Mar 9, 2026 •

edited

Loading