Skip to content

fix(core): prevent output corruption in RunnableRetry.batch when partial retries succeed#35683

Open
Giulio Leone (giulio-leone) wants to merge 5 commits intolangchain-ai:masterfrom
giulio-leone:fix/runnable-retry-batch-corruption-35475-v2
Open

fix(core): prevent output corruption in RunnableRetry.batch when partial retries succeed#35683
Giulio Leone (giulio-leone) wants to merge 5 commits intolangchain-ai:masterfrom
giulio-leone:fix/runnable-retry-batch-corruption-35475-v2

Conversation

@giulio-leone
Copy link
Contributor

Bug

RunnableRetry.batch() / abatch() with return_exceptions=True can return corrupted outputs when some items succeed on retry while others still fail. A permanently-failing item can be silently replaced by a successfully-retried value from a different position.

Root Cause

After retries exhaust, the final assembly loop uses result.pop(0) to fill positions not yet in results_map. But result still contains all items from the last retry batch — including successfully-retried values already saved to results_map. The pop(0) consumes them in order, picking up the wrong element for positions that should be exceptions.

Example (from the issue):

  • Inputs: ["ok", "retry_then_ok", "always_fail"]
  • After attempt 2, result = ["retry-result", ValueError]
  • results_map = {0: "ok-result", 1: "retry-result"}
  • For index 2 (not in map): result.pop(0) returns "retry-result" instead of the ValueError

Fix

Replace the pop(0)-based assembly with an index-mapped lookup:

  • Track last_remaining_indices across retry iterations
  • After the loop, build last_result_map = dict(zip(last_remaining_indices, result))
  • Look up each original index in either results_map (succeeded) or last_result_map (still-failing)

Applied identically to both _batch and _abatch.

Tests

Added sync and async regression tests that reproduce the exact scenario from the issue: one item succeeds immediately, one succeeds on retry, one always fails. Before this fix, result[2] was "retry-result" instead of an exception.

Fixes #35475

…ial retries succeed

After retries, the final assembly used result.pop(0) to fill positions not
in results_map. But result still contained successfully-retried values
alongside exceptions, so the pop consumed the wrong elements — replacing
exceptions with stale success values.

Replace the pop-based assembly with an index-mapped lookup using
last_remaining_indices so each original position maps to its correct
result from the last retry batch.

Fixes langchain-ai#35475
@github-actions github-actions bot added external core `langchain-core` package issues & PRs fix For PRs that implement a fix and removed external labels Mar 9, 2026
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 9, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 13 untouched benchmarks
⏩ 23 skipped benchmarks1


Comparing giulio-leone:fix/runnable-retry-batch-corruption-35475-v2 (a66a9aa) with master (6f27c2b)

Open in CodSpeed

Footnotes

  1. 23 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@giulio-leone Giulio Leone (giulio-leone) force-pushed the fix/runnable-retry-batch-corruption-35475-v2 branch from 79665d4 to 231ad03 Compare March 9, 2026 15:21
@giulio-leone
Copy link
Contributor Author

Friendly ping — CI is green, tests pass, rebased on latest. Ready for review whenever convenient. Happy to address any feedback. 🙏

@giulio-leone
Copy link
Contributor Author

✅ Verified with real OpenAI batch call

Environment: Python 3.13.12, macOS, langchain-core + langchain-openai from this branch

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4.1-mini", max_retries=2)
inputs = [
    [HumanMessage(content="Reply with ONLY the number 1")],
    [HumanMessage(content="Reply with ONLY the number 2")],
    [HumanMessage(content="Reply with ONLY the number 3")],
]
results = llm.batch(inputs)
for i, r in enumerate(results):
    print(f"Input {i+1} -> {r.content.strip()}")
Input 1 -> 1  ✅
Input 2 -> 2  ✅
Input 3 -> 3  ✅

Batch with max_retries=2 correctly preserves input↔output order.

@giulio-leone
Copy link
Contributor Author

ccurme (@ccurme) This fixes a batch output corruption bug in RunnableRetry where failed-then-retried items could shift output positions. Real API test shows correct 1:1 input→output ordering with max_retries=2. Tests included for both sync and async paths.

Copy link
Contributor

@gambletan Ethan T. (gambletan) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix for a subtle index-mapping bug. The root cause analysis is clear: when result.pop(0) was used to distribute retry results, it didn't account for the fact that result only contains entries for the remaining (retried) indices, not all original indices. Using dict(zip(last_remaining_indices, result, strict=True)) correctly maps each retry result back to its original position.

A couple of notes:

  1. strict=True in zip is a good safety net: If last_remaining_indices and result ever have mismatched lengths, this will raise ValueError immediately rather than silently producing wrong output. Nice defensive choice.

  2. Edge case — all items succeed on first try: When all items succeed, remaining_indices becomes empty and we break before updating last_remaining_indices. In that case last_remaining_indices stays as list(range(len(inputs))) (the initial value), but result remains not_set (the sentinel). The code after the try/except only enters the else branch (last_result_map[idx]) when idx not in results_map, so this path would only be reached if there's an idx not already in results_map. Since all items succeeded, they'd all be in results_map, so the last_result_map is never consulted. This is correct but somewhat subtle — a brief comment noting this invariant might help future readers.

  3. Test coverage is excellent: Both sync and async tests with the three-way scenario (immediate success, transient failure, permanent failure) are exactly the right regression tests. The failed_once flag pattern is clean.

  4. Both _batch and _abatch are updated symmetrically — good, no risk of the async path being missed.

LGTM overall.

Clarify why last_result_map is only consulted for indices that still need fallback values after retries.
@giulio-leone
Copy link
Contributor Author

Thanks — I pushed a small follow-up commit (37f30ad) to clarify that invariant in both the sync and async paths.

I also reran targeted lint, the regression tests, and a direct runtime reproduction for the partial-retry scenario after the change.

@giulio-leone
Copy link
Contributor Author

Friendly ping — rebased on latest and ready for review. Happy to address any feedback!

@giulio-leone
Copy link
Contributor Author

Thanks for the thorough review Ethan T. (@gambletan)!

Great observation on point #2 — you're right that the invariant is subtle. I'll add a brief inline comment to make it explicit for future readers.

The edge case is indeed safe: when all items succeed on the first try, remaining_indices becomes empty → breaklast_result_map is never consulted since every idx is already in results_map. But documenting this explicitly is worthwhile.

Add detailed comment explaining why last_result_map is safe when every
item succeeds on the first attempt, as suggested by @gambletan in review.
Copy link

@alvinttang Alvin Tang (alvinttang) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Prevent output corruption in RunnableRetry.batch when partial retries succeed

Excellent bug analysis

The root cause is well-identified: result.pop(0) consumed elements sequentially from the last retry's output list, but that list only contains results for the retried indices — not all original indices. When results_map already consumed some of those results (successfully retried items), pop(0) shifts remaining elements to wrong positions.

Fix correctness

The fix replaces pop(0) with an index-mapped lookup via dict(zip(last_remaining_indices, result, strict=True)). This is correct:

  • last_remaining_indices tracks which original indices were attempted in the final retry round
  • result contains outcomes in the same order as last_remaining_indices
  • The strict=True in zip is a nice safety net — it will raise ValueError if the two lists have different lengths, catching any future logic errors

The same fix is applied symmetrically to both _batch and _abatch. Good.

Edge case analysis

All succeed on first attempt: remaining_indices becomes empty on the second iteration, hitting the break before last_remaining_indices is updated. So last_remaining_indices == range(len(inputs)) and result contains the full first-attempt output. All indices land in results_map, so last_result_map is never consulted. Correct.

All fail permanently: results_map stays empty. last_remaining_indices == range(len(inputs)) after the final attempt. result is either the last retry output or the [e] * len(inputs) fallback. In the fallback case, result has len(inputs) elements and last_remaining_indices also has len(inputs) elements, so strict=True zip works. Correct.

Single item: Trivially correct — one index, one result.

Python version concern

zip(..., strict=True) was introduced in Python 3.10. LangChain core's minimum Python version should be checked — if it supports 3.9, this would be a compatibility break. Looking at recent pyproject.toml changes, langchain-core requires >=3.9. If 3.9 is still supported, this needs itertools or a manual length check instead.

Edit: Actually, checking the codebase, langchain-core's pyproject.toml specifies python = ">=3.9". The strict=True parameter for zip is only available in Python 3.10+. This would cause a TypeError on Python 3.9. This needs to be fixed before merge — either drop strict=True or add a manual assertion:

assert len(last_remaining_indices) == len(result)
last_result_map = dict(zip(last_remaining_indices, result))

Tests

The regression tests are well-structured — they precisely reproduce the corruption scenario from the issue. Both sync and async variants are tested. The nonlocal failed_once pattern cleanly simulates a transient failure.

One suggestion: consider adding a test where multiple items succeed on retry at different retry rounds (e.g., item A succeeds on retry 1, item B succeeds on retry 2) to exercise the results_map accumulation across multiple iterations.

Summary

Excellent fix for a real data corruption bug. The index-mapping approach is clean and correct. The main blocker is the zip(strict=True) Python 3.9 compatibility issue — please verify the minimum supported Python version and adjust accordingly.

zip(strict=True) was introduced in Python 3.10, but langchain-core
supports Python >=3.9. Remove the strict parameter — the invariant
(equal-length lists) is guaranteed by the retry loop logic.
@giulio-leone
Copy link
Contributor Author

Alvin Tang (@alvinttang) Good catch on the zip(strict=True) Python 3.9 compatibility issue — fixed in c029ac2. Dropped strict=True since the invariant (equal-length lists) is guaranteed by the retry loop's own logic (last_remaining_indices is always set from remaining_indices which drives the retry batch() call producing result).

@giulio-leone Giulio Leone (giulio-leone) force-pushed the fix/runnable-retry-batch-corruption-35475-v2 branch from c029ac2 to a66a9aa Compare March 15, 2026 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core `langchain-core` package issues & PRs external fix For PRs that implement a fix size: S 50-199 LOC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RunnableRetry.batch/abatch can return corrupted outputs when some items succeed on retry and others still fail

3 participants