fix(anthropic/openai/google): wrap io.ErrUnexpectedEOF as ProviderError #198
Merged
andreynering merged 4 commits on Apr 22, 2026
…engage

When the Anthropic SSE stream drops mid-response with a raw I/O error like io.ErrUnexpectedEOF, toProviderErr previously returned the error unchanged because errors.As(err, &apiErr) fails for non-*anthropic.Error values. The retry loop in retry.go gates on errors.As(err, &ProviderError), so the retry branch is never entered and the bare error is returned to the caller on the first attempt.

This is especially ironic because ProviderError.IsRetryable already special-cases io.ErrUnexpectedEOF as retryable; that check is simply never reached for raw stream transport errors.

Wrap io.ErrUnexpectedEOF (including wrapped forms via errors.Is) into a ProviderError with Cause set, so the existing IsRetryable path engages and transient mid-stream disconnects are retried as intended.

Observed on long-running tool-heavy sessions (~8 minutes of continuous streaming) where idle timeouts / proxy resets would otherwise abort the whole turn with a bare "unexpected EOF".
…ies engage

Same bug as anthropic/error.go: when the SSE stream drops mid-response with a raw io.ErrUnexpectedEOF, toProviderErr returned the error unchanged because errors.As(err, &apiErr) fails for non-API-error values. The retry loop in retry.go gates on *ProviderError, so the retry branch is never entered and the bare error is returned to the caller on the first attempt.

ProviderError.IsRetryable already special-cases io.ErrUnexpectedEOF as retryable; wrapping it here lets that existing check engage.

Transitively fixes azure, openaicompat, openrouter, and vercel (which wrap the openai provider). bedrock was already covered by the anthropic fix. kronk is unaffected: it already wraps all errors as ProviderError.
Contributor
Author
Found the issue to be in …
andreynering approved these changes on Apr 22, 2026

andreynering (Member) left a comment:
Good patch. Thank you @ljuti!
Summary

When a provider's SSE stream drops mid-response with a raw I/O error like io.ErrUnexpectedEOF (idle timeouts, proxy resets, mid-stream TCP disconnects), toProviderErr in each of the three direct providers returns the error unchanged because errors.As(err, &apiErr) fails for non-API-error values. The retry loop gates on errors.As(err, &ProviderError) + IsRetryable(), so the retry branch is never entered and the bare error propagates to the caller on the first attempt.

This is especially ironic because ProviderError.IsRetryable at errors.go already special-cases io.ErrUnexpectedEOF as retryable; that check is simply never reached for raw stream transport errors in any direct provider except kronk.

This PR wraps io.ErrUnexpectedEOF (including wrapped forms via errors.Is) into a ProviderError with Cause set in all three direct providers, so the existing IsRetryable path engages and transient mid-stream disconnects are retried as intended.

Provider coverage

- anthropic: fixed
- openai: fixed
- google: fixed
- kronk: no change
- bedrock: inherits anthropic fix
- azure: inherits openai fix
- openaicompat: inherits openai fix
- openrouter: inherits openai fix
- vercel: inherits openai fix

Failure trace (Anthropic example)

1. providers/anthropic/anthropic.go: stream.Err() returns io.ErrUnexpectedEOF from the SSE decoder. toProviderErr(err) is called.
2. providers/anthropic/error.go (before this PR): errors.As(err, &apiErr) fails; raw error returned unchanged.
3. retry.go:112-113: errors.As(err, &providerErr) fails; retry branch skipped.
4. retry.go:132-133: first-attempt non-retryable errors return without wrapping in RetryError. Caller gets bare unexpected EOF.

Net effect: IsRetryable never runs, and a failure mode the code already knows how to handle is treated as fatal.

Repro
Reliably reproducible on long-running tool-heavy sessions (~8 minutes of continuous streaming) where idle timeouts / proxy resets sever the SSE connection mid-response. The top-level error bubbles up as a bare unexpected EOF after thousands of successful stream parts.
Change

Minimal targeted wrap in providers/anthropic/error.go. Only the io.ErrUnexpectedEOF case is wrapped, to stay conservative. Other transient network errors (*net.OpError, syscall.ECONNRESET, http2.StreamError) are good candidates for a follow-up but aren't included here.

Tests

New providers/anthropic/error_test.go covers:

- TestToProviderErr_WrapsUnexpectedEOF: direct, wrapped once, wrapped twice; verifies the result is *fantasy.ProviderError, Cause chains back to io.ErrUnexpectedEOF, and IsRetryable() returns true.
- TestToProviderErr_PassesThroughUnrelatedErrors: arbitrary non-EOF errors are returned unchanged.
- TestToProviderErr_PassesThroughPlainEOF: clean io.EOF is not wrapped (the stream handler treats io.EOF as a clean terminator; wrapping would invite false retries).

I have read CONTRIBUTING.md.