Fallbacks don't trigger when a critical error is reported in streaming Responses endpoint #15910
Replies: 2 comments
Yeah this is a real gap with gateway-level fallback. Once the stream starts, most proxies treat it as a success because they already got a 200 back. The error showing up in the first chunk instead of as an HTTP status means the fallback logic never fires.

One approach: add a client-side wrapper that inspects the first N chunks and triggers a retry to a different model if it detects an error payload. Not ideal, but it works as a stopgap.

Longer term I've been building Kalibr for exactly this kind of thing. It sits at the SDK level (not proxy level), so it can detect failures based on actual outcomes, not just HTTP codes. If a model starts returning errors mid-stream or degraded responses, it reroutes future calls automatically. Might be worth a look if you're hitting this pattern often.
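That first-N-chunks wrapper could look something like this. A minimal sketch, not LiteLLM's API: `stream_with_fallback`, the stream factories, and the `"error"`-keyed chunk shape are all assumptions for illustration.

```python
import asyncio

async def stream_with_fallback(stream_factories, inspect_first_n=3):
    """Try each stream factory in order. Buffer and inspect the first N
    chunks for an in-band error payload; only commit to a stream (and
    start forwarding its output) once it looks healthy."""
    for make_stream in stream_factories:
        stream = make_stream()
        buffered, failed = [], False
        async for chunk in stream:
            # Hypothetical error shape: a dict with an "error" key,
            # mirroring an error surfaced inside the first chunk.
            if isinstance(chunk, dict) and "error" in chunk:
                failed = True
                break
            buffered.append(chunk)
            if len(buffered) >= inspect_first_n:
                break
        if failed:
            continue  # discard this stream, retry with the next model
        for chunk in buffered:       # replay the inspected chunks
            yield chunk
        async for chunk in stream:   # then forward the rest as-is
            yield chunk
        return
    raise RuntimeError("all fallback streams reported errors")

# Simulated streams: the first errors in-band, the second succeeds.
async def bad_stream():
    yield {"error": "context_window_exceeded"}

async def good_stream():
    for c in ["Hello", " ", "world"]:
        yield c

async def demo():
    return [c async for c in stream_with_fallback([bad_stream, good_stream])]
```

Running `asyncio.run(demo())` yields `["Hello", " ", "world"]`: the erroring stream is discarded before any of its output reaches the caller.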
Streaming fallback issues are particularly nasty! At RevolutionAI (https://revolutionai.io) we built custom fallback logic for this exact scenario. The problem: Streaming responses fail mid-stream, and by then the fallback window has passed. Our solution:
```python
async def resilient_stream(models, messages):
    for model in models:
        try:
            async for chunk in stream_with_timeout(model, messages):
                yield chunk
            return  # success: the stream completed
        except StreamingError:
            continue  # try the next model
```

The key insight: treat streaming as a series of health checks, not a single operation. Happy to elaborate on any of these patterns!
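One caveat with a pattern like this: once chunks have been yielded to the caller, silently switching to another model would duplicate or mix output. A hedged variant (the `stream_fn` parameter and `StreamingError` type are placeholders, not the poster's actual helpers) that only falls back before any output has been forwarded:

```python
import asyncio

class StreamingError(Exception):
    """Placeholder for a mid-stream failure from a model backend."""

async def resilient_stream_safe(models, messages, stream_fn):
    """Fall back to the next model only while nothing has been yielded;
    once output has reached the caller, re-raise rather than mixing
    partial responses from two different models."""
    for model in models:
        yielded_any = False
        try:
            async for chunk in stream_fn(model, messages):
                yielded_any = True
                yield chunk
            return  # stream completed successfully
        except StreamingError:
            if yielded_any:
                raise  # too late to fall back cleanly
            continue
    raise StreamingError("all models failed before producing output")

# Simulated backend: "bad" errors before its first chunk, "good" streams fine.
async def fake_stream(model, messages):
    if model == "bad":
        raise StreamingError("error surfaced in first chunk")
    for c in ["a", "b"]:
        yield c

async def demo_safe():
    return [c async for c in resilient_stream_safe(["bad", "good"], [], fake_stream)]
```

Here `asyncio.run(demo_safe())` returns `["a", "b"]`: the "bad" model fails before emitting anything, so the fallback to "good" is invisible to the caller.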
Hello,
We're using LiteLLM as our AI gateway, with model groups configured with fallbacks. We've noticed an issue with error handling when streaming responses from OpenAI models.
Observed behavior:
When critical errors occur (such as context window exceeded, rate limit errors, or PTU-related issues), they are returned in the first chunk after the stream has started, rather than as an HTTP error before streaming begins.
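To make the observed behavior concrete: by the time the error arrives, the gateway has already returned HTTP 200, so the failure only shows up as a JSON payload inside the first SSE data chunk. A sketch of spotting it client-side (the error shape here is an assumption modeled on OpenAI-style error bodies, not a documented LiteLLM format):

```python
import json

def in_band_error(sse_line: str):
    """Return the error object if an SSE data line carries an in-band
    error payload, else None."""
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return obj.get("error") if isinstance(obj, dict) else None

# Example: a rate-limit error arriving in the first chunk after HTTP 200.
first_chunk = 'data: {"error": {"code": "rate_limit_exceeded", "message": "..."}}'
```

A wrapper that checks `in_band_error` on the first chunk before committing to a stream could then trigger its own fallback, since the gateway no longer can.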
Our questions:
Any guidance would be appreciated!