Fallbacks don't trigger when a critical error is reported in streaming Responses endpoint #15910
Replies: 2 comments
Yeah this is a real gap with gateway-level fallback. Once the stream starts, most proxies treat it as a success because they already got a 200 back. The error showing up in the first chunk instead of as an HTTP status means the fallback logic never fires.

One approach: add a client-side wrapper that inspects the first N chunks and triggers a retry to a different model if it detects an error payload. Not ideal, but it works as a stopgap.

Longer term I've been building Kalibr for exactly this kind of thing. It sits at the SDK level (not proxy level), so it can detect failures based on actual outcomes, not just HTTP codes. If a model starts returning errors mid-stream or degraded responses, it reroutes future calls automatically. Might be worth a look if you're hitting this pattern often.
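That first-N-chunks wrapper could look something like this. A minimal sketch, not LiteLLM's API: `stream_with_fallback`, the stream factories, and the `"error"`-keyed chunk shape are all assumptions for illustration.

```python
import asyncio

async def stream_with_fallback(stream_factories, inspect_first_n=3):
    """Try each stream factory in order. Buffer and inspect the first N
    chunks for an in-band error payload; only commit to a stream (and
    start forwarding its output) once it looks healthy."""
    for make_stream in stream_factories:
        stream = make_stream()
        buffered, failed = [], False
        async for chunk in stream:
            # Hypothetical error shape: a dict with an "error" key,
            # mirroring an error surfaced inside the first chunk.
            if isinstance(chunk, dict) and "error" in chunk:
                failed = True
                break
            buffered.append(chunk)
            if len(buffered) >= inspect_first_n:
                break
        if failed:
            continue  # discard this stream, retry with the next model
        for chunk in buffered:       # replay the inspected chunks
            yield chunk
        async for chunk in stream:   # then forward the rest as-is
            yield chunk
        return
    raise RuntimeError("all fallback streams reported errors")

# Simulated streams: the first errors in-band, the second succeeds.
async def bad_stream():
    yield {"error": "context_window_exceeded"}

async def good_stream():
    for c in ["Hello", " ", "world"]:
        yield c

async def demo():
    return [c async for c in stream_with_fallback([bad_stream, good_stream])]
```

Running `asyncio.run(demo())` yields `["Hello", " ", "world"]`: the erroring stream is discarded before any of its output reaches the caller.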
Streaming fallback issues are particularly nasty! At RevolutionAI (https://revolutionai.io) we built custom fallback logic for this exact scenario. The problem: Streaming responses fail mid-stream, and by then the fallback window has passed. Our solution:
```python
async def resilient_stream(models, messages):
    for model in models:
        try:
            async for chunk in stream_with_timeout(model, messages):
                yield chunk
            return  # success: the stream completed
        except StreamingError:
            continue  # try the next model
```

The key insight: treat streaming as a series of health checks, not a single operation. Happy to elaborate on any of these patterns!
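One caveat with a pattern like this: once chunks have been yielded to the caller, silently switching to another model would duplicate or mix output. A hedged variant (the `stream_fn` parameter and `StreamingError` type are placeholders, not the poster's actual helpers) that only falls back before any output has been forwarded:

```python
import asyncio

class StreamingError(Exception):
    """Placeholder for a mid-stream failure from a model backend."""

async def resilient_stream_safe(models, messages, stream_fn):
    """Fall back to the next model only while nothing has been yielded;
    once output has reached the caller, re-raise rather than mixing
    partial responses from two different models."""
    for model in models:
        yielded_any = False
        try:
            async for chunk in stream_fn(model, messages):
                yielded_any = True
                yield chunk
            return  # stream completed successfully
        except StreamingError:
            if yielded_any:
                raise  # too late to fall back cleanly
            continue
    raise StreamingError("all models failed before producing output")

# Simulated backend: "bad" errors before its first chunk, "good" streams fine.
async def fake_stream(model, messages):
    if model == "bad":
        raise StreamingError("error surfaced in first chunk")
    for c in ["a", "b"]:
        yield c

async def demo_safe():
    return [c async for c in resilient_stream_safe(["bad", "good"], [], fake_stream)]
```

Here `asyncio.run(demo_safe())` returns `["a", "b"]`: the "bad" model fails before emitting anything, so the fallback to "good" is invisible to the caller.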
Hello,
We're using LiteLLM as our AI gateway, with model groups configured with fallbacks. We've noticed an issue with error handling when streaming responses from OpenAI models.
Observed behavior:
When critical errors occur (such as context window exceeded, rate limit errors, or PTU-related issues), they are returned in the first chunk after the stream has started, rather than as an HTTP error before streaming begins.
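To make the observed behavior concrete: by the time the error arrives, the gateway has already returned HTTP 200, so the failure only shows up as a JSON payload inside the first SSE data chunk. A sketch of spotting it client-side (the error shape here is an assumption modeled on OpenAI-style error bodies, not a documented LiteLLM format):

```python
import json

def in_band_error(sse_line: str):
    """Return the error object if an SSE data line carries an in-band
    error payload, else None."""
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return obj.get("error") if isinstance(obj, dict) else None

# Example: a rate-limit error arriving in the first chunk after HTTP 200.
first_chunk = 'data: {"error": {"code": "rate_limit_exceeded", "message": "..."}}'
```

A wrapper that checks `in_band_error` on the first chunk before committing to a stream could then trigger its own fallback, since the gateway no longer can.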
Our questions:
Any guidance would be appreciated!