Fix duplicate API calls causing hang with Anthropic streaming#404
Open
0xhsn wants to merge 1 commit into aliasrobotics:main from
Conversation
Remove duplicate litellm.acompletion() calls that were causing every streaming request to make two identical API calls, with the first result being discarded. This doubled latency and caused apparent hanging behavior, especially noticeable with Anthropic API.
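The faulty pattern can be sketched as follows. This is a hedged, minimal reconstruction, not the actual CAI source: `fake_acompletion` is a hypothetical stand-in for `litellm.acompletion()` that just counts invocations.

```python
import asyncio

call_count = 0

async def fake_acompletion(**kwargs):
    # Hypothetical stand-in for litellm.acompletion(); counts invocations.
    global call_count
    call_count += 1
    return {"id": call_count}

async def fetch_buggy(**kwargs):
    # Bug: the first call's result is discarded, so every request
    # pays for two identical API round-trips.
    await fake_acompletion(**kwargs)          # result thrown away
    return await fake_acompletion(**kwargs)   # only this one is used

async def fetch_fixed(**kwargs):
    # Fix: a single call whose result is actually used.
    return await fake_acompletion(**kwargs)

asyncio.run(fetch_buggy(model="claude-3", stream=True))
buggy_calls = call_count
call_count = 0
asyncio.run(fetch_fixed(model="claude-3", stream=True))
fixed_calls = call_count
print(buggy_calls, fixed_calls)  # 2 1
```

Because both calls are awaited sequentially, the buggy path roughly doubles wall-clock latency per request even before any streaming issue is considered.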
fixes #401
Changelog
- Removed duplicate `litellm.acompletion()` call in streaming path (line 3308)
- Removed duplicate `litellm.acompletion()` call in retry path (line 3362)

Context
When using Anthropic API keys with CAI, the application would hang or appear very slow. Investigation revealed that the streaming code was making two identical API calls for every request, then discarding the result of the first one.
The bug was in `_fetch_response_litellm_openai()`, where every streaming request issued two identical `litellm.acompletion()` calls and discarded the result of the first. This caused doubled latency per request and the apparent hanging behavior reported with the Anthropic API.
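With `stream=True` the discarded first call is worse than wasted latency: its response is an async iterator that is never consumed, so (in a real client) the underlying connection is never drained. A minimal sketch of that effect, with `fake_completion_stream` as a hypothetical stand-in for the streaming response object:

```python
import asyncio

created = 0
finished = 0

def fake_completion_stream():
    # Hypothetical stand-in for the async iterator that a streaming
    # completion call resolves to; tracks created vs. fully drained streams.
    global created
    created += 1

    async def gen():
        global finished
        for chunk in ("Hel", "lo", "!"):
            yield chunk
        finished += 1  # stream fully drained; connection could be released

    return gen()

async def fetch_buggy():
    # Bug: a first stream is created and silently discarded,
    # so it is never iterated and never finishes.
    _unused = fake_completion_stream()
    return "".join([c async for c in fake_completion_stream()])

text = asyncio.run(fetch_buggy())
leaked = created - finished
print(text, created, finished, leaked)  # Hello! 2 1 1
```

The `leaked` count of 1 models the abandoned stream; removing the duplicate call brings creation and consumption back into balance.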
Testing and Deployment