Fix duplicate API calls causing hang with Anthropic streaming #404

Open
0xhsn wants to merge 1 commit into aliasrobotics:main from 0xhsn:fix/anthropic-api-hang

0xhsn commented Jan 27, 2026

Fixes #401

Changelog

  • Remove duplicate litellm.acompletion() call in streaming path (line 3308)
  • Remove duplicate litellm.acompletion() call in retry path (line 3362)
  • Each streaming request now makes only one API call instead of two

Context

When using Anthropic API keys with CAI, the application would hang or appear very slow. Investigation revealed that the streaming code was making two identical API calls for every request, then discarding the result of the first one.

The bug was in _fetch_response_litellm_openai(), which awaited the same call twice:

ret = await litellm.acompletion(**kwargs)      # First call - DISCARDED
stream_obj = await litellm.acompletion(**kwargs)  # Second call - used

This caused:

  • 2x latency on every streaming request
  • 2x token usage
  • Apparent "hanging" behavior while waiting for the first (wasted) call
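
The effect of the duplicate call can be reproduced in isolation. The following is a minimal sketch, not the CAI code: FakeCompletionClient, fetch_stream_buggy, fetch_stream_fixed, and the model name are all hypothetical stand-ins, and the fake client only counts calls instead of contacting a real provider.

```python
import asyncio


class FakeCompletionClient:
    """Hypothetical stand-in for litellm; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    async def acompletion(self, **kwargs):
        self.calls += 1
        await asyncio.sleep(0)  # yield control, as a network call would

        async def chunks():
            for piece in ("Hel", "lo"):
                yield piece

        return chunks()


async def fetch_stream_buggy(client, **kwargs):
    # Mirrors the pattern described above: the first result is never used.
    ret = await client.acompletion(**kwargs)
    stream_obj = await client.acompletion(**kwargs)
    return stream_obj


async def fetch_stream_fixed(client, **kwargs):
    # A single call whose result is returned to the caller.
    return await client.acompletion(**kwargs)


async def collect(fetch, client):
    stream = await fetch(client, model="example-model", stream=True)
    return "".join([piece async for piece in stream])


def run_demo():
    buggy, fixed = FakeCompletionClient(), FakeCompletionClient()
    text_buggy = asyncio.run(collect(fetch_stream_buggy, buggy))
    text_fixed = asyncio.run(collect(fetch_stream_fixed, fixed))
    return buggy.calls, fixed.calls, text_buggy, text_fixed
```

Both variants return the same text to the caller, which is likely why the duplicate went unnoticed: only the call count (and therefore latency and billing) differs.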

Testing and Deployment

  • Tested with Anthropic API key - requests now return promptly instead of hanging
  • No new environment variables or deployment changes required
  • The fix is a simple removal of duplicate lines
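
A regression check for this kind of bug could assert that the completion function is awaited exactly once per request. This is a sketch under assumptions: fetch_stream is a hypothetical simplification of the streaming path that takes the completion callable as a parameter, which the real function may not do.

```python
import asyncio
from unittest.mock import AsyncMock


async def fetch_stream(acompletion, **kwargs):
    # Hypothetical stand-in for the fixed streaming path: one call, result used.
    return await acompletion(**kwargs)


def check_single_call():
    fake_acompletion = AsyncMock(return_value="stream-object")
    result = asyncio.run(
        fetch_stream(fake_acompletion, model="example-model", stream=True)
    )
    # Fails if the callable was awaited zero times or more than once.
    fake_acompletion.assert_awaited_once_with(model="example-model", stream=True)
    return result, fake_acompletion.await_count
```

In the real code base the same assertion could be applied by patching litellm.acompletion with an AsyncMock, assuming the function under test resolves it at call time.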

Remove duplicate litellm.acompletion() calls that were causing every
streaming request to make two identical API calls, with the first
result being discarded. This doubled latency and caused apparent
hanging behavior, especially noticeable with the Anthropic API.

Development

Successfully merging this pull request may close these issues.

CAI hangs when using Anthropic provided Claude model