We can't get token IDs from the LLM Proxy when streaming is enabled. vLLM does return `token_ids` in streaming responses; the problem is in LiteLLM, and it has three parts:
- `.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py` has different implementations for `__next__` and `__anext__`. I think `__anext__` forgets to call the success logging callback for each chunk; it only calls it once for the whole response (see the sketch after this list).
- Still in this file, `chunk_creator` directly tosses away `token_ids` from the raw chunk, so the important `token_ids` go missing.
- There is no handler for `stream_event` in `.venv/lib/python3.12/site-packages/litellm/integrations/opentelemetry.py`, so we don't receive anything in the store.
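
For illustration, here is a minimal sketch of the sync/async asymmetry and the `chunk_creator` drop described above. The class and attribute names are hypothetical; this is not LiteLLM's actual code, only the behavior the points above describe:

```python
# Hypothetical sketch of the reported behavior, NOT LiteLLM's real code.
class StreamWrapperSketch:
    def __init__(self, raw_stream, success_callback):
        self.raw_stream = raw_stream
        self.success_callback = success_callback
        self.collected = []

    def chunk_creator(self, raw):
        # Rebuilds a normalized chunk but only copies fields it knows about,
        # which is where token_ids from the raw chunk gets dropped (point 2).
        return {"text": raw["text"]}  # raw["token_ids"] is discarded here

    def __next__(self):
        # Sync path: the success logging callback fires once per chunk.
        chunk = self.chunk_creator(next(self.raw_stream))
        self.success_callback(chunk)
        return chunk

    async def __anext__(self):
        # Async path (point 1): chunks are only accumulated, and the callback
        # fires a single time for the whole response, so per-chunk data
        # never reaches the logging integrations.
        raw = await self.raw_stream.__anext__()
        chunk = self.chunk_creator(raw)
        self.collected.append(chunk)
        if raw.get("is_final"):
            self.success_callback(self.collected)  # one call, whole response
        return chunk
```

Because the OpenTelemetry integration only sees what the callback hands it, the async path logs nothing per chunk, which compounds the missing `stream_event` handler in the third point.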
A systematic fix for this bug is complex. A simpler workaround might be to turn off streaming via some guardrail middleware and fake a stream chunk once the non-streaming response is ready (see the sketch below).
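
A rough sketch of that workaround, assuming an OpenAI-compatible proxy at a placeholder URL and the standard SSE chunk shape; the `token_ids` field on the choice is an assumption based on the issue's claim about what vLLM returns:

```python
import json
from typing import AsyncIterator

import httpx

PROXY_URL = "http://localhost:4000/v1/chat/completions"  # assumed proxy address


async def fake_stream(request_body: dict, api_key: str) -> AsyncIterator[str]:
    """Force a non-streaming upstream call, then replay it as one SSE chunk."""
    upstream_body = {**request_body, "stream": False}  # disable real streaming
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            PROXY_URL,
            json=upstream_body,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=120,
        )
    response = resp.json()

    choice = response["choices"][0]
    chunk = {
        "id": response.get("id"),
        "object": "chat.completion.chunk",
        "model": response.get("model"),
        "choices": [{
            "index": 0,
            "delta": {"content": choice["message"]["content"]},
            # token_ids survives because the upstream call was non-streaming;
            # whether this field is present depends on the backend (vLLM here).
            "token_ids": choice.get("token_ids"),
            "finish_reason": choice.get("finish_reason"),
        }],
    }
    yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```

The client still sees a stream (a single content chunk followed by `[DONE]`), but time-to-first-token is lost, so this only makes sense where token IDs matter more than streaming latency.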