We can't get token IDs from the LLM Proxy when streaming is enabled. vLLM does return `token_ids` in streaming responses; the problem is in LiteLLM, and it has three parts:
- `.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py` has different implementations for `__next__` and `__anext__`. I think `__anext__` forgets to call the success logging callback for each chunk; it only calls it once for the whole response (see the sketch after this list).
- Still in this file, `chunk_creator` directly tosses away `token_ids` from the raw chunk, so the important `token_ids` go missing.
- There is no handler for `stream_event` in `.venv/lib/python3.12/site-packages/litellm/integrations/opentelemetry.py`, so we don't receive anything in the store.
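
For illustration, here is a minimal sketch of the sync/async asymmetry and the `chunk_creator` drop described above. The class and attribute names are hypothetical; this is not LiteLLM's actual code, only the behavior the points above describe:

```python
# Hypothetical sketch of the reported behavior, NOT LiteLLM's real code.
class StreamWrapperSketch:
    def __init__(self, raw_stream, success_callback):
        self.raw_stream = raw_stream
        self.success_callback = success_callback
        self.collected = []

    def chunk_creator(self, raw):
        # Rebuilds a normalized chunk but only copies fields it knows about,
        # which is where token_ids from the raw chunk gets dropped (point 2).
        return {"text": raw["text"]}  # raw["token_ids"] is discarded here

    def __next__(self):
        # Sync path: the success logging callback fires once per chunk.
        chunk = self.chunk_creator(next(self.raw_stream))
        self.success_callback(chunk)
        return chunk

    async def __anext__(self):
        # Async path (point 1): chunks are only accumulated, and the callback
        # fires a single time for the whole response, so per-chunk data
        # never reaches the logging integrations.
        raw = await self.raw_stream.__anext__()
        chunk = self.chunk_creator(raw)
        self.collected.append(chunk)
        if raw.get("is_final"):
            self.success_callback(self.collected)  # one call, whole response
        return chunk
```

Because the OpenTelemetry integration only sees what the callback hands it, the async path logs nothing per chunk, which compounds the missing `stream_event` handler in the third point.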
A systematic fix for this bug is complex. A simpler workaround might be to turn off streaming via some guardrail middleware and fake a stream chunk once the non-streaming response is ready (see the sketch below).
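
A rough sketch of that workaround, assuming an OpenAI-compatible proxy at a placeholder URL and the standard SSE chunk shape; the `token_ids` field on the choice is an assumption based on the issue's claim about what vLLM returns:

```python
import json
from typing import AsyncIterator

import httpx

PROXY_URL = "http://localhost:4000/v1/chat/completions"  # assumed proxy address


async def fake_stream(request_body: dict, api_key: str) -> AsyncIterator[str]:
    """Force a non-streaming upstream call, then replay it as one SSE chunk."""
    upstream_body = {**request_body, "stream": False}  # disable real streaming
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            PROXY_URL,
            json=upstream_body,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=120,
        )
    response = resp.json()

    choice = response["choices"][0]
    chunk = {
        "id": response.get("id"),
        "object": "chat.completion.chunk",
        "model": response.get("model"),
        "choices": [{
            "index": 0,
            "delta": {"content": choice["message"]["content"]},
            # token_ids survives because the upstream call was non-streaming;
            # whether this field is present depends on the backend (vLLM here).
            "token_ids": choice.get("token_ids"),
            "finish_reason": choice.get("finish_reason"),
        }],
    }
    yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```

The client still sees a stream (a single content chunk followed by `[DONE]`), but time-to-first-token is lost, so this only makes sense where token IDs matter more than streaming latency.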