
LLMProxy doesn't support stream #262

@ultmaster

Description

We can't get token IDs from LLMProxy when stream is enabled.

vLLM does return token_ids in its streaming responses; the problem lies in LiteLLM and has three parts.

  1. In .venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py, `__next__` and `__anext__` have different implementations. I think `__anext__` forgets to call the success logging callback for each chunk; it only calls it once for the whole response.
  2. Still in this file, `chunk_creator` directly discards token_ids from the raw chunk, so the token_ids never make it into the constructed chunk (see the sketch after this list).
  3. There is no handler for stream events in .venv/lib/python3.12/site-packages/litellm/integrations/opentelemetry.py, so we don't receive anything in the store.
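
For (1) and (2), here is a rough illustration of what a per-chunk path could look like. This is not LiteLLM's actual streaming_handler code; `raw_stream`, `build_chunk`, and `log_success` are hypothetical stand-ins for the real internals:

```python
from typing import AsyncIterator, Callable


async def iterate_chunks(
    raw_stream: AsyncIterator[dict],
    build_chunk: Callable[[dict], dict],
    log_success: Callable[[dict], None],
) -> AsyncIterator[dict]:
    """Yield processed chunks while keeping token_ids and logging per chunk."""
    async for raw_chunk in raw_stream:
        chunk = build_chunk(raw_chunk)
        # (2) carry token_ids through instead of dropping them from the raw chunk
        raw_choice = (raw_chunk.get("choices") or [{}])[0]
        if "token_ids" in raw_choice:
            chunk["choices"][0]["token_ids"] = raw_choice["token_ids"]
        # (1) fire the success logging callback for every chunk, mirroring __next__,
        # rather than only once at the end of the whole response
        log_success(chunk)
        yield chunk
```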

A systematic fix for this issue is complex. A simpler workaround might be to turn off streaming via some guardrail middleware and fake a stream chunk once the non-streaming response is ready, as sketched below.
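
A minimal sketch of that workaround, assuming an OpenAI-compatible async client; `fake_stream_completion` and the chunk shape below are hypothetical, not an existing LLMProxy or LiteLLM API:

```python
from typing import Any, AsyncIterator


async def fake_stream_completion(client: Any, **params: Any) -> AsyncIterator[dict]:
    """Force a non-streaming call, then emit the result as a single fake stream chunk."""
    params["stream"] = False  # guardrail: never let the upstream call actually stream
    response = await client.chat.completions.create(**params)
    choice = response.choices[0]

    # Synthesize one chunk that mimics the streaming schema, carrying the full
    # message (and token_ids, if the backend happened to return them).
    yield {
        "id": response.id,
        "object": "chat.completion.chunk",
        "model": response.model,
        "choices": [
            {
                "index": 0,
                "delta": {
                    "role": choice.message.role,
                    "content": choice.message.content,
                },
                "finish_reason": choice.finish_reason,
            }
        ],
    }
```

The caller still consumes this with `async for`, so downstream code that expects a stream keeps working while the upstream request stays non-streaming.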
