Support streaming of choice-less chunks #199

@adubovik

Description

Currently, choice-less chunks (such as chunks reporting the usage and statistics fields) are not sent into the response stream immediately; their delivery is delayed until a chunk with a choice is encountered.
The delayed choice-less chunks are then merged together and sent along with that choice-bearing carrier chunk.

Since choice-less chunks may arrive from the upstream as the very last chunks, the carrier chunk is chosen to be the chunk that closes the last choice:

```python
if isinstance(chunk, BaseChunk):
    is_last_end_choice_chunk = (
        isinstance(chunk, EndChoiceChunk)
        and chunk.choice_index == self.n - 1
    )
    is_top_level_chunk = isinstance(
        chunk,
        (
            UsageChunk,
            UsagePerModelChunk,
            DiscardedMessagesChunk,
        ),
    )
    if is_last_end_choice_chunk or is_top_level_chunk:
        delayed_chunks.append(chunk)
    else:
        yield _create_chunk(chunk)
```

Thus, such choice-less chunks are reported at the very end, which precludes their streaming.

It's desirable to enable streaming of the statistics fields: #15
Imagine an application that calls model A, reports per-model usage for A, and then fails to call model B.
Currently, the downstream won't see any per-model usage at all, since it is reported only in the very last chunk.

A similar delaying technique is used in adapter-openai to eliminate choice-less chunks.

Q: Why do we delay sending choice-less chunks in the first place?
A: Because it wasn't clear how best to fill in the missing choices field. The possible solutions are:

1. Add a fake message with an empty string content:

```json
{
  "choices": [{"index": 0, "delta": {"content": ""}}],
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 2,
    "total_tokens": 3
  }
}
```
2. Add an empty list of choices:

```json
{
  "choices": [],
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 2,
    "total_tokens": 3
  }
}
```

In the case of an empty list of choices, it wasn't clear whether such chunks could be parsed correctly downstream.
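To illustrate the concern (a hypothetical sketch, not code from any of the clients): downstream code that naively indexes `choices[0]` fails on an empty-choices chunk, while a guarded accessor handles it.

```python
from typing import Any, Dict, Optional


def first_delta(chunk: Dict[str, Any]) -> Dict[str, Any]:
    """Naive accessor: raises IndexError on an empty-choices chunk."""
    return chunk["choices"][0]["delta"]


def first_delta_safe(chunk: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Guarded accessor: returns None when the chunk carries no choices."""
    choices = chunk.get("choices") or []
    return choices[0].get("delta") if choices else None
```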
However, OpenAI has since introduced the stream_options.include_usage feature. It enables the generation of a chunk with an empty list of choices and a non-empty usage field.
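For reference, enabling this on the upstream request takes roughly the shape below (a request-body sketch; the model name is illustrative). With include_usage set, the extra chunk streamed before the final `data: [DONE]` carries `"choices": []` together with the usage field.

```json
{
  "model": "gpt-4o-mini",
  "stream": true,
  "stream_options": {"include_usage": true},
  "messages": [{"role": "user", "content": "Hello"}]
}
```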

This is correctly handled by popular OpenAI clients (the openai and langchain libraries).

The proposal is to follow this empty list of choices convention.
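A minimal sketch of what the proposal could look like (the names `to_sse` and `stream` are illustrative, not taken from the codebase): every chunk is yielded immediately, and a choice-less chunk simply gets an empty choices list instead of being delayed until the carrier chunk.

```python
import json
from typing import Any, Dict, Iterator


def to_sse(body: Dict[str, Any]) -> str:
    """Serialize one chunk as a server-sent event."""
    return f"data: {json.dumps(body)}\n\n"


def stream(upstream: Iterator[Dict[str, Any]]) -> Iterator[str]:
    """Yield every upstream chunk right away.

    Choice-less chunks (usage, statistics, discarded messages) are sent
    with an empty "choices" list, mirroring the stream_options.include_usage
    convention, rather than being merged into a later choice-bearing chunk.
    """
    for chunk in upstream:
        chunk.setdefault("choices", [])
        yield to_sse(chunk)
```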

TODO: ascertain that openai and langchain are able to parse chunks with an empty list of choices and a statistics field.
