Description
Currently, choice-less chunks (such as chunks that report usage and statistics fields) aren't sent into the response stream immediately; their sending is delayed until a chunk with a choice is encountered.
These choice-less chunks are then merged together and sent along with that choice-carrying carrier chunk.
Since choice-less chunks may arrive from the upstream as the very last chunks, the carrier chunk is chosen to be the chunk that closes the last choice:
ai-dial-sdk/aidial_sdk/chat_completion/response.py
Lines 108 to 127 in 8abe579
```python
if isinstance(chunk, BaseChunk):
    is_last_end_choice_chunk = (
        isinstance(chunk, EndChoiceChunk)
        and chunk.choice_index == self.n - 1
    )
    is_top_level_chunk = isinstance(
        chunk,
        (
            UsageChunk,
            UsagePerModelChunk,
            DiscardedMessagesChunk,
        ),
    )
    if is_last_end_choice_chunk or is_top_level_chunk:
        delayed_chunks.append(chunk)
    else:
        yield _create_chunk(chunk)
```
Thus, such choice-less chunks are reported at the very end, which precludes their streaming.
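To make the effect concrete, here is a simplified sketch of the delay-and-merge behavior, operating on plain dict chunks rather than the SDK's chunk classes (the function name and the dict-based representation are assumptions made for illustration, not the SDK implementation):

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def merge_choiceless_chunks(
    chunks: Iterable[Dict[str, Any]], n_choices: int
) -> Iterator[Dict[str, Any]]:
    """Withhold choice-less chunks and emit them merged into the carrier
    chunk (the chunk closing the last choice) at the end of the stream."""

    delayed: Dict[str, Any] = {}
    carrier: Optional[Dict[str, Any]] = None

    def closes_last_choice(chunk: Dict[str, Any]) -> bool:
        return any(
            choice.get("finish_reason") is not None
            and choice.get("index") == n_choices - 1
            for choice in chunk.get("choices", [])
        )

    for chunk in chunks:
        if not chunk.get("choices"):
            # Choice-less chunks (usage, per-model usage, discarded
            # messages) are withheld from the stream.
            delayed.update({k: v for k, v in chunk.items() if k != "choices"})
        elif closes_last_choice(chunk):
            # The chunk closing the last choice is withheld too, since
            # choice-less chunks may still arrive after it.
            carrier = chunk
        else:
            yield chunk

    if carrier is not None:
        # The withheld fields are reported only in the very last chunk.
        yield {**delayed, **carrier}
```

With this behavior, a usage chunk received in the middle of the stream is not visible downstream until the carrier chunk is emitted at the very end.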
It's desirable to enable streaming of the statistics fields: #15
Imagine an application which calls model A, reports per-model usage for A, and then fails to call model B.
Currently, the downstream won't see any per-model usage, since it is reported only in the very last chunk.
A similar delaying technique is used in adapter-openai to eliminate choice-less chunks.
Q: Why do we delay sending choice-less chunks in the first place?
A: Because it wasn't clear how best to introduce the missing choices field. The possible solutions are:
- Add a fake message with an empty string content:

  ```json
  {
    "choices": [{"index": 0, "delta": {"content": ""}}],
    "usage": {
      "prompt_tokens": 1,
      "completion_tokens": 2,
      "total_tokens": 3
    }
  }
  ```

- Add an empty list of choices:
  ```json
  {
    "choices": [],
    "usage": {
      "prompt_tokens": 1,
      "completion_tokens": 2,
      "total_tokens": 3
    }
  }
  ```

In the case of an empty list of choices, it wasn't clear whether it would be parsed correctly downstream.
However, OpenAI has since introduced the stream_options.include_usage feature. It enables the generation of a chunk with an empty list of choices and a non-empty usage field.
Such chunks are handled correctly by popular OpenAI clients (the openai and langchain libraries).
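For reference, a minimal consumer sketch, assuming the openai Python client with stream_options support (the model name and prompt are placeholders): the trailing chunk carries an empty choices list and a populated usage field.

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        # Regular content chunks carry choices and no usage.
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage is not None:
        # The final chunk: choices == [] and usage is populated.
        print("\nusage:", chunk.usage)
```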
The proposal is to follow this empty list of choices convention.
TODO: ascertain that openai and langchain are able to parse chunks with an empty list of choices and statistics fields.
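One possible way to check the openai side of this TODO (a sketch, assuming the openai Python client's pydantic chunk models; DIAL-specific statistics fields are omitted, and the langchain side would need an analogous check):

```python
from openai.types.chat import ChatCompletionChunk

# A chunk with an empty list of choices and a populated usage field,
# as the SDK would emit it under the proposed convention.
chunk = ChatCompletionChunk(
    id="chatcmpl-123",
    object="chat.completion.chunk",
    created=1700000000,
    model="some-deployment",  # placeholder deployment name
    choices=[],
    usage={
        "prompt_tokens": 1,
        "completion_tokens": 2,
        "total_tokens": 3,
    },
)

assert chunk.choices == []
assert chunk.usage is not None and chunk.usage.total_tokens == 3
```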