fix(core,openai): fix v1 streaming tool calls dropped for chat completions #35983
Mason Daugherty (mdrxy) wants to merge 3 commits into master from mdrxy/fix-completions
Conversation
Merging this PR will improve performance by 36.7%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | WallTime | test_import_time[PydanticOutputParser] | 571.8 ms | 485.7 ms | +17.73% |
| ⚡ | WallTime | test_import_time[InMemoryRateLimiter] | 182.8 ms | 157.5 ms | +16.03% |
| ⚡ | WallTime | test_import_time[LangChainTracer] | 475.3 ms | 416.1 ms | +14.23% |
| ⚡ | WallTime | test_import_time[tool] | 569.9 ms | 480.9 ms | +18.51% |
| ⚡ | WallTime | test_import_time[BaseChatModel] | 556 ms | 478.6 ms | +16.18% |
| ⚡ | WallTime | test_import_time[Document] | 196.2 ms | 166.1 ms | +18.11% |
| ⚡ | WallTime | test_async_callbacks_in_sync | 25.8 ms | 18.9 ms | +36.7% |
| ⚡ | WallTime | test_import_time[HumanMessage] | 274.5 ms | 237.6 ms | +15.56% |
| ⚡ | WallTime | test_import_time[RunnableLambda] | 526.3 ms | 439.8 ms | +19.67% |
| ⚡ | WallTime | test_import_time[ChatPromptTemplate] | 668.1 ms | 562.1 ms | +18.86% |
| ⚡ | WallTime | test_import_time[Runnable] | 521.5 ms | 444.6 ms | +17.29% |
| ⚡ | WallTime | test_import_time[CallbackManager] | 329.8 ms | 286.3 ms | +15.19% |
| ⚡ | WallTime | test_import_time[InMemoryVectorStore] | 639.9 ms | 535.6 ms | +19.47% |
Comparing mdrxy/fix-completions (5ce9944) with master (69a7b9c)[^2]
Footnotes

1. 17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
2. No successful run was found on master (2bad58a) during the generation of this report, so 69a7b9c was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.
Streaming with `output_version="v1"` and bound tools silently dropped tool calls from `content_blocks` on the merged `AIMessageChunk`. Two root causes:

- `BaseChatOpenAI._convert_chunk_to_generation_chunk` set `content=[]` on the usage-only chunk while content chunks used `content=""`, corrupting the merged content via a string/list type mismatch
- `content_blocks` short-circuited for `output_version="v1"` even when content was a raw string rather than a list of `ContentBlock` dicts, returning the string directly instead of routing through the model_provider translator

Changes
- Remove the `content = []` override on usage-only (empty-choices) chunks in `BaseChatOpenAI._convert_chunk_to_generation_chunk`, keeping content as `""` so all Chat Completions streaming chunks have consistent string content through merge
- Apply `output_version="v1"` to content chunks (non-empty choices) in `_convert_chunk_to_generation_chunk` (previously only the usage-only chunk carried it)
- Guard `AIMessage.content_blocks` and `AIMessageChunk.content_blocks` with `isinstance(self.content, list)` so string content falls through to the model_provider translator, which correctly builds `ContentBlock` dicts from `tool_calls`/`tool_call_chunks`
- Revert basetenlabs/langchain-baseten#6 when released
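The first root cause (the `""` vs `[]` mismatch) can be illustrated with a minimal sketch. `merge_content` below is a simplified stand-in for the merge logic applied when streaming chunks are summed, not the actual langchain_core implementation:

```python
# Illustrative sketch (not LangChain source): why mixing "" and []
# as chunk content corrupts the merged result during streaming.

def merge_content(first, second):
    """Simplified merge: concatenate like types; on a str/list
    mismatch, fall back to wrapping everything in a list."""
    if isinstance(first, str) and isinstance(second, str):
        return first + second
    if isinstance(first, list) and isinstance(second, list):
        return first + second
    # Type mismatch: coerce both sides to lists, losing the uniform
    # string shape downstream code expects.
    first_list = [first] if isinstance(first, str) else first
    second_list = [second] if isinstance(second, str) else second
    return first_list + second_list

# Before the fix: content chunks carried "" but the trailing
# usage-only chunk carried [].
merged_before = ""
for chunk_content in ["", []]:
    merged_before = merge_content(merged_before, chunk_content)
# merged_before == [""] -- a list, not the expected empty string

# After the fix: every Chat Completions chunk keeps string content.
merged_after = ""
for chunk_content in ["", ""]:
    merged_after = merge_content(merged_after, chunk_content)
# merged_after == "" -- consistent string content through merge
```

With consistent `""` content on every chunk, the merged `AIMessageChunk` keeps string content, which is what the `content_blocks` translation path expects.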
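The `isinstance` guard on `content_blocks` can be sketched as follows. The class, translator function, and block shapes here are simplified stand-ins for illustration, not LangChain source:

```python
# Illustrative sketch of the content_blocks guard described above.

def translate_via_provider(content, tool_calls):
    """Stand-in for the model_provider translator: builds
    ContentBlock-style dicts from string content plus tool calls."""
    blocks = []
    if content:
        blocks.append({"type": "text", "text": content})
    for tc in tool_calls:
        blocks.append({"type": "tool_call", **tc})
    return blocks

class AIMessageSketch:
    def __init__(self, content, tool_calls=None, output_version="v1"):
        self.content = content
        self.tool_calls = tool_calls or []
        self.output_version = output_version

    @property
    def content_blocks(self):
        # Fix: only short-circuit when content is already a list of
        # block dicts. String content falls through to the translator,
        # so tool calls are no longer dropped.
        if self.output_version == "v1" and isinstance(self.content, list):
            return self.content
        return translate_via_provider(self.content, self.tool_calls)

msg = AIMessageSketch(
    content="",
    tool_calls=[{"name": "get_weather", "args": {"city": "SF"}, "id": "call_1"}],
)
blocks = msg.content_blocks
# blocks now contains the tool call instead of the raw "" string
```

Before the fix, the `output_version == "v1"` check alone returned `self.content` (the raw string) directly, so the tool calls never reached the translator.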