fix: ensure role='assistant' in Azure streaming with include_usage#24326
fix: ensure role='assistant' in Azure streaming with include_usage#24326majiayu000 wants to merge 4 commits intoBerriAI:mainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
62a95d5 to
a8582fd
Compare
When Azure sends an initial chunk with no choices (prompt_filter_results) and stream_options.include_usage=True, the chunk was returned without calling strip_role_from_delta(). This caused sent_first_chunk to be set True prematurely, so the actual first content chunk had its role stripped. Call strip_role_from_delta() on the empty-choices return path, consistent with the other return paths in chunk_creator(). Fixes BerriAI#24221 Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
Reformat with Black 23.x to match CI's version requirement. The previous formatting commit used a newer Black version. Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
a8582fd to
1162925
Compare
Greptile SummaryThis PR fixes a missing Key changes:
Correctness notes: Confidence Score: 4/5
|
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/streaming_handler.py | One-line fix calling strip_role_from_delta in the include_usage empty-choices path, consistent with other return paths; correctly resolves the Azure prompt_filter_results chunk causing role='assistant' to be skipped. |
| tests/test_litellm/litellm_core_utils/test_streaming_handler.py | Adds test_azure_streaming_role_with_include_usage covering both sync and async paths using pre-canned mock chunks; other changes are purely cosmetic reformatting; test assertion is slightly weaker than ideal (checks any chunk has role, not first chunk). |
| litellm/proxy/management_helpers/audit_logs.py | Purely cosmetic reformatting for line length (Black-style); no logic changes. |
Sequence Diagram
sequenceDiagram
participant Azure as Azure OpenAI
participant CSW as CustomStreamWrapper
participant Client as API Consumer
Note over Azure,Client: stream_options.include_usage=True
Azure->>CSW: Chunk 1: prompt_filter_results (choices=[])
Note over CSW: chunk_creator: else branch (no choices)<br/>include_usage=True
Note over CSW: ✅ FIXED: calls strip_role_from_delta()<br/>sent_first_chunk=False → sets role='assistant', sent_first_chunk=True
CSW->>Client: {choices:[{delta:{role:'assistant'}}]}
Azure->>CSW: Chunk 2: first content (role='assistant', content='')
Note over CSW: strip_role_from_delta()<br/>sent_first_chunk=True → strips role
CSW->>Client: {choices:[{delta:{content:''}}]}
Azure->>CSW: Chunk 3: content ('Hello')
CSW->>Client: {choices:[{delta:{content:'Hello'}}]}
Azure->>CSW: Chunk 4: finish_reason='stop'
CSW->>Client: {choices:[{finish_reason:'stop', delta:{}}]}
Azure->>CSW: Chunk 5: usage chunk (choices=[])
Note over CSW: include_usage=True, strip_role_from_delta()<br/>sent_first_chunk=True → safe no-op pop
CSW->>Client: {choices:[…], usage:{…}}
Last reviewed commit: "style: fix Black for..."
| # At least one chunk must have role='assistant' in its delta | ||
| has_role = any( | ||
| hasattr(c, "choices") | ||
| and len(c.choices) > 0 | ||
| and hasattr(c.choices[0], "delta") | ||
| and getattr(c.choices[0].delta, "role", None) == "assistant" | ||
| for c in chunks | ||
| ) | ||
| assert ( | ||
| has_role | ||
| ), "No chunk contained role='assistant' in delta. " "Chunk deltas: " + str( | ||
| [c.choices[0].delta if c.choices else "no choices" for c in chunks] | ||
| ) |
There was a problem hiding this comment.
Consider a stronger assertion for
role placement
The current assertion only verifies that at least one chunk in the entire stream carries role='assistant'. After the fix, the role is attached to the empty prompt_filter_results chunk (the first chunk emitted), while the actual first content chunk has its role stripped by strip_role_from_delta. A stricter test would also confirm that the role appears on the correct chunk (the first yielded chunk) and is absent from later content chunks, preventing a future regression where both chunks could accidentally carry the role:
# Verify role appears in exactly the first emitted chunk
assert len(chunks) > 0, "No chunks were yielded"
first_chunk = chunks[0]
assert (
len(first_chunk.choices) > 0
and getattr(first_chunk.choices[0].delta, "role", None) == "assistant"
), f"Expected role='assistant' in the first chunk, got: {first_chunk.choices[0].delta if first_chunk.choices else 'no choices'}"
# Verify subsequent chunks do NOT repeat the role
for chunk in chunks[1:]:
if chunk.choices:
assert getattr(chunk.choices[0].delta, "role", None) != "assistant", \
f"Unexpected role='assistant' in non-first chunk: {chunk.choices[0].delta}"
Fixes #24221
Relevant issues
Fixes #24221 — LiteLLM proxy doesn't include
rolefor/chat/completionsstream=truewith Azure OpenAI andstream_options.include_usage=trueWhat this PR does
Root cause: In
streaming_handler.pychunk_creator(), whenoriginal_chunkhas no choices (Azure'sprompt_filter_resultschunk) andinclude_usage=True, the code returnsmodel_responsewithout callingstrip_role_from_delta(). This means:rolein its delta__next__/__anext__then setssent_first_chunk=Truerole='assistant'from Azure,strip_role_from_delta()seessent_first_chunk=Trueand strips the roleNet result: no chunk ever has
role='assistant'.Fix: Call
self.strip_role_from_delta(model_response)before returningmodel_responseat line 1559. This is consistent with the other return paths inchunk_creator(lines 895 and 989) that already callstrip_role_from_delta.Pre-Submission checklist
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewType
🐛 Bug Fix
Changes
litellm/litellm_core_utils/streaming_handler.py: Changedreturn model_responsetoreturn self.strip_role_from_delta(model_response)in theinclude_usageempty-choices pathtests/test_litellm/litellm_core_utils/test_streaming_handler.py: Addedtest_azure_streaming_role_with_include_usagecovering both sync and async iteration with mock Azure chunks