fix(otel): use completion_tokens_details for Chat Completions API reasoning tokens #24112

Conversation

The OTEL callback looked up output_tokens_details, which only exists on ResponseAPIUsage. The Chat Completions API uses completion_tokens_details. Also add prompt_tokens_details extraction for cached_tokens. Fixes BerriAI#23990
Greptile Summary

This PR fixes a real bug in the Arize/Phoenix OTEL integration where reasoning token details were never extracted for Chat Completions API responses.

Key changes:
Issues found:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/integrations/arize/_utils.py | Correctly fixes the _set_usage_outputs function to check completion_tokens_details (Chat Completions API) before falling back to output_tokens_details (Responses API), and adds prompt_tokens_details/input_tokens_details handling for cached_tokens. Switches from .get() to getattr() on the details objects, which correctly avoids AttributeError on CompletionTokensDetailsWrapper (Pydantic BaseModel without a .get() method). Logic and fallback chain are sound. |
| tests/test_litellm/integrations/arize/test_arize_utils.py | Adds three new tests covering Chat Completions API token details, Responses API fallback, and the no-details baseline. Core fix paths are well-covered. The input_tokens_details.cached_tokens code path (Responses API) is not tested — only None is exercised for that branch. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_set_usage_outputs(span, response_obj, span_attrs)"] --> B{usage present?}
    B -- No --> Z[return]
    B -- Yes --> C[Set total / completion / prompt token counts]
    C --> D["usage.get('completion_tokens_details')"]
    D -- "Non-None (Chat Completions API)" --> E["getattr(..., 'reasoning_tokens', None)"]
    D -- "None → fallback" --> F["usage.get('output_tokens_details')"]
    F -- "Non-None (Responses API)" --> E
    F -- None --> G[Skip reasoning tokens]
    E -- "Non-zero" --> H[Set LLM_TOKEN_COUNT_COMPLETION_DETAILS_REASONING]
    E -- "None / 0" --> G
    C --> I["usage.get('prompt_tokens_details')"]
    I -- "Non-None (Chat Completions API)" --> J["getattr(..., 'cached_tokens', None)"]
    I -- "None → fallback" --> K["usage.get('input_tokens_details')"]
    K -- "Non-None (Responses API)" --> J
    K -- None --> L[Skip cached tokens]
    J -- "Non-zero" --> M[Set LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ]
    J -- "None / 0" --> L
```
Last reviewed commit: "test: add tests for ..."
CompletionTokensDetailsWrapper and PromptTokensDetailsWrapper don't have a .get() method unlike OutputTokensDetails. Use getattr() to safely access attributes on both dict-like and typed objects. Addresses Greptile review feedback.
@AtharvaJaiswal005 Let's add tests to tests/test_litellm/integrations/arize/test_arize_utils.py to ensure the intended change is reflected.
…ls extraction

Adds three tests to verify _set_usage_outputs correctly handles:
- Chat Completions API: completion_tokens_details (reasoning_tokens) and prompt_tokens_details (cached_tokens)
- Responses API: fallback to output_tokens_details for reasoning_tokens
- Basic usage: no token details present, no spurious attributes set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
def test_set_usage_outputs_responses_api_output_tokens_details():
    """
    Test that _set_usage_outputs falls back to output_tokens_details (Responses API)
    when completion_tokens_details is not present.
    """
    from unittest.mock import MagicMock

    from litellm.integrations.arize._utils import _set_usage_outputs
    from litellm.types.llms.openai import (
        OutputTokensDetails,
        ResponseAPIUsage,
        ResponsesAPIResponse,
    )

    span = MagicMock()

    response_obj = ResponsesAPIResponse(
        id="response-456",
        created_at=1625247600,
        output=[],
        usage=ResponseAPIUsage(
            input_tokens=100,
            output_tokens=200,
            total_tokens=300,
            output_tokens_details=OutputTokensDetails(reasoning_tokens=150),
        ),
    )

    _set_usage_outputs(span, response_obj, SpanAttributes)

    span.set_attribute.assert_any_call(SpanAttributes.LLM_TOKEN_COUNT_TOTAL, 300)
    span.set_attribute.assert_any_call(SpanAttributes.LLM_TOKEN_COUNT_COMPLETION, 200)
    span.set_attribute.assert_any_call(SpanAttributes.LLM_TOKEN_COUNT_PROMPT, 100)
    span.set_attribute.assert_any_call(
        SpanAttributes.LLM_TOKEN_COUNT_COMPLETION_DETAILS_REASONING, 150
    )
```
Missing coverage for input_tokens_details.cached_tokens (Responses API path)
The new code in _set_usage_outputs falls back to usage.get("input_tokens_details") for cached_tokens when prompt_tokens_details is absent. ResponseAPIUsage has input_tokens_details: Optional[InputTokensDetails], and InputTokensDetails carries cached_tokens: int = 0.
test_set_usage_outputs_responses_api_output_tokens_details only verifies the output_tokens_details branch; input_tokens_details is never set in the test, so the new `or usage.get("input_tokens_details")` fallback branch is never exercised. A future rename of that key or a copy-paste mistake in the fallback chain would pass all tests silently.
Consider extending this test (or adding a dedicated one) that populates input_tokens_details with a non-zero cached_tokens and asserts that LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ is set to the expected value.
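The suggested test could look roughly like this. It is sketched with a stand-in extraction function and SimpleNamespace objects so it stands alone; a real test would call litellm's _set_usage_outputs with a ResponseAPIUsage carrying input_tokens_details and assert on span.set_attribute, as in the test above.

```python
from types import SimpleNamespace

def cached_tokens_from_usage(usage):
    # Stand-in for the fallback under review: prompt_tokens_details
    # (Chat Completions) first, then input_tokens_details (Responses API).
    details = usage.get("prompt_tokens_details") or usage.get("input_tokens_details")
    return getattr(details, "cached_tokens", None) if details is not None else None

def test_cached_tokens_fallback_to_input_tokens_details():
    # Only the Responses API field is populated, forcing the fallback branch.
    usage = {"input_tokens_details": SimpleNamespace(cached_tokens=80)}
    assert cached_tokens_from_usage(usage) == 80
```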
Merged 13b893d into BerriAI:litellm_oss_staging_03_21_2026
Summary

- Fix `_set_usage_outputs` to check `completion_tokens_details` (Chat Completions API) before falling back to `output_tokens_details` (Responses API)
- Add `prompt_tokens_details` extraction for `cached_tokens`

Root Cause

`_set_usage_outputs` uses `output_tokens_details`, which only exists on `ResponseAPIUsage`. The Chat Completions API returns `completion_tokens_details` instead, so `reasoning_tokens` was always `None`. Additionally, `prompt_tokens_details` (`cached_tokens`) was never extracted.

Changes

- `litellm/integrations/arize/_utils.py`: Check `completion_tokens_details` first, fall back to `output_tokens_details`. Add `prompt_tokens_details`/`input_tokens_details` extraction for `cached_tokens`.

Fixes #23990