
fix(otel): use completion_tokens_details for Chat Completions API rea…#24112

Merged
RheagalFire merged 4 commits into BerriAI:litellm_oss_staging_03_21_2026 from AtharvaJaiswal005:fix/otel-completion-tokens-details
Mar 22, 2026

Conversation

@AtharvaJaiswal005
Contributor

Summary

  • Fix _set_usage_outputs to check completion_tokens_details (Chat Completions API) before falling back to output_tokens_details (Responses API)
  • Add prompt_tokens_details extraction for cached_tokens

Root Cause

_set_usage_outputs uses output_tokens_details which only exists on ResponseAPIUsage. Chat Completions API returns completion_tokens_details instead, so reasoning_tokens was always None. Additionally, prompt_tokens_details (cached_tokens) was never extracted.

Changes

  • litellm/integrations/arize/_utils.py: Check completion_tokens_details first, fall back to output_tokens_details. Add prompt_tokens_details/input_tokens_details extraction for cached_tokens.
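The fallback chain described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual code in litellm/integrations/arize/_utils.py; the helper names _get_attr_or_key and extract_token_details are invented for this sketch, while the field names (completion_tokens_details, output_tokens_details, prompt_tokens_details, input_tokens_details, reasoning_tokens, cached_tokens) follow the PR description:

```python
def _get_attr_or_key(obj, name):
    """Read `name` from either a dict-like usage object (Responses API)
    or an attribute-based Pydantic usage object (Chat Completions API)."""
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def extract_token_details(usage):
    # Chat Completions API exposes completion_tokens_details;
    # Responses API exposes output_tokens_details. Check the former first.
    details = _get_attr_or_key(usage, "completion_tokens_details") or _get_attr_or_key(
        usage, "output_tokens_details"
    )
    reasoning_tokens = getattr(details, "reasoning_tokens", None) if details else None

    # Same pattern for cached tokens: prompt_tokens_details (Chat Completions)
    # falls back to input_tokens_details (Responses API).
    prompt_details = _get_attr_or_key(usage, "prompt_tokens_details") or _get_attr_or_key(
        usage, "input_tokens_details"
    )
    cached_tokens = getattr(prompt_details, "cached_tokens", None) if prompt_details else None

    return reasoning_tokens, cached_tokens
```

getattr() with a default is used on the details objects because, per the review discussion, the Chat Completions detail wrappers are Pydantic models without a .get() method.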

Fixes #23990

…soning tokens

The OTEL callback looked up output_tokens_details which only exists on
ResponseAPIUsage. Chat Completions API uses completion_tokens_details.
Also add prompt_tokens_details extraction for cached_tokens.

Fixes BerriAI#23990
@vercel

vercel bot commented Mar 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 22, 2026 7:57am


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 19, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing AtharvaJaiswal005:fix/otel-completion-tokens-details (1bc0ae3) with main (c89496f)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR fixes a real bug in the Arize/Phoenix OTEL integration where reasoning_tokens was never recorded for Chat Completions API responses because _set_usage_outputs only checked output_tokens_details (a Responses API field), and also adds extraction of cached_tokens from prompt_tokens_details/input_tokens_details.

Key changes:

  • litellm/integrations/arize/_utils.py: Checks completion_tokens_details first, falling back to output_tokens_details; switches from .get() to getattr() on the details objects (correct, since CompletionTokensDetailsWrapper extends the OpenAI Pydantic BaseModel, which has no .get() method); adds symmetric handling of prompt_tokens_details/input_tokens_details for cached_tokens.
  • tests/test_litellm/integrations/arize/test_arize_utils.py: Adds three new unit tests covering the Chat Completions path, the Responses API fallback path, and the no-details baseline.

Issues found:

  • The input_tokens_details.cached_tokens code path (Responses API) introduced in this PR has no test — only the None case is exercised in the new Responses API test, so a regression in that branch would be silent.

Confidence Score: 4/5

  • Safe to merge; the core fix is correct and well-tested. Minor test coverage gap for the input_tokens_details branch.
  • The implementation is sound: getattr() correctly replaces .get() on Pydantic wrapper objects, the or fallback chain works for both Usage (getattr-based) and ResponseAPIUsage (dict-based) .get() implementations, and the three new tests adequately cover the primary fix. The only gap is the untested input_tokens_details.cached_tokens Responses API path, which is new code without a positive-assertion test.
  • No files require special attention; the implementation change in litellm/integrations/arize/_utils.py is straightforward and correct.
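The getattr()-vs-.get() point from the review can be illustrated with a small stand-alone sketch. CompletionTokensDetailsStub below is a stand-in for litellm's CompletionTokensDetailsWrapper (a Pydantic model); the stub name is illustrative, not the real class:

```python
class CompletionTokensDetailsStub:
    """Attribute-based object, like a Pydantic model: no .get() method."""

    def __init__(self, reasoning_tokens):
        self.reasoning_tokens = reasoning_tokens


details = CompletionTokensDetailsStub(reasoning_tokens=128)

# getattr() with a default works on typed objects, present or missing:
assert getattr(details, "reasoning_tokens", None) == 128
assert getattr(details, "cached_tokens", None) is None  # missing attr -> default

# .get() raises AttributeError on the typed object, which is the bug
# class the switch to getattr() avoids:
try:
    details.get("reasoning_tokens")  # type: ignore[attr-defined]
except AttributeError:
    print("no .get() on typed wrapper")
```

A dict-based usage object (the Responses API side) supports .get() natively, which is why the or-fallback chain in the fix must tolerate both access styles.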

Important Files Changed

  • litellm/integrations/arize/_utils.py — Correctly fixes the _set_usage_outputs function to check completion_tokens_details (Chat Completions API) before falling back to output_tokens_details (Responses API), and adds prompt_tokens_details/input_tokens_details handling for cached_tokens. Switches from .get() to getattr() on the details objects, which correctly avoids AttributeError on CompletionTokensDetailsWrapper (a Pydantic BaseModel without a .get() method). Logic and fallback chain are sound.
  • tests/test_litellm/integrations/arize/test_arize_utils.py — Adds three new tests covering Chat Completions API token details, Responses API fallback, and the no-details baseline. Core fix paths are well-covered. The input_tokens_details.cached_tokens code path (Responses API) is not tested — only None is exercised for that branch.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_set_usage_outputs(span, response_obj, span_attrs)"] --> B{usage present?}
    B -- No --> Z[return]
    B -- Yes --> C[Set total / completion / prompt token counts]
    C --> D["usage.get('completion_tokens_details')"]
    D -- "Non-None (Chat Completions API)" --> E["getattr(..., 'reasoning_tokens', None)"]
    D -- "None → fallback" --> F["usage.get('output_tokens_details')"]
    F -- "Non-None (Responses API)" --> E
    F -- None --> G[Skip reasoning tokens]
    E -- "Non-zero" --> H[Set LLM_TOKEN_COUNT_COMPLETION_DETAILS_REASONING]
    E -- "None / 0" --> G
    C --> I["usage.get('prompt_tokens_details')"]
    I -- "Non-None (Chat Completions API)" --> J["getattr(..., 'cached_tokens', None)"]
    I -- "None → fallback" --> K["usage.get('input_tokens_details')"]
    K -- "Non-None (Responses API)" --> J
    K -- None --> L[Skip cached tokens]
    J -- "Non-zero" --> M[Set LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ]
    J -- "None / 0" --> L

Last reviewed commit: "test: add tests for ..."

CompletionTokensDetailsWrapper and PromptTokensDetailsWrapper don't
have a .get() method unlike OutputTokensDetails. Use getattr() to
safely access attributes on both dict-like and typed objects.

Addresses Greptile review feedback.
Collaborator

@RheagalFire RheagalFire left a comment


LGTM

@RheagalFire RheagalFire changed the base branch from main to litellm_oss_staging_03_19_2026 March 20, 2026 17:19
Collaborator

@RheagalFire RheagalFire left a comment


@AtharvaJaiswal005 Let's add tests to tests/test_litellm/integrations/arize/test_arize_utils.py to ensure the intended change is reflected.

@RheagalFire RheagalFire changed the base branch from litellm_oss_staging_03_19_2026 to main March 20, 2026 17:24
…ls extraction

Adds three tests to verify _set_usage_outputs correctly handles:
- Chat Completions API: completion_tokens_details (reasoning_tokens) and
  prompt_tokens_details (cached_tokens)
- Responses API: fallback to output_tokens_details for reasoning_tokens
- Basic usage: no token details present, no spurious attributes set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +428 to +463
def test_set_usage_outputs_responses_api_output_tokens_details():
    """
    Test that _set_usage_outputs falls back to output_tokens_details (Responses API)
    when completion_tokens_details is not present.
    """
    from unittest.mock import MagicMock

    from litellm.integrations.arize._utils import _set_usage_outputs
    from litellm.types.llms.openai import (
        OutputTokensDetails,
        ResponseAPIUsage,
        ResponsesAPIResponse,
    )

    span = MagicMock()

    response_obj = ResponsesAPIResponse(
        id="response-456",
        created_at=1625247600,
        output=[],
        usage=ResponseAPIUsage(
            input_tokens=100,
            output_tokens=200,
            total_tokens=300,
            output_tokens_details=OutputTokensDetails(reasoning_tokens=150),
        ),
    )

    _set_usage_outputs(span, response_obj, SpanAttributes)

    span.set_attribute.assert_any_call(SpanAttributes.LLM_TOKEN_COUNT_TOTAL, 300)
    span.set_attribute.assert_any_call(SpanAttributes.LLM_TOKEN_COUNT_COMPLETION, 200)
    span.set_attribute.assert_any_call(SpanAttributes.LLM_TOKEN_COUNT_PROMPT, 100)
    span.set_attribute.assert_any_call(
        SpanAttributes.LLM_TOKEN_COUNT_COMPLETION_DETAILS_REASONING, 150
    )
Contributor


P2 Missing coverage for input_tokens_details.cached_tokens (Responses API path)

The new code in _set_usage_outputs falls back to usage.get("input_tokens_details") for cached_tokens when prompt_tokens_details is absent. ResponseAPIUsage has input_tokens_details: Optional[InputTokensDetails], and InputTokensDetails carries cached_tokens: int = 0.

test_set_usage_outputs_responses_api_output_tokens_details only verifies the output_tokens_details branch; input_tokens_details is never set in the test so the new or usage.get("input_tokens_details") branch is never exercised. A future rename of that key or a copy-paste mistake in the fallback chain would pass all tests silently.

Consider extending this test (or adding a dedicated one) that populates input_tokens_details with a non-zero cached_tokens and asserts that LLM_TOKEN_COUNT_PROMPT_DETAILS_CACHE_READ is set to the expected value.

@RheagalFire RheagalFire changed the base branch from main to litellm_oss_staging_03_21_2026 March 22, 2026 08:06
@RheagalFire RheagalFire merged commit 13b893d into BerriAI:litellm_oss_staging_03_21_2026 Mar 22, 2026
38 of 39 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug]: OTEL callback never reports token usage breakdown (reasoning_tokens, cached_tokens) for Chat Completions API

2 participants