Conversation

@GustavoCaso GustavoCaso commented Nov 26, 2025

While testing Arize locally, I noticed that when using the acompletion function with the stream=True and stream_options={"include_usage": True} options, the reported traces did not include the token count, and the status was undefined.
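For reference, a minimal sketch of the kind of call that triggered the issue (the model name and message are just placeholders):

```python
import asyncio
import litellm

async def main():
    # Streaming completion that asks the provider to emit a final usage chunk
    response = await litellm.acompletion(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
        stream_options={"include_usage": True},
    )
    async for chunk in response:
        pass  # drain the stream; the last chunk carries the usage stats

asyncio.run(main())
```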

I checked the code and noticed that the cause of the token count issue was that we pass the usage_stats object directly to _set_token_counts_from_usage. When the function checks whether the object has a usage attribute, the check returns false and the function exits early, so the span never reports token counts.
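The guard presumably looks something like this (a sketch of my understanding, not the exact implementation; attribute names follow the OpenInference conventions):

```python
def _set_token_counts_from_usage(span, result):
    # Only proceeds when the argument carries a `usage` attribute.
    usage = getattr(result, "usage", None)
    if usage is None:
        return  # usage_stats itself has no `usage` attribute, so we bailed out here
    span.set_attribute("llm.token_count.prompt", usage.prompt_tokens)
    span.set_attribute("llm.token_count.completion", usage.completion_tokens)
    span.set_attribute("llm.token_count.total", usage.total_tokens)
```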

[Screenshot 2025-11-26 at 19 22 51: trace missing the token count, status undefined]

I modified the code so that the object passed in includes a usage attribute, and added a call to the _set_span_status function (see the sketch below).
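A minimal sketch of the change (the surrounding code in _finalize_streaming_span is abbreviated, and exact variable names may differ):

```python
from types import SimpleNamespace

# In _finalize_streaming_span / _finalize_sync_streaming_span:
# usage_stats is the usage object aggregated from the final streaming chunk.
# Wrapping it gives the helper the `usage` attribute its guard expects.
_set_token_counts_from_usage(span, SimpleNamespace(usage=usage_stats))

# And once async streaming completes, also report the span status.
_set_span_status(span, aggregated_output)
```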

After those changes, the token count and the status are reported:

[Screenshot 2025-11-26 at 19 25 23: trace showing token counts and status]

Note

Ensure acompletion/completion streaming reports token counts and sets span status, and add tests covering async streaming and usage reporting.

  • Instrumentation (LiteLLM):
    • Wrap the streaming usage with SimpleNamespace(usage=...) before calling _set_token_counts_from_usage in _finalize_sync_streaming_span and _finalize_streaming_span (as sketched above).
    • Set the span status via _set_span_status(span, aggregated_output) after async streaming completes.
  • Tests:
    • Add async streaming tests for acompletion validating the output value, an OK status, and token counts when stream_options={"include_usage": True}.
    • Cover context attributes in both the standard and usage-included async streaming paths.

Written by Cursor Bugbot for commit 78956e8. This will update automatically on new commits.

@GustavoCaso GustavoCaso requested a review from a team as a code owner November 26, 2025 18:39
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Nov 26, 2025

github-actions bot commented Nov 26, 2025

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@GustavoCaso (Author)

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Nov 26, 2025
@GustavoCaso GustavoCaso changed the title from "ensure litellm async calls with strem reports token usage and span status" to "fix: litellm async calls with strem reports token usage and span status" on Nov 26, 2025
@GustavoCaso (Author)

@codefromthecrypt could I get some feedback on these changes 😄?

@codefromthecrypt (Contributor)

I don't have workflow approval, but I think the main step is to add a unit test, or another row in the existing unit tests, for this case.

The main thing is that you need to verify the changes are needed, especially the status change. You can look at litellm's existing test_completion_sync_streaming, which needs to be ported for async.

There is some code in litellm for async, but the openai code is basically always in the best shape, as it's the most used. You can steal patterns from python/instrumentation/openinference-instrumentation-openai/tests/openinference/instrumentation/openai/test_instrumentor.py

Then you need to run the tests the same way they run in CI.

@codefromthecrypt (Contributor)

PS: one tip for running the tests: enter the python directory and run uvx --with tox-uv tox run -e py313-ci-litellm. This is the main thing that will ensure CI is happy.

@GustavoCaso GustavoCaso force-pushed the gustavocaso/litellm-stream-async-token-count-and-status branch from 8d4c769 to 78956e8 on December 7, 2025 at 10:28
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Dec 7, 2025

GustavoCaso commented Dec 7, 2025

Thanks for the feedback @codefromthecrypt.

I added tests 😄 and validated that they pass using uvx --with tox-uv tox run -e py313-ci-litellm.
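For context, the new tests follow roughly this shape (a simplified sketch; the fixture name here is illustrative, and the real tests use the repo's existing test setup):

```python
import litellm
import pytest
from opentelemetry.trace import StatusCode

@pytest.mark.asyncio
async def test_acompletion_streaming_reports_usage(in_memory_span_exporter):
    # in_memory_span_exporter: illustrative fixture wired to the tracer provider
    response = await litellm.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
        mock_response="Hi there",  # litellm's built-in response mocking
        stream=True,
        stream_options={"include_usage": True},
    )
    async for _ in response:
        pass  # drain the stream so the span is finalized

    span = in_memory_span_exporter.get_finished_spans()[0]
    assert span.status.status_code == StatusCode.OK
    attributes = dict(span.attributes)
    assert attributes.get("llm.token_count.prompt") is not None
    assert attributes.get("llm.token_count.completion") is not None
```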

@GustavoCaso GustavoCaso changed the title from "fix: litellm async calls with strem reports token usage and span status" to "fix: litellm async calls with stream reports token usage and span status" on Dec 7, 2025
@GustavoCaso (Author)

@caroger would it be possible to get some feedback on these changes?

Thanks in advance 😄

@caroger caroger self-requested a review December 10, 2025 18:18