Is there any way to get the token usage count after calling the OpenAI streaming API? #1769

amitsharma101 · 2025-08-09T18:03:29Z


I couldn't find any way to get token usage after calling the streaming API. OpenAI returns the usage data in the last chunk, but I couldn't find a way to access it through instructor. Here's my code snippet:

from pydantic import BaseModel, Field

class MyResponse(BaseModel):
    field_1: str = Field(description="My field 1")

resp = self.client.chat.completions.create_partial(
    model='gpt-4.1-2025-04-1',
    messages=messages,
    response_model=MyResponse,
    temperature=0.25,
    max_tokens=4096,
    frequency_penalty=0.5,
    presence_penalty=0.3,
    stream=True,
    stream_options={"include_usage": True},
)

Here self.client is:

instructor.from_openai(AsyncOpenAI())

aaravriyer193 · 2025-08-18T12:05:34Z

aaravriyer193
Aug 18, 2025

You’re correct: the OpenAI streaming API does return token usage, but only in the final chunk of the stream. That’s why you don’t see it until the very end. When you call with stream_options={"include_usage": True}, the API sends one extra chunk just before the stream ends; its choices list is empty and its usage field looks something like:

{
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
}

The issue is that when you’re using instructor with create_partial and stream=True, the library is designed to focus entirely on parsing the model’s output into your Pydantic schema (MyResponse). It doesn’t forward the other metadata (like usage, logprobs, etc.) that OpenAI attaches to the stream. So by the time instructor gives you your parsed object, it has effectively discarded the usage information.

That’s why you couldn’t find a way to access it in your current snippet.

If you need token usage, you have two main paths:

1. Stream manually, then parse
  - Use the raw OpenAI client with stream_options={"include_usage": True}.
  - Accumulate the streamed text until the final chunk arrives.
  - Capture the usage object from that final chunk.
  - Once you have the full text, feed it into instructor, or validate it directly with your Pydantic model (MyResponse.model_validate_json() in Pydantic v2; .parse_raw() is the deprecated v1 equivalent).
  This way you get both the structured response and the token counts.

2. Extend instructor’s stream handling
  - Instructor wraps chat.completions.create under the hood.
  - You could open a PR or patch it locally so that, when stream_options={"include_usage": True} is set, the final usage data is preserved and returned together with the parsed model.
  - For example, instead of just returning MyResponse, instructor could return (MyResponse, usage) at the end of the stream.
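Path 1 can be sketched as follows. This is a minimal sketch under a few assumptions: `chunks` is the async stream returned by `await AsyncOpenAI().chat.completions.create(..., stream=True, stream_options={"include_usage": True})`, the model was asked to answer with JSON matching `MyResponse`, and Pydantic v2 is installed. `stream_then_parse` is a hypothetical helper name. It returns the `(MyResponse, usage)` pair suggested in path 2, at the cost of instructor's retries and incremental partial objects.

```python
from typing import Any, AsyncIterable, Optional, Tuple

from pydantic import BaseModel, Field


class MyResponse(BaseModel):
    field_1: str = Field(description="My field 1")


async def stream_then_parse(
    chunks: AsyncIterable[Any],
) -> Tuple[MyResponse, Optional[Any]]:
    """Consume a raw chat-completion stream, then validate the full text.

    Hypothetical helper: `chunks` is assumed to be a raw AsyncOpenAI stream
    created with stream_options={"include_usage": True}, and the model is
    assumed to have answered with JSON matching MyResponse.
    """
    parts = []
    usage = None
    async for chunk in chunks:
        # Content chunks carry text in choices[0].delta.content (may be None)
        if chunk.choices:
            parts.append(chunk.choices[0].delta.content or "")
        # The extra final chunk has an empty choices list and carries usage
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    # Pydantic v2 validation of the complete JSON text
    return MyResponse.model_validate_json("".join(parts)), usage
```

One way to ask the raw client for JSON is `response_format={"type": "json_object"}` (JSON mode, which also requires the word "JSON" to appear in your messages); that choice is an assumption of this sketch.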

At the moment, there’s no “magic” way to do resp.usage with create_partial(stream=True). Instructor simply isn’t designed to expose that part of the API.

So if token counts are critical for you (e.g. for logging, billing, or token-budgeting), the safest approach is to handle the raw stream yourself and then validate the output with instructor afterwards.


xXMrNidaXx · 2026-02-23T14:01:13Z


Getting token usage from streaming OpenAI responses is tricky but doable!

The challenge:
Streaming responses don't include usage by default; when you request it, it arrives in one extra chunk at the end.

Solution 1: stream_options parameter

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

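# No response_model: instructor passes the call through and returns the
# raw OpenAI stream, so no structured parsing happens here
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},  # ask for a final usage chunk
)

for chunk in response:
    if chunk.choices:  # the final usage chunk has an empty choices list
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # only the last chunk carries usage
        print(f"\nTokens: {chunk.usage}")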

Solution 2: tiktoken estimation

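import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

# Assumes `stream` is a raw OpenAI chat-completion stream and `prompt`
# is the prompt text you sent
full_response = ""
for chunk in stream:
    if chunk.choices:  # skip a usage-only final chunk
        full_response += chunk.choices[0].delta.content or ""

# Rough estimate: ignores the per-message overhead of the chat format
prompt_tokens = len(enc.encode(prompt))
completion_tokens = len(enc.encode(full_response))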

Solution 3: Post-hoc API call

# After streaming, look up usage through a separate API call
# (Less ideal: it costs an extra request)

With Instructor specifically:
The stream_options approach works on the raw stream, but streaming partial models through create_partial may need custom handling to surface the usage chunk.
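A sketch of that custom handling, assuming you read the raw (sync) stream yourself instead of going through create_partial: it keeps a growing JSON buffer, parses it with `pydantic_core.from_json(..., allow_partial=True)` (available in Pydantic 2.7 and later, an assumption about your installed version), and records the usage from the final chunk. `PartialStreamWithUsage` is a hypothetical name, and the dicts it yields are not validated.

```python
from typing import Any, Iterable, Iterator, Optional

from pydantic_core import from_json


class PartialStreamWithUsage:
    """Yield progressively parsed dicts from a raw chat-completion stream.

    Hypothetical helper. `chunks` is assumed to be a raw OpenAI stream
    created with stream_options={"include_usage": True} whose content is
    JSON. Once iteration finishes, `self.usage` holds the final usage.
    """

    def __init__(self, chunks: Iterable[Any]) -> None:
        self._chunks = chunks
        self.usage: Optional[Any] = None

    def __iter__(self) -> Iterator[Any]:
        buffer = ""
        for chunk in self._chunks:
            # Only the final, usage-only chunk has a non-None usage field
            if getattr(chunk, "usage", None) is not None:
                self.usage = chunk.usage
            if not chunk.choices:
                continue
            buffer += chunk.choices[0].delta.content or ""
            if not buffer.strip():
                continue
            try:
                # allow_partial=True returns what has been parsed so far
                # when the input ends before the JSON is complete
                parsed = from_json(buffer, allow_partial=True)
            except ValueError:
                continue  # prefix not parseable yet; wait for more text
            yield parsed
```

Each yielded dict could then be validated against a model whose fields are all optional; instructor's `Partial[MyResponse]` wrapper serves a similar purpose. After the loop, the tracker's `usage` attribute holds the token counts.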

We've built token tracking for streaming at RevolutionAI. The stream_options approach is the cleanest.

Did this solve your use case?


xXMrNidaXx · 2026-02-23T15:31:41Z


Token usage with streaming requires capturing the final chunk. Here are approaches:

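1. Use create_with_completion for full response access (non-streaming)

from instructor import from_openai
from openai import AsyncOpenAI

client = from_openai(AsyncOpenAI())

# create_with_completion returns the parsed model together with the raw
# completion. It is used here as a non-streaming call, where the completion
# carries usage directly, so stream/stream_options are left out.
# (Run inside an async function, or wherever top-level await is allowed.)
response, completion = await client.chat.completions.create_with_completion(
    model="gpt-4.1-2025-04-14",
    messages=messages,
    response_model=MyResponse,
)

# Access usage from completion
print(completion.usage)  # CompletionUsage object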

2. Manual streaming with usage capture

async def stream_with_usage(client, messages, response_model):
    usage = None
    partial_response = None

    async for chunk in client.chat.completions.create_partial(
        model="gpt-4.1-2025-04-14",
        messages=messages,
        response_model=response_model,
        stream=True,
        stream_options={"include_usage": True},
    ):
        partial_response = chunk
        # Best effort: `_raw_response` is a private attribute, and whether
        # instructor attaches it (with usage) to partial objects depends on
        # the version, so `usage` may remain None. Check before relying on it.
        raw = getattr(chunk, "_raw_response", None)
        if raw is not None and getattr(raw, "usage", None):
            usage = raw.usage

    return partial_response, usage

3. Wrap with custom handler

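class UsageTracker:
    # Passes chunks through unchanged and remembers the usage object.
    # Intended for a raw OpenAI stream created with
    # stream_options={"include_usage": True}; the partial model objects
    # yielded by instructor generally have no `usage` attribute.
    def __init__(self):
        self.usage = None

    async def stream(self, generator):
        async for chunk in generator:
            # Check each chunk as it passes: only the final one has usage,
            # and this avoids touching `chunk` after an empty stream
            usage = getattr(chunk, "usage", None)
            if usage is not None:
                self.usage = usage
            yield chunk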

We track token usage for billing at Revolution AI. For non-streaming calls, the create_with_completion method is the cleanest approach.
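For billing, the usage objects collected by any of the approaches in this thread can be folded into running totals. A minimal sketch; `UsageTotals` is a hypothetical helper, and it assumes each usage object exposes `prompt_tokens` and `completion_tokens` the way OpenAI's `CompletionUsage` does.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UsageTotals:
    """Running token totals across many calls (hypothetical helper)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def add(self, usage: Any) -> None:
        # `usage` is assumed to expose prompt_tokens / completion_tokens;
        # None (no usage reported for that call) is skipped
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.calls += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

Skipping `None` keeps streams that never delivered a usage chunk from breaking the tally, but it also means those calls go uncounted, so you may want to log them separately.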

