Is there any way to get the token usage count after calling the OpenAI streaming API? #1769
Replies: 3 comments
You’re correct: the OpenAI streaming API does return token usage, but only in the final chunk of the stream. That’s why you don’t see it until the very end. When you call with

    {
      "usage": {
        "prompt_tokens": 123,
        "completion_tokens": 456,
        "total_tokens": 579
      }
    }

The issue is that when you’re using instructor with That’s why you couldn’t find a way to access it in your current snippet. If you need token usage, you have two main paths:

At the moment, there’s no “magic” way to do So if token counts are critical for you (e.g. for logging, billing, or token budgeting), the safest approach is to handle the raw stream yourself and then validate the output with instructor afterwards.
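The "handle the raw stream yourself, then validate" approach could look roughly like the sketch below. It assumes the stream was requested with `stream_options={"include_usage": True}`, so that the last chunk carries `usage` and an empty `choices` list. `UserInfo`, `collect_stream`, `_chunk`, and `fake_stream` are hypothetical names introduced for illustration; the fake stream only stands in for the SDK's chunks so the example runs offline, and the dataclass stands in for whatever Pydantic model you would normally validate against.

```python
import asyncio
import json
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UserInfo:  # hypothetical target schema, for illustration only
    name: str
    age: int


async def collect_stream(stream):
    """Drain an async chat-completions stream; return (full_text, usage_or_None)."""
    parts, usage = [], None
    async for chunk in stream:
        # The usage-only final chunk has an empty `choices` list.
        if chunk.choices:
            parts.append(chunk.choices[0].delta.content or "")
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


def _chunk(text=None, usage=None):
    """Build a stand-in for an SDK chunk (assumption: same attribute names)."""
    choices = [] if text is None else [SimpleNamespace(delta=SimpleNamespace(content=text))]
    return SimpleNamespace(choices=choices, usage=usage)


async def fake_stream():
    """Offline replacement for the real streaming response."""
    yield _chunk('{"name": "Ada", ')
    yield _chunk('"age": 36}')
    yield _chunk(usage=SimpleNamespace(prompt_tokens=123, completion_tokens=456, total_tokens=579))


text, usage = asyncio.run(collect_stream(fake_stream()))
user = UserInfo(**json.loads(text))  # swap in your real validation step here
print(user, usage.total_tokens)
```

With a real client, the value passed to `collect_stream` would be the awaited result of `AsyncOpenAI().chat.completions.create(..., stream=True, stream_options={"include_usage": True})`, and the final line would validate `text` against your own model instead of the placeholder dataclass.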
Getting token usage from streaming OpenAI responses is tricky but doable! The challenge:

Solution 1: stream_options parameter

    from openai import OpenAI

    # Plain OpenAI client: this loop reads raw chunks, so no response_model is involved.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
        stream_options={"include_usage": True},  # Key!
    )
    for chunk in response:
        if chunk.choices:
            # delta.content can be None, e.g. on the finishing delta
            print(chunk.choices[0].delta.content or "", end="")
        if chunk.usage:  # Last chunk has usage (and an empty choices list)
            print(f"\nTokens: {chunk.usage}")

Solution 2: tiktoken estimation

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")
    full_response = ""
    for chunk in stream:  # `stream` is a streaming response like the one in Solution 1
        if chunk.choices:  # skip a usage-only final chunk
            full_response += chunk.choices[0].delta.content or ""
    # Estimate tokens (`prompt` is the prompt text; chat formatting overhead is not counted)
    prompt_tokens = len(enc.encode(prompt))
    completion_tokens = len(enc.encode(full_response))

Solution 3: Post-hoc API call

    # After streaming, check usage via API
    # (Less ideal: extra call)

With Instructor specifically: We've built token tracking for streaming at RevolutionAI. The Did this solve your use case?
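Solutions 1 and 2 can be combined: prefer the exact count when the final chunk reported usage, and fall back to an estimate only when it did not. The sketch below is one way to express that, not an established API. `completion_tokens` and `rough_estimate` are hypothetical helpers, and the four-characters-per-token rule is a crude placeholder so the example runs without tiktoken; in practice the estimator would be something like `len(enc.encode(text))` from Solution 2.

```python
from types import SimpleNamespace


def completion_tokens(reported_usage, text, estimate):
    """Return (count, source): exact when usage was reported, else an estimate."""
    if reported_usage is not None:
        return reported_usage.completion_tokens, "reported"
    return estimate(text), "estimated"


def rough_estimate(text):
    # Crude placeholder (about 4 characters per token), used only to keep
    # this sketch dependency-free. A tokenizer gives a much better estimate.
    return max(1, len(text) // 4)


# Usage object shaped like the one on the final streamed chunk (stand-in).
exact = completion_tokens(SimpleNamespace(completion_tokens=456), "ignored", rough_estimate)
# No usage arrived (e.g. include_usage was not set), so estimate from the text.
fallback = completion_tokens(None, "Hello, streaming world!", rough_estimate)
print(exact, fallback)
```

Returning the source alongside the count makes it easy to flag estimated figures in logs, since they will not match what the API actually bills.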
Token usage with streaming requires capturing the final chunk. Here are approaches:

1. Use create_with_completion for full response access

    from openai import AsyncOpenAI
    from instructor import from_openai

    client = from_openai(AsyncOpenAI())
    response, completion = await client.chat.completions.create_with_completion(
        model="gpt-4.1-2025-04-14",
        messages=messages,
        response_model=MyResponse,
        stream=True,
        stream_options={"include_usage": True},
    )
    # Access usage from completion
    print(completion.usage)  # CompletionUsage object

2. Manual streaming with usage capture

    async def stream_with_usage(client, messages, response_model):
        usage = None
        partial_response = None
        async for chunk in client.chat.completions.create_partial(
            model="gpt-4.1-2025-04-14",
            messages=messages,
            response_model=response_model,
            stream=True,
            stream_options={"include_usage": True},
        ):
            partial_response = chunk
            # Check for usage in raw response
            if hasattr(chunk, "_raw_response"):
                raw = chunk._raw_response
                if hasattr(raw, "usage") and raw.usage:
                    usage = raw.usage
        return partial_response, usage

3. Wrap with custom handler

    class UsageTracker:
        def __init__(self):
            self.usage = None

        async def stream(self, generator):
            last_chunk = None  # stays None if the stream yields nothing
            async for chunk in generator:
                last_chunk = chunk
                yield chunk
            # After stream ends, check last chunk
            usage = getattr(last_chunk, "usage", None)
            if usage is not None:
                self.usage = usage

We track token usage for billing at Revolution AI; the create_with_completion method is the cleanest approach.
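For billing or budgeting, the per-request usage objects captured by any of the approaches above usually need to be summed across calls. A minimal sketch of such an accumulator follows. `UsageTotals` is a hypothetical helper, not part of instructor or the OpenAI SDK, and it assumes each usage object exposes `prompt_tokens` and `completion_tokens` as in the JSON shown earlier in the thread.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTotals:
    """Running token totals across several requests (hypothetical helper)."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def add(self, usage):
        # Requests whose stream reported no usage are skipped, not counted as zero.
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.requests += 1

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


totals = UsageTotals()
totals.add(SimpleNamespace(prompt_tokens=123, completion_tokens=456))
totals.add(None)  # a stream that ended without a usage chunk
totals.add(SimpleNamespace(prompt_tokens=10, completion_tokens=5))
print(totals, totals.total_tokens)
```

Skipping `None` rather than adding zeros keeps `requests` honest about how many calls actually reported usage, which matters if missing counts are later backfilled with estimates.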
I couldn't find any way to get token usage after calling the streaming API. OpenAI returns the usage data in the last chunk, but I couldn't find any way to get it using instructor. Here's my code snippet
Here client is
instructor.from_openai(AsyncOpenAI())