Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
Any LLM call with streaming.
The aggregated token usage is totally wrong and much too high.
See this method (add_ai_message_chunks in langchain_core.messages):
```python
# Token usage
if left.usage_metadata or any(o.usage_metadata is not None for o in others):
    usage_metadata: Optional[UsageMetadata] = left.usage_metadata
    for other in others:
        usage_metadata = add_usage(usage_metadata, other.usage_metadata)
else:
    usage_metadata = None
```
For streaming we get usage_metadata for each token, e.g.:

```python
{'input_tokens': 713, 'output_tokens': 1, 'total_tokens': 714}
```
output_tokens is always 1 and sums up correctly.
input_tokens is always 713 for every chunk of the LLM token stream, so the aggregate becomes input_tokens * count(chunks), and total_tokens inflates the same way (714 per chunk).
This just adds the tokens up to huge, totally useless numbers.
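A minimal reproduction of the arithmetic (my own reduction, assuming add_usage amounts to a plain field-wise sum, which is what the aggregation above does with these values):

```python
from typing import Optional

def add_usage(left: Optional[dict], right: Optional[dict]) -> Optional[dict]:
    # Toy field-wise sum standing in for the real add_usage.
    if left is None:
        return right
    if right is None:
        return left
    return {k: left.get(k, 0) + right.get(k, 0) for k in set(left) | set(right)}

# Every streamed chunk reports the full prompt (713) plus one new output token.
chunks = [{"input_tokens": 713, "output_tokens": 1, "total_tokens": 714}] * 100

usage: Optional[dict] = None
for chunk_usage in chunks:
    usage = add_usage(usage, chunk_usage)

# Result: input_tokens == 71300, output_tokens == 100, total_tokens == 71400
# instead of the sane 713 / 100 / 813.
print(usage)
```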
What is the strategy here? Should the LLM not report per-token usage metadata and instead only report it in the final chunk? Then langchain-openai would have to change this for that call (_create_usage_metadata); a sketch of that option follows.
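This is only a sketch of the final-chunk-only strategy, assuming the raw OpenAI usage payload shape (prompt_tokens / completion_tokens / total_tokens); the helper name usage_for_chunk is hypothetical and this is not the current langchain-openai code:

```python
from typing import Optional

from langchain_core.messages.ai import UsageMetadata

def usage_for_chunk(raw_usage: Optional[dict], is_final: bool) -> Optional[UsageMetadata]:
    # Attach usage only to the final chunk, so that a summing
    # aggregation over all chunks stays correct.
    if not is_final or raw_usage is None:
        return None  # intermediate chunks carry no usage_metadata
    return UsageMetadata(
        input_tokens=raw_usage.get("prompt_tokens", 0),
        output_tokens=raw_usage.get("completion_tokens", 0),
        total_tokens=raw_usage.get("total_tokens", 0),
    )
```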
Error Message and Stack Trace (if applicable)
No response
Description
- I'm trying to get sane token usage numbers for streaming with usage_metadata.
- I get hugely inflated total_tokens and input_tokens (because they are multiplied by the number of output tokens).
- Define a strategy and either adapt the token aggregation in langchain_core.messages.add_ai_message_chunks or report usage only in the final chunk in langchain_openai.chat_models.base._create_usage_metadata; one possible shape of the aggregation-side option is sketched below.
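For completeness, a sketch of the aggregation-side option; merge_stream_usage is a hypothetical helper of mine, not a proposed patch. The idea: count the prompt once and sum only the per-chunk output deltas.

```python
from typing import Optional

from langchain_core.messages.ai import UsageMetadata

def merge_stream_usage(per_chunk: list[Optional[UsageMetadata]]) -> Optional[UsageMetadata]:
    reported = [u for u in per_chunk if u is not None]
    if not reported:
        return None
    # The prompt size is repeated on every chunk, so count it once;
    # output_tokens really is a per-chunk delta, so summing it is fine.
    input_tokens = max(u["input_tokens"] for u in reported)
    output_tokens = sum(u["output_tokens"] for u in reported)
    return UsageMetadata(
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_tokens=input_tokens + output_tokens,
    )
```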
System Info
totally not relevant