Package (Required)
Checked other resources
Example Code (Python)
"""
Reproduce: url_context tokens not attributed in usage_metadata.
When using url_context tool, the Gemini API includes fetched content tokens
in total_token_count but not in prompt_token_count or candidates_token_count.
This causes usage_metadata.input_tokens + output_tokens < total_tokens.
Requires: GOOGLE_API_KEY env var
"""
from langchain_google_genai import ChatGoogleGenerativeAI
model = ChatGoogleGenerativeAI(model="gemini-3-flash-preview")
model_with_url_context = model.bind_tools([{"url_context": {}}])
url = "https://www.newswise.com/articles/strategic-integration-of-llm-compression-toward-optimal-efficiency"
response = model_with_url_context.invoke(f"Visit the link EXPLICITLY and summarize this article: {url}")
usage = response.usage_metadata
input_tokens = usage["input_tokens"]
output_tokens = usage["output_tokens"]
total_tokens = usage["total_tokens"]
gap = total_tokens - input_tokens - output_tokens
print(f"URL: {url}")
print(f" input={input_tokens}, output={output_tokens}, total={total_tokens}, gap={gap}")
if gap > 0:
    print(f" BUG: {gap} url_context tokens not attributed in input_tokens or output_tokens")
# Fail loudly when the reported total is not fully covered by input + output.
assert gap == 0, "Unattributed url_context tokens found"
Error Message and Stack Trace (if applicable)
## Before
URL: https://www.newswise.com/articles/strategic-integration-of-llm-compression-toward-optimal-efficiency
input=42, output=1653, total=5006, gap=3311
BUG: 3311 url_context tokens not attributed in input_tokens or output_tokens
## After
URL: https://www.newswise.com/articles/strategic-integration-of-llm-compression-toward-optimal-efficiency
input=2648, output=1506, total=4154, gap=0
Description
While using the LangSmith monitoring dashboard, I noticed a large cost discrepancy between LangSmith and GCP Billing. Looking at individual traces, I found that LLM calls that use the URL context tool do not sum the input token usage correctly, which causes incorrect cost estimation.
Looking at the above cost breakdown from LangSmith, even though the total token count sums correctly to 4.2K, the input token count is 43, resulting in an invalid cost estimate for the entire call.
Let me know if I should submit this to langsmith-sdk instead, but I believe input tokens should account for tool-use tokens in the UsageMetadata response.
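To illustrate the attribution I have in mind, here is a minimal sketch, not the actual change in the PR. It assumes the raw Gemini usage payload exposes the fetched-content tokens in a separate field, called `tool_use_prompt_token_count` here; that field name, the `RawUsage` type, and the `normalize_usage` helper are all hypothetical and used only for illustration. The numbers come from the "Before" run above, assuming the whole gap of 3311 tokens is tool-use tokens.

```python
from typing import TypedDict


class RawUsage(TypedDict, total=False):
    # Hypothetical field names, modeled loosely on the provider's usage payload.
    prompt_token_count: int
    candidates_token_count: int
    tool_use_prompt_token_count: int
    total_token_count: int


def normalize_usage(raw: RawUsage) -> dict[str, int]:
    """Fold tool-use prompt tokens into input_tokens (illustrative helper)."""
    prompt = raw.get("prompt_token_count", 0)
    tool_use = raw.get("tool_use_prompt_token_count", 0)
    output = raw.get("candidates_token_count", 0)
    input_tokens = prompt + tool_use
    # Prefer the provider-reported total; fall back to the sum if it is absent.
    total = raw.get("total_token_count", input_tokens + output)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output,
        "total_tokens": total,
    }


# Numbers from the "Before" run, assuming the full gap is tool-use tokens.
raw: RawUsage = {
    "prompt_token_count": 42,
    "candidates_token_count": 1653,
    "tool_use_prompt_token_count": 3311,
    "total_token_count": 5006,
}
fixed = normalize_usage(raw)
gap = fixed["total_tokens"] - fixed["input_tokens"] - fixed["output_tokens"]
print(fixed, gap)
```

With the tool-use tokens counted as input, the three numbers are consistent again and a cost estimate based on `input_tokens` would no longer undercount the call.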
Created a PR here: #1663