fix(genai): attribute tool use tokens in usage metadata #1662

@NelsonGan

Package (Required)

  • langchain-google-genai
  • langchain-google-vertexai
  • langchain-google-community
  • Other / not sure / general

Checked other resources

  • I added a descriptive title to this issue
  • I searched the LangChain documentation and API reference (linked above)
  • I used the GitHub search to find a similar issue and didn't find it
  • I am sure this is a bug and not a question or request for help

Example Code (Python)

"""
Reproduce: url_context tokens not attributed in usage_metadata.

When using url_context tool, the Gemini API includes fetched content tokens
in total_token_count but not in prompt_token_count or candidates_token_count.
This causes usage_metadata.input_tokens + output_tokens < total_tokens.

Requires: GOOGLE_API_KEY env var
"""

from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-3-flash-preview")
model_with_url_context = model.bind_tools([{"url_context": {}}])

url = "https://www.newswise.com/articles/strategic-integration-of-llm-compression-toward-optimal-efficiency"
response = model_with_url_context.invoke(f"Visit the link EXPLICITLY and summarize this article: {url}")

usage = response.usage_metadata
input_tokens = usage["input_tokens"]
output_tokens = usage["output_tokens"]
total_tokens = usage["total_tokens"]
gap = total_tokens - input_tokens - output_tokens

print(f"URL: {url}")
print(f"  input={input_tokens}, output={output_tokens}, total={total_tokens}, gap={gap}")

if gap > 0:
    print(f"  BUG: {gap} url_context tokens not attributed in input_tokens or output_tokens")

assert gap == 0, "Unattributed url_context tokens found"

Error Message and Stack Trace (if applicable)

## Before
URL: https://www.newswise.com/articles/strategic-integration-of-llm-compression-toward-optimal-efficiency
  input=42, output=1653, total=5006, gap=3311
  BUG: 3311 url_context tokens not attributed in input_tokens or output_tokens


## After
URL: https://www.newswise.com/articles/strategic-integration-of-llm-compression-toward-optimal-efficiency
  input=2648, output=1506, total=4154, gap=0

Description

While using the LangSmith monitoring dashboard, I noticed a large cost discrepancy between LangSmith and GCP Billing. Looking at individual traces, I found that LLM calls that use the URL context tool do not sum the input token usage correctly, which causes incorrect cost estimation.

[Image: LangSmith cost breakdown]

In the LangSmith cost breakdown above, the total token count is correct at 4.2K, but the input token count is only 43, which makes the cost estimate for the entire call wrong.
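To make the cost impact concrete, here is a rough sketch using the counts from the "Before" run. The per-token prices are hypothetical placeholders (not real Gemini pricing), and it assumes the unattributed tokens would be billed at the input rate.

```python
# Hypothetical per-token prices, for illustration only (not real Gemini pricing).
INPUT_PRICE_PER_TOKEN = 0.50 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 3.00 / 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Price a call from its per-category token counts."""
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN


# Counts from the "Before" run in this issue.
input_tokens, output_tokens, total_tokens = 42, 1653, 5006
unattributed = total_tokens - input_tokens - output_tokens

# What a tracer sees when it prices only input_tokens and output_tokens.
reported = estimate_cost(input_tokens, output_tokens)
# Assumption: the unattributed tokens are billed at the input rate.
expected = estimate_cost(input_tokens + unattributed, output_tokens)

print(f"unattributed tokens: {unattributed}")
print(f"estimated cost without them: ${reported:.6f}")
print(f"estimated cost with them:    ${expected:.6f}")
```

Any tool that prices a call from the per-category counts rather than from total_tokens will underestimate in this way, whatever the actual rates are.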

Let me know if I should file this against langsmith-sdk instead, but I believe input tokens in the UsageMetadata response should include tool-use tokens.
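For illustration, a minimal sketch of the attribution I have in mind. This is not the code from the linked PR. The helper name `attribute_tool_use_tokens`, the `UsageSketch` type, and the raw field names (in particular `tool_use_prompt_token_count`) are assumptions made for this example, and the input reuses the "Before" numbers under the assumption that the whole 3311-token gap is tool-use tokens.

```python
from typing import TypedDict


class UsageSketch(TypedDict):
    """Hypothetical stand-in for the usage metadata dict."""

    input_tokens: int
    output_tokens: int
    total_tokens: int


def attribute_tool_use_tokens(raw: dict) -> UsageSketch:
    """Fold tool-use prompt tokens into input_tokens (illustrative only)."""
    prompt = raw.get("prompt_token_count", 0)
    # Assumed field name for tokens consumed by tool results such as url_context.
    tool_use = raw.get("tool_use_prompt_token_count", 0)
    output = raw.get("candidates_token_count", 0)
    total = raw.get("total_token_count", 0)
    return UsageSketch(
        input_tokens=prompt + tool_use,
        output_tokens=output,
        total_tokens=total,
    )


# "Before" numbers, assuming the entire gap is tool-use tokens.
raw_counts = {
    "prompt_token_count": 42,
    "tool_use_prompt_token_count": 3311,
    "candidates_token_count": 1653,
    "total_token_count": 5006,
}
usage = attribute_tool_use_tokens(raw_counts)
gap = usage["total_tokens"] - usage["input_tokens"] - usage["output_tokens"]
print(usage, f"gap={gap}")
```

With tool-use tokens counted as input, the reproduction script's `gap == 0` check would hold for these numbers.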

Created a PR here: #1663


Labels: bug, genai
