
[bot] Anthropic tracer silently drops output_tokens_details.thinking_tokens, so extended thinking calls never emit completion_reasoning_tokens #160

@braintrust-bot

Description


What instrumentation is missing

When the Anthropic Messages API is called with extended thinking enabled (thinking: {type: "enabled", budget_tokens: N}), the response usage object includes an output_tokens_details sub-object containing thinking_tokens — the number of tokens consumed by the model's internal reasoning. The Anthropic tracer's parseUsageTokens function never records this, so completion_reasoning_tokens (the standard Braintrust metric for reasoning/thinking token costs) is always absent from Anthropic extended-thinking spans.

Why it's dropped

In trace/contrib/anthropic/traceanthropic.go (lines 114–158), parseUsageTokens iterates over the usage map and only processes values that pass internal.ToInt64:

func parseUsageTokens(usage map[string]interface{}) map[string]int64 {
    metrics := make(map[string]int64)
    for k, v := range usage {
        if ok, i := internal.ToInt64(v); ok {   // only handles numeric scalars
            switch k {
            case "input_tokens":  ...
            case "output_tokens": metrics["completion_tokens"] = i
            // ...
            default: metrics[k] = i
            }
        }
        // no else branch: nested objects (output_tokens_details) are silently skipped
    }
    return metrics
}

output_tokens_details is a nested object ({"thinking_tokens": N}), not a scalar, so ToInt64 returns false for it and the entire sub-object is ignored. No completion_reasoning_tokens (or thinking_tokens) metric is ever recorded.

Upstream source

The Anthropic Messages API documentation defines the usage response object as:

"usage": {
  "input_tokens": 45,
  "output_tokens": 170,
  "output_tokens_details": {
    "thinking_tokens": 120
  }
}
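Because parseUsageTokens receives a map[string]interface{}, the raw usage JSON has presumably been decoded with encoding/json before it reaches the tracer (an assumption; the decoding step is not shown in this issue). This sketch decodes the documented example the same way and reports each top-level value's Go type; describeUsage is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usageJSON is the example usage object quoted from the Anthropic docs above.
const usageJSON = `{
  "input_tokens": 45,
  "output_tokens": 170,
  "output_tokens_details": {
    "thinking_tokens": 120
  }
}`

// describeUsage decodes raw usage JSON into map[string]interface{} and
// reports the dynamic Go type of each top-level value.
func describeUsage(raw string) (map[string]string, error) {
	var usage map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &usage); err != nil {
		return nil, err
	}
	types := make(map[string]string, len(usage))
	for k, v := range usage {
		types[k] = fmt.Sprintf("%T", v)
	}
	return types, nil
}

func main() {
	types, err := describeUsage(usageJSON)
	if err != nil {
		panic(err)
	}
	// Print in a fixed order; Go map iteration order is random.
	for _, k := range []string{"input_tokens", "output_tokens", "output_tokens_details"} {
		fmt.Printf("%s: %s\n", k, types[k])
	}
}
```

With encoding/json, the two scalar counts come back as float64, while output_tokens_details comes back as a nested map[string]interface{}, which is exactly the kind of value a scalar-only conversion rejects.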

output_tokens_details.thinking_tokens is the count of tokens used for the model's extended thinking reasoning, separate from the visible answer tokens. SDK type support was added in anthropic-sdk-go v1.46.0 (May 28, 2026).

Parity gap

Two other integrations in this repo already capture equivalent reasoning/thinking tokens:

| Integration | API field | Braintrust metric |
|---|---|---|
| OpenAI (traceopenai.go:136–145) | completion_tokens_details.reasoning_tokens | completion_reasoning_tokens |
| Google GenAI (generatecontent.go:360) | thoughtsTokenCount | completion_reasoning_tokens |
| Anthropic | output_tokens_details.thinking_tokens | ❌ never recorded |

The OpenAI parseUsageTokens handles *_tokens_details sub-objects explicitly:

if strings.HasSuffix(k, "_tokens_details") {
    prefix := translateMetricPrefix(strings.TrimSuffix(k, "_tokens_details"))
    if details, ok := v.(map[string]interface{}); ok {
        for kd, vd := range details {
            if ok, i := internal.ToInt64(vd); ok {
                metrics[prefix+"_"+kd] = i
            }
        }
    }
}

The Anthropic version has no equivalent branch.
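One possible shape for that branch is sketched below. It is a hypothetical illustration, not the repo's code: toInt64 stands in for internal.ToInt64, and the output → completion prefix, the thinking_tokens → reasoning_tokens alias, and the input_tokens → prompt_tokens mapping are all assumptions chosen so the result lines up with the completion_reasoning_tokens metric the other integrations emit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toInt64 is a hypothetical stand-in for internal.ToInt64: it converts numeric
// scalars (encoding/json decodes JSON numbers as float64) and rejects the rest.
func toInt64(v interface{}) (bool, int64) {
	switch n := v.(type) {
	case float64:
		return true, int64(n)
	case int64:
		return true, n
	case int:
		return true, int64(n)
	default:
		return false, 0
	}
}

// detailPrefixes maps a "<base>_tokens_details" base to a Braintrust metric
// prefix. Only "output" -> "completion" is assumed; other bases pass through.
var detailPrefixes = map[string]string{"output": "completion"}

// detailKeyAliases renames Anthropic detail keys so the flattened name matches
// the cross-provider metric (assumed: thinking_tokens -> reasoning_tokens).
var detailKeyAliases = map[string]string{"thinking_tokens": "reasoning_tokens"}

// parseUsageTokensWithDetails is a hypothetical variant of the Anthropic
// parseUsageTokens that also flattens *_tokens_details sub-objects.
func parseUsageTokensWithDetails(usage map[string]interface{}) map[string]int64 {
	metrics := make(map[string]int64)
	for k, v := range usage {
		if strings.HasSuffix(k, "_tokens_details") {
			details, ok := v.(map[string]interface{})
			if !ok {
				continue
			}
			base := strings.TrimSuffix(k, "_tokens_details")
			prefix, known := detailPrefixes[base]
			if !known {
				prefix = base
			}
			for dk, dv := range details {
				ok, n := toInt64(dv)
				if !ok {
					continue
				}
				if alias, found := detailKeyAliases[dk]; found {
					dk = alias
				}
				metrics[prefix+"_"+dk] = n
			}
			continue
		}
		ok, i := toInt64(v)
		if !ok {
			continue // any other nested objects are still skipped
		}
		switch k {
		case "input_tokens":
			metrics["prompt_tokens"] = i // assumed mapping; elided in the excerpt
		case "output_tokens":
			metrics["completion_tokens"] = i
		default:
			metrics[k] = i
		}
	}
	return metrics
}

func main() {
	raw := `{"input_tokens": 45, "output_tokens": 170,
		"output_tokens_details": {"thinking_tokens": 120}}`
	var usage map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &usage); err != nil {
		panic(err)
	}
	m := parseUsageTokensWithDetails(usage)
	fmt.Println(m["completion_tokens"], m["completion_reasoning_tokens"]) // 170 120
}
```

The alias map is the one deliberate departure from the OpenAI branch: reusing prefix+"_"+kd verbatim would yield completion_thinking_tokens, which would not match the completion_reasoning_tokens name used by the OpenAI and GenAI integrations.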

Braintrust docs status

Supported. Braintrust's advanced tracing docs (https://www.braintrust.dev/docs/instrument/advanced-tracing) list the standard LLM metrics, including token counts. Because the OpenAI and GenAI integrations already capture completion_reasoning_tokens, there is an established cross-provider convention that extended-thinking token costs are metered separately.

Local repo files inspected

  • trace/contrib/anthropic/traceanthropic.go: parseUsageTokens() (lines 114–158) only processes values convertible to int64 scalars; output_tokens_details is silently dropped
  • trace/contrib/openai/traceopenai.go: parseUsageTokens() (lines 136–145) is the reference implementation; it handles *_tokens_details sub-objects and emits completion_reasoning_tokens
  • trace/contrib/genai/generatecontent.go: explicitly maps thoughtsTokenCount → completion_reasoning_tokens
  • trace/contrib/anthropic/testdata/cassettes/TestStreamingWithThinking.yaml: VCR cassette exercising extended-thinking streaming; it was recorded before the output_tokens_details field was added to the API
  • trace/contrib/anthropic/go.mod: pins anthropic-sdk-go v1.23.0. That version has no SDK type for OutputTokensDetails, but the middleware parses raw JSON, and the API already returns this field for thinking-capable models.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
