[bot] Anthropic tracer silently drops `output_tokens_details.thinking_tokens`, so extended thinking calls never emit `completion_reasoning_tokens` #160

## What instrumentation is missing

When the Anthropic Messages API is called with extended thinking enabled (`thinking: {type: "enabled", budget_tokens: N}`), the response `usage` object includes an `output_tokens_details` sub-object containing `thinking_tokens` — the number of tokens consumed by the model's internal reasoning. The Anthropic tracer's `parseUsageTokens` function never records this, so `completion_reasoning_tokens` (the standard Braintrust metric for reasoning/thinking token costs) is always absent from Anthropic extended-thinking spans.

## Why it's dropped

In `trace/contrib/anthropic/traceanthropic.go` (lines 114–158), `parseUsageTokens` iterates over the usage map and only processes values that pass `internal.ToInt64`:

```go
func parseUsageTokens(usage map[string]interface{}) map[string]int64 {
    for k, v := range usage {
        if ok, i := internal.ToInt64(v); ok {   // only handles scalars
            switch k {
            case "input_tokens":  ...
            case "output_tokens": metrics["completion_tokens"] = i
            // ...
            default: metrics[k] = i
            }
        }
        // nested objects silently skipped — output_tokens_details falls here
    }
}
```

`output_tokens_details` is a nested object (`{"thinking_tokens": N}`), not a scalar, so `ToInt64` returns false for it and the entire sub-object is ignored. No `completion_reasoning_tokens` (or `thinking_tokens`) metric is ever recorded.
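
The drop can be reproduced outside the tracer. The sketch below is not the repo's code: `toInt64` is a hypothetical stand-in for `internal.ToInt64` (same return order, assumed to accept only numeric scalars), and `scalarOnly` is an illustrative helper that mirrors only the shape of the loop above while also collecting the keys it skips.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toInt64 is a hypothetical stand-in for internal.ToInt64: it accepts numeric
// scalars only and reports false for anything else, including nested maps.
func toInt64(v interface{}) (bool, int64) {
	switch n := v.(type) {
	case float64: // encoding/json decodes JSON numbers into float64
		return true, int64(n)
	case int64:
		return true, n
	case int:
		return true, int64(n)
	}
	return false, 0
}

// scalarOnly keeps usage entries that convert to int64 and returns the keys
// that were skipped. The loop shown above discards those keys silently.
func scalarOnly(usage map[string]interface{}) (map[string]int64, []string) {
	metrics := make(map[string]int64)
	var skipped []string
	for k, v := range usage {
		if ok, i := toInt64(v); ok {
			metrics[k] = i
			continue
		}
		skipped = append(skipped, k)
	}
	return metrics, skipped
}

func main() {
	raw := `{"input_tokens": 45, "output_tokens": 170, "output_tokens_details": {"thinking_tokens": 120}}`
	var usage map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &usage); err != nil {
		panic(err)
	}
	metrics, skipped := scalarOnly(usage)
	fmt.Println("recorded:", metrics)
	fmt.Println("skipped: ", skipped)
}
```

With this input, `output_tokens_details` is the only key that lands in `skipped`, and no thinking-token value reaches `metrics`.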

## Upstream source

The Anthropic Messages API documentation defines the `usage` response object as:

```json
"usage": {
  "input_tokens": 45,
  "output_tokens": 170,
  "output_tokens_details": {
    "thinking_tokens": 120
  }
}
```

`output_tokens_details.thinking_tokens` is the count of tokens used for the model's extended thinking reasoning, separate from the visible answer tokens. SDK type support was added in `anthropic-sdk-go` v1.46.0 (May 28, 2026).

- Anthropic Messages API reference: https://docs.anthropic.com/en/api/messages
- Anthropic extended thinking guide: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking
- `anthropic-sdk-go` v1.46.0 release notes (adds `OutputTokensDetails` type)

## Parity gap

Two other integrations in this repo already capture equivalent reasoning/thinking tokens:

| Integration | API field | Braintrust metric |
|---|---|---|
| **OpenAI** (`traceopenai.go:136–145`) | `completion_tokens_details.reasoning_tokens` | `completion_reasoning_tokens` |
| **Google GenAI** (`generatecontent.go:360`) | `thoughtsTokenCount` | `completion_reasoning_tokens` |
| **Anthropic** | `output_tokens_details.thinking_tokens` | ❌ never recorded |

The OpenAI `parseUsageTokens` handles `*_tokens_details` sub-objects explicitly:

```go
if strings.HasSuffix(k, "_tokens_details") {
    prefix := translateMetricPrefix(strings.TrimSuffix(k, "_tokens_details"))
    if details, ok := v.(map[string]interface{}); ok {
        for kd, vd := range details {
            if ok, i := internal.ToInt64(vd); ok {
                metrics[prefix+"_"+kd] = i
            }
        }
    }
}
```

The Anthropic version has no equivalent branch.
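
One possible shape for the missing branch is sketched below, as an assumption rather than a proposed patch. Copying the OpenAI branch verbatim would emit a key ending in `_thinking_tokens` (it builds `prefix+"_"+kd`), so the sketch uses an explicit lookup to land on `completion_reasoning_tokens`. The names `addTokenDetails`, `detailMetricNames`, and `toInt64` are hypothetical, and the fallback naming for unknown detail keys is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// toInt64 is a hypothetical stand-in for internal.ToInt64 (numeric scalars only).
func toInt64(v interface{}) (bool, int64) {
	switch n := v.(type) {
	case float64:
		return true, int64(n)
	case int64:
		return true, n
	case int:
		return true, int64(n)
	}
	return false, 0
}

// detailMetricNames maps "<usage key>.<detail key>" to a metric name.
// Assumption: only the mapping from the parity table above is listed.
var detailMetricNames = map[string]string{
	"output_tokens_details.thinking_tokens": "completion_reasoning_tokens",
}

// addTokenDetails walks every "*_tokens_details" sub-object in usage and
// writes its integer values into metrics. Scalar usage keys are left to the
// existing switch and are not touched here.
func addTokenDetails(usage map[string]interface{}, metrics map[string]int64) {
	for k, v := range usage {
		if !strings.HasSuffix(k, "_tokens_details") {
			continue
		}
		details, ok := v.(map[string]interface{})
		if !ok {
			continue
		}
		for kd, vd := range details {
			ok, i := toInt64(vd)
			if !ok {
				continue
			}
			if name, known := detailMetricNames[k+"."+kd]; known {
				metrics[name] = i
				continue
			}
			// Assumed fallback: "<prefix>_<detail key>", e.g. "output_foo_tokens".
			metrics[strings.TrimSuffix(k, "_tokens_details")+"_"+kd] = i
		}
	}
}

func main() {
	usage := map[string]interface{}{
		"input_tokens":  float64(45),
		"output_tokens": float64(170),
		"output_tokens_details": map[string]interface{}{
			"thinking_tokens": float64(120),
		},
	}
	metrics := map[string]int64{"completion_tokens": 170}
	addTokenDetails(usage, metrics)
	fmt.Println(metrics["completion_reasoning_tokens"])
}
```

Keeping the nested handling in a separate helper leaves the existing scalar `switch` untouched, which may make the change easier to review. Whether unknown detail keys should be recorded at all is a design choice for the maintainers.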

## Braintrust docs status

**supported** — Braintrust's advanced tracing docs (`https://www.braintrust.dev/docs/instrument/advanced-tracing`) list standard LLM metrics including token counts. The `completion_reasoning_tokens` metric is already captured by the OpenAI and GenAI integrations, establishing a cross-provider convention that extended-thinking token costs should be separately metered.

## Local repo files inspected

- `trace/contrib/anthropic/traceanthropic.go` — `parseUsageTokens()` (lines 114–158): only processes int64 scalars; `output_tokens_details` silently dropped
- `trace/contrib/openai/traceopenai.go` — `parseUsageTokens()` (lines 136–145): reference implementation handling `*_tokens_details` sub-objects → emits `completion_reasoning_tokens`
- `trace/contrib/genai/generatecontent.go` — explicitly maps `thoughtsTokenCount` → `completion_reasoning_tokens`
- `trace/contrib/anthropic/testdata/cassettes/TestStreamingWithThinking.yaml` — VCR cassette covering extended thinking with streaming; it was recorded before the `output_tokens_details` field was added to the API, so it contains no `thinking_tokens` value
- `trace/contrib/anthropic/go.mod` — `anthropic-sdk-go` v1.23.0 (SDK type for `OutputTokensDetails` not yet present, but middleware parses raw JSON — the API already returns this field for thinking-capable models)
