[bot] BraintrustStream cannot aggregate AWS Bedrock ConverseStream events #64

@braintrust-bot

Summary

BraintrustStream and wrap_stream_with_span only handle OpenAI Chat Completions streaming chunks (choices[].delta). AWS Bedrock's ConverseStream API emits a completely different event structure (contentBlockDelta, messageStop, metadata) with no choices field. All Bedrock streaming content and usage metrics are silently discarded: Bedrock events parse as empty StreamChunk objects (all fields default to None/[]) and no content, tool use, or token counts are captured.

This is distinct from #52 (non-streaming Bedrock Converse usage extraction), which covers the extract_*_usage gap for non-streaming responses. This issue is specifically about the BraintrustStream aggregation path for streaming responses.

What is missing

AWS Bedrock ConverseStream emits a sequence of typed events, each wrapped in a discriminant key:

// messageStart
{"messageStart": {"role": "assistant"}}

// contentBlockStart (text)
{"contentBlockStart": {"contentBlockIndex": 0, "start": {}}}

// contentBlockDelta (text)
{"contentBlockDelta": {"contentBlockIndex": 0, "delta": {"text": "Hello, I can help"}}}

// contentBlockStop
{"contentBlockStop": {"contentBlockIndex": 0}}

// contentBlockStart (tool use)
{"contentBlockStart": {"contentBlockIndex": 1, "start": {"toolUse": {"toolUseId": "tool123", "name": "get_weather"}}}}

// contentBlockDelta (tool use arguments)
{"contentBlockDelta": {"contentBlockIndex": 1, "delta": {"toolUse": {"input": "{\"location\": \"NYC\"}"}}}}

// messageStop
{"messageStop": {"stopReason": "end_turn"}}

// metadata (usage + latency)
{"metadata": {
  "usage": {"inputTokens": 30, "outputTokens": 50, "totalTokens": 80},
  "metrics": {"latencyMs": 1275}
}}

Key structural differences from OpenAI Chat Completions streaming format:

  1. No choices field: Text content lives in contentBlockDelta.delta.text, not in choices[].delta.content
  2. Token usage in nested metadata: Usage is at metadata.usage.inputTokens/outputTokens/totalTokens (camelCase), not at the root usage.prompt_tokens/completion_tokens
  3. Tool use in contentBlockStart/contentBlockDelta: Tool invocations use contentBlockStart.start.toolUse and contentBlockDelta.delta.toolUse, not choices[].delta.tool_calls
  4. Stop reason in messageStop: Finish reason is at messageStop.stopReason, not choices[].finish_reason
  5. Latency in metadata.metrics: Server-side latency (latencyMs) is reported alongside usage; the OpenAI Chat Completions streaming format has no equivalent field
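The mapping above can be sketched as a fold over already-decoded events. This is a minimal illustration, not the SDK's implementation: the names ConverseEvent, Aggregated, ToolCall, Usage, and aggregate_converse_events are hypothetical, JSON decoding is left out, and only the fields shown in the sample events are modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded form of the ConverseStream events shown above.
/// A text contentBlockStart carries no data, so it is not modeled here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ConverseEvent {
    MessageStart { role: String },
    ToolUseStart { index: u32, tool_use_id: String, name: String },
    TextDelta { index: u32, text: String },
    ToolInputDelta { index: u32, input: String },
    ContentBlockStop { index: u32 },
    MessageStop { stop_reason: String },
    Metadata { usage: Usage, latency_ms: u64 },
}

#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
    total_tokens: u64,
}

#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// Hypothetical aggregate; the real span output shape may differ.
#[derive(Debug, Default, PartialEq)]
struct Aggregated {
    role: Option<String>,
    text: String,
    /// Tool calls keyed by contentBlockIndex so deltas find their start event.
    tool_calls: BTreeMap<u32, ToolCall>,
    stop_reason: Option<String>,
    usage: Option<Usage>,
    latency_ms: Option<u64>,
}

fn aggregate_converse_events(events: &[ConverseEvent]) -> Aggregated {
    let mut out = Aggregated::default();
    for event in events {
        match event {
            ConverseEvent::MessageStart { role } => out.role = Some(role.clone()),
            ConverseEvent::ToolUseStart { index, tool_use_id, name } => {
                let call = out.tool_calls.entry(*index).or_default();
                call.id = tool_use_id.clone();
                call.name = name.clone();
            }
            // Text deltas are concatenated in arrival order.
            ConverseEvent::TextDelta { text, .. } => out.text.push_str(text),
            // Tool arguments arrive as JSON string fragments; append them as-is.
            ConverseEvent::ToolInputDelta { index, input } => {
                out.tool_calls.entry(*index).or_default().arguments.push_str(input);
            }
            ConverseEvent::ContentBlockStop { .. } => {}
            ConverseEvent::MessageStop { stop_reason } => {
                out.stop_reason = Some(stop_reason.clone());
            }
            ConverseEvent::Metadata { usage, latency_ms } => {
                out.usage = Some(*usage);
                out.latency_ms = Some(*latency_ms);
            }
        }
    }
    out
}
```

Keying tool calls by contentBlockIndex is a design assumption here: it lets a toolUse delta be attached to the block that introduced its name and ID even if text and tool blocks interleave.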

Failure mode in current SDK

StreamChunk (src/stream.rs:687-694) is defined with #[serde(default)] on every field:

struct StreamChunk {
    #[serde(default)]
    model: Option<String>,
    #[serde(default)]
    choices: Vec<StreamChoice>,
    #[serde(default)]
    usage: Option<StreamUsage>,
}

Because serde ignores unknown fields by default (no #[serde(deny_unknown_fields)]), serde_json::from_value on any Bedrock ConverseStream event succeeds — but produces a StreamChunk with model: None, choices: [], and usage: None. The Err(_) => continue fallback at line 856 is never hit, but every chunk is effectively empty. This means:

  • Text output from contentBlockDelta.delta.text is lost
  • Tool use name and arguments from contentBlockStart/contentBlockDelta.delta.toolUse are lost
  • Usage metrics (inputTokens, outputTokens, totalTokens) from metadata.usage are never extracted (they're nested under metadata, not at the root usage key)
  • Stop reason from messageStop.stopReason is lost
  • TTFT metric is not recorded (value_has_content() at src/stream.rs:1117-1119 checks for non-empty choices array, which is always empty for Bedrock events)
  • Server-side latency from metadata.metrics.latencyMs is never captured
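For the TTFT point in particular, a Bedrock-aware content check would need to look at delta events rather than a choices array. The sketch below assumes a hypothetical, already-decoded ConverseEvent enum and a hypothetical event_has_content helper; treating tool-use argument deltas as content is also an assumption, not something the current value_has_content() does.

```rust
/// Hypothetical minimal view of a decoded ConverseStream event.
#[allow(dead_code)]
enum ConverseEvent {
    MessageStart,
    ContentBlockStart,
    TextDelta(String),
    ToolInputDelta(String),
    ContentBlockStop,
    MessageStop,
    Metadata,
}

/// Returns true only for events that carry model output.
/// Envelope events (start, stop, metadata) never count as first content.
fn event_has_content(event: &ConverseEvent) -> bool {
    match event {
        ConverseEvent::TextDelta(s) | ConverseEvent::ToolInputDelta(s) => !s.is_empty(),
        _ => false,
    }
}

/// Position of the first content-bearing event, i.e. where a TTFT
/// timestamp would be taken while consuming the stream.
fn first_content_index(events: &[ConverseEvent]) -> Option<usize> {
    events.iter().position(event_has_content)
}
```

In a real stream wrapper the same predicate would be evaluated per event as it arrives, with the elapsed time recorded the first time it returns true.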

Braintrust docs status

Supported. Braintrust explicitly documents AWS Bedrock tracing: "Converse, ConverseStream, and InvokeModel calls are traced." Other Braintrust SDKs (Go) already provide Bedrock Runtime middleware that captures token usage, including cache tokens. The Rust SDK has no equivalent streaming support.

Upstream sources

Relationship to existing issues

Local files inspected

  • src/stream.rs:687-694: StreamChunk struct has only model, choices, usage with #[serde(default)]; Bedrock events have none of these keys at root level, so all parse as empty structs
  • src/stream.rs:840-857: aggregate() calls serde_json::from_value; Bedrock events silently deserialize to empty StreamChunk objects without hitting the Err(_) => continue fallback
  • src/stream.rs:1117-1119: value_has_content() checks choices array; always empty for Bedrock events, so TTFT is never recorded
  • src/extractors.rs: extract_openai_usage() looks for root usage.prompt_tokens; extract_anthropic_usage() looks for root usage.input_tokens; neither matches Bedrock's metadata.usage.inputTokens
  • src/lib.rs: public API exports; no Bedrock references
  • Full codebase grep for bedrock, ConverseStream, contentBlockDelta, messageStart, inputTokens — zero results
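The extractor gap comes down to a field-name and nesting mismatch. As a rough sketch, the Bedrock counts could be normalized into the OpenAI-style names the existing extractors look for. BedrockUsage, NormalizedUsage, and normalize_bedrock_usage are hypothetical names for illustration and do not exist in the SDK.

```rust
/// Hypothetical decoded form of metadata.usage
/// (inputTokens / outputTokens / totalTokens in the wire format).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BedrockUsage {
    input_tokens: u64,
    output_tokens: u64,
    total_tokens: u64,
}

/// Hypothetical normalized usage using OpenAI-style field names.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NormalizedUsage {
    prompt_tokens: u64,
    completion_tokens: u64,
    total_tokens: u64,
}

/// Maps Bedrock's input/output naming onto prompt/completion naming.
fn normalize_bedrock_usage(usage: BedrockUsage) -> NormalizedUsage {
    NormalizedUsage {
        prompt_tokens: usage.input_tokens,
        completion_tokens: usage.output_tokens,
        total_tokens: usage.total_tokens,
    }
}
```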
