[bot] BraintrustStream cannot aggregate Cohere v2 chat streaming events #65

@braintrust-bot

Summary

BraintrustStream and wrap_stream_with_span only handle OpenAI Chat Completions streaming chunks (choices[].delta). Cohere's v2 Chat API streaming uses a completely different SSE event format (content-delta, message-end, etc.) with no choices field. All Cohere streaming text and usage metrics are silently discarded: each Cohere event parses as an empty StreamChunk (all fields default to None/[]) and no content, tool calls, or token counts are captured.

This is distinct from #49 (non-streaming Cohere usage extraction), which covers the extract_*_usage gap for non-streaming responses. This issue is specifically about the BraintrustStream aggregation path for streaming responses.

What is missing

Cohere v2 Chat API streaming (endpoint: POST /v2/chat, stream: true) emits SSE events identified by a type field:

// message-start
{"type": "message-start", "id": "abc123", "delta": {"message": {"role": "assistant", "content": []}}}

// content-start
{"type": "content-start", "index": 0, "delta": {"message": {"content": {"type": "text", "text": ""}}}}

// content-delta (one per token)
{"type": "content-delta", "index": 0, "delta": {"message": {"content": {"type": "text", "text": "Hello"}}}}

// content-end
{"type": "content-end", "index": 0}

// message-end (contains usage)
{"type": "message-end", "delta": {
  "finish_reason": "COMPLETE",
  "usage": {
    "billed_units": {"input_tokens": 5, "output_tokens": 26},
    "tokens": {"input_tokens": 71, "output_tokens": 26}
  }
}}

For tool use, Cohere emits additional event types:

// tool-call-start
{"type": "tool-call-start", "index": 1, "delta": {"message": {"tool_calls": {"id": "tool123", "type": "function", "function": {"name": "get_weather", "arguments": ""}}}}}

// tool-call-delta
{"type": "tool-call-delta", "index": 1, "delta": {"message": {"tool_calls": {"function": {"arguments": "{\"location\": \"NYC\"}"}}}}}

// tool-call-end
{"type": "tool-call-end", "index": 1}

Key structural differences from OpenAI Chat Completions streaming:

  1. No choices field: Text content is at delta.message.content.text, not choices[].delta.content
  2. Usage in message-end.delta.usage: Token counts are nested inside message-end.delta.usage.billed_units and message-end.delta.usage.tokens, not at a root usage field
  3. Dual usage objects: billed_units (billable tokens) differs from tokens (actual processed tokens, including internal overhead) — a distinction not present in OpenAI or Anthropic formats
  4. Tool calls in tool-call-start/delta/end: Tool arguments streamed via delta.message.tool_calls.function.arguments, structurally different from choices[].delta.tool_calls
  5. type discriminant key: Events are typed via a top-level type field, not wrapped in a discriminant object like Bedrock ConverseStream

Failure mode in current SDK

StreamChunk (src/stream.rs:687-694) declares every field with #[serde(default)]:

struct StreamChunk {
    #[serde(default)]
    model: Option<String>,
    #[serde(default)]
    choices: Vec<StreamChoice>,
    #[serde(default)]
    usage: Option<StreamUsage>,
}

Because serde ignores unknown fields by default, serde_json::from_value on any Cohere v2 streaming event succeeds — producing a StreamChunk with model: None, choices: [], and usage: None. The Err(_) => continue fallback at line 856 is never hit. Every chunk processes without error but all content and metrics are silently dropped:

  • Text output from content-delta.delta.message.content.text is lost (no choices field)
  • Tool call arguments from tool-call-delta.delta.message.tool_calls.function.arguments are lost
  • Usage metrics from message-end.delta.usage are lost (nested under delta, not at root usage)
  • Billed vs. actual token distinction (billed_units vs. tokens) is never captured
  • Finish reason from message-end.delta.finish_reason is lost
  • TTFT metric is not recorded: value_has_content() at src/stream.rs:1117-1119 checks for non-empty choices, which are always empty for Cohere events

Braintrust docs status

supported. Braintrust documents Cohere instrumentation, including streaming behavior: "instruments the native Cohere Python SDK so you can inspect prompts, responses, streaming behavior, embeddings, and rerank calls in Braintrust." Other Braintrust SDKs (Python: wrap_cohere(), TypeScript: wrapCohere()) handle Cohere streaming correctly. The Rust SDK has no equivalent.

Upstream sources

Relationship to existing issues

Local files inspected

  • src/stream.rs:687-694: StreamChunk struct has only model, choices, usage with #[serde(default)]; Cohere events have a type field and nested delta structure, none of which match
  • src/stream.rs:840-857: aggregate() calls serde_json::from_value; Cohere events silently deserialize to empty StreamChunk objects without hitting the Err(_) => continue fallback
  • src/stream.rs:1117-1119: value_has_content() checks the choices array, which is always empty for Cohere events, so TTFT is never recorded
  • src/extractors.rs: extract_openai_usage() looks for root usage.prompt_tokens; extract_anthropic_usage() looks for root usage.input_tokens; neither handles Cohere's message-end.delta.usage.billed_units / tokens
  • src/lib.rs: public API exports; no Cohere references
  • Full codebase grep for cohere, billed_units, content-delta, message-end, tool-call-start: zero results
