Summary
BraintrustStream and wrap_stream_with_span only handle OpenAI Chat Completions streaming chunks (choices[].delta). Cohere's v2 Chat API streaming uses a completely different SSE event format (content-delta, message-end, etc.) with no choices field. All Cohere streaming text and usage metrics are silently discarded: each Cohere event parses as an empty StreamChunk (all fields default to None/[]) and no content, tool calls, or token counts are captured.
This is distinct from #49 (non-streaming Cohere usage extraction), which covers the extract_*_usage gap for non-streaming responses. This issue is specifically about the BraintrustStream aggregation path for streaming responses.
What is missing
Cohere v2 Chat API streaming (endpoint: POST /v2/chat, stream: true) emits SSE events identified by a type field:
// message-start
{"type": "message-start", "id": "abc123", "delta": {"message": {"role": "assistant", "content": []}}}
// content-start
{"type": "content-start", "index": 0, "delta": {"message": {"content": {"type": "text", "text": ""}}}}
// content-delta (one per token)
{"type": "content-delta", "index": 0, "delta": {"message": {"content": {"type": "text", "text": "Hello"}}}}
// content-end
{"type": "content-end", "index": 0}
// message-end (contains usage)
{"type": "message-end", "delta": {
  "finish_reason": "COMPLETE",
  "usage": {
    "billed_units": {"input_tokens": 5, "output_tokens": 26},
    "tokens": {"input_tokens": 71, "output_tokens": 26}
  }
}}
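To make the usage shape in message-end concrete, here is a minimal std-only sketch of how the two usage objects might be flattened into span metrics. Every name in it (TokenCounts, CohereUsage, UsageMetrics, flatten_usage) is hypothetical and does not exist in the SDK. Preferring tokens over billed_units for the prompt/completion counts is an assumption, not a documented Braintrust convention.

```rust
/// Hypothetical mirror of one Cohere usage object (`billed_units` or `tokens`).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct TokenCounts {
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
}

/// Hypothetical mirror of `message-end.delta.usage`.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct CohereUsage {
    pub billed_units: Option<TokenCounts>,
    pub tokens: Option<TokenCounts>,
}

/// Flattened metrics. Field names are illustrative, not the SDK's.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct UsageMetrics {
    pub prompt_tokens: Option<u64>,
    pub completion_tokens: Option<u64>,
    pub billed_prompt_tokens: Option<u64>,
    pub billed_completion_tokens: Option<u64>,
}

/// Flatten both usage objects while keeping the billed/processed distinction.
pub fn flatten_usage(usage: &CohereUsage) -> UsageMetrics {
    // A missing object behaves like one whose counts are all absent.
    let billed = usage.billed_units.unwrap_or_default();
    let actual = usage.tokens.unwrap_or_default();
    UsageMetrics {
        // Assumption: prefer processed-token counts, fall back to billed counts.
        prompt_tokens: actual.input_tokens.or(billed.input_tokens),
        completion_tokens: actual.output_tokens.or(billed.output_tokens),
        billed_prompt_tokens: billed.input_tokens,
        billed_completion_tokens: billed.output_tokens,
    }
}
```

With the message-end example above, this sketch would report 71 prompt tokens and 26 completion tokens, and keep the billed 5/26 pair separately.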
For tool use, Cohere emits additional event types:
// tool-call-start
{"type": "tool-call-start", "index": 1, "delta": {"message": {"tool_calls": {"id": "tool123", "type": "function", "function": {"name": "get_weather", "arguments": ""}}}}}
// tool-call-delta
{"type": "tool-call-delta", "index": 1, "delta": {"message": {"tool_calls": {"function": {"arguments": "{\"location\": \"NYC\"}"}}}}}
// tool-call-end
{"type": "tool-call-end", "index": 1}
Key structural differences from OpenAI Chat Completions streaming:
- No choices field: Text content is at delta.message.content.text, not choices[].delta.content
- Usage in message-end.delta.usage: Token counts are nested inside message-end.delta.usage.billed_units and message-end.delta.usage.tokens, not at a root usage field
- Dual usage objects: billed_units (billable tokens) differs from tokens (actual processed tokens, including internal overhead), a distinction not present in OpenAI or Anthropic formats
- Tool calls in tool-call-start/delta/end: Tool arguments are streamed via delta.message.tool_calls.function.arguments, structurally different from choices[].delta.tool_calls
- type discriminant key: Events are typed via a top-level type field, not wrapped in a discriminant object like Bedrock ConverseStream
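The differences above suggest an aggregation path keyed on the event type rather than on choices. The following is a rough std-only sketch of that idea, assuming events have already been parsed into a hypothetical CohereEvent enum. The enum, ToolCall, Aggregate, and aggregate are illustrative names only, not part of the SDK or of Cohere's client libraries, and only the event types shown in this issue are modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical, already-parsed view of the Cohere v2 events listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum CohereEvent {
    MessageStart,
    ContentStart { index: usize },
    ContentDelta { index: usize, text: String },
    ContentEnd { index: usize },
    ToolCallStart { index: usize, id: String, name: String },
    ToolCallDelta { index: usize, arguments: String },
    ToolCallEnd { index: usize },
    MessageEnd { finish_reason: Option<String> },
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Aggregate {
    pub text: String,
    pub tool_calls: Vec<ToolCall>,
    pub finish_reason: Option<String>,
}

/// Fold a sequence of events into text, tool calls, and a finish reason.
pub fn aggregate(events: &[CohereEvent]) -> Aggregate {
    let mut text = String::new();
    // Keyed by event index so interleaved tool calls stay separate and ordered.
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    let mut finish_reason = None;
    for event in events {
        match event {
            CohereEvent::ContentDelta { text: piece, .. } => text.push_str(piece),
            CohereEvent::ToolCallStart { index, id, name } => {
                let call = ToolCall { id: id.clone(), name: name.clone(), arguments: String::new() };
                calls.insert(*index, call);
            }
            CohereEvent::ToolCallDelta { index, arguments } => {
                // A delta without a preceding start still accumulates its arguments.
                calls.entry(*index).or_default().arguments.push_str(arguments);
            }
            CohereEvent::MessageEnd { finish_reason: reason } => finish_reason = reason.clone(),
            // Start/end markers carry nothing to accumulate in this sketch.
            _ => {}
        }
    }
    Aggregate { text, tool_calls: calls.into_values().collect(), finish_reason }
}
```

Keying tool calls by the event index is a design choice made for the sketch; it keeps argument fragments for different calls apart even if their deltas interleave.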
Failure mode in current SDK
StreamChunk (src/stream.rs:687-694) is defined with all #[serde(default)] fields:
struct StreamChunk {
    #[serde(default)]
    model: Option<String>,
    #[serde(default)]
    choices: Vec<StreamChoice>,
    #[serde(default)]
    usage: Option<StreamUsage>,
}
Because serde ignores unknown fields by default, serde_json::from_value on any Cohere v2 streaming event succeeds — producing a StreamChunk with model: None, choices: [], and usage: None. The Err(_) => continue fallback at line 856 is never hit. Every chunk processes without error but all content and metrics are silently dropped:
- Text output from content-delta.delta.message.content.text is lost (no choices field)
- Tool call arguments from tool-call-delta.delta.message.tool_calls.function.arguments are lost
- Usage metrics from message-end.delta.usage are lost (nested under delta, not at root usage)
- Billed vs. actual token distinction (billed_units vs. tokens) is never captured
- Finish reason from message-end.delta.finish_reason is lost
- TTFT metric is not recorded (value_has_content() at src/stream.rs:1117-1119 checks for non-empty choices, always empty for Cohere events)
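One way to avoid the silent-default behavior would be to classify each event by its top-level keys before deserializing it as a StreamChunk. The sketch below is only an illustration of that routing step under stated assumptions: ChunkFormat, detect_format, and cohere_event_has_content are hypothetical names, the caller is assumed to have already looked up whether a choices key exists and what the type field contains, and the event-type list covers only the types shown in this issue.

```rust
/// Hypothetical classification of a raw streaming event.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ChunkFormat {
    OpenAiChat,
    CohereV2,
    Unrecognized,
}

/// Event types taken from the Cohere examples in this issue (not exhaustive).
const COHERE_EVENT_TYPES: [&str; 8] = [
    "message-start",
    "content-start",
    "content-delta",
    "content-end",
    "tool-call-start",
    "tool-call-delta",
    "tool-call-end",
    "message-end",
];

/// Decide how to route an event from two facts about its top-level keys:
/// whether `choices` is present, and the value of `type` if there is one.
pub fn detect_format(has_choices: bool, event_type: Option<&str>) -> ChunkFormat {
    match event_type {
        _ if has_choices => ChunkFormat::OpenAiChat,
        Some(t) if COHERE_EVENT_TYPES.contains(&t) => ChunkFormat::CohereV2,
        // Surfacing this case lets the caller log or count it instead of
        // treating the event as an empty chunk.
        _ => ChunkFormat::Unrecognized,
    }
}

/// Whether a Cohere event should count as first content for a TTFT metric.
/// `payload` is the streamed text or tool-call argument fragment.
pub fn cohere_event_has_content(event_type: &str, payload: &str) -> bool {
    matches!(event_type, "content-delta" | "tool-call-delta") && !payload.is_empty()
}
```

In this sketch an unknown event is reported as Unrecognized rather than parsed into an all-default struct, so dropped content becomes visible to the caller.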
Braintrust docs status
supported: Braintrust documents Cohere instrumentation, including streaming behavior: "instruments the native Cohere Python SDK so you can inspect prompts, responses, streaming behavior, embeddings, and rerank calls in Braintrust." The Python (wrap_cohere()) and TypeScript (wrapCohere()) SDKs handle Cohere streaming correctly. The Rust SDK has no equivalent.
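Upstream sources
- Cohere v2 Chat streaming reference (content-delta, message-end, tool-call events): https://docs.cohere.com/v2/reference/chat-stream
Relationship to existing issues
- #49: covers extract_cohere_usage() for non-streaming Cohere responses where usage is in response.usage.billed_units. This issue covers the streaming path (BraintrustStream, wrap_stream_with_span) where usage is in message-end.delta.usage.billed_units and text content comes from content-delta events, a completely different streaming schema.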
Local files inspected
- src/stream.rs:687-694: StreamChunk struct has only model, choices, usage with #[serde(default)]; Cohere events have a type field and nested delta structure, none of which match
- src/stream.rs:840-857: aggregate() calls serde_json::from_value; Cohere events silently deserialize to empty StreamChunk objects without hitting the Err(_) => continue fallback
- src/stream.rs:1117-1119: value_has_content() checks the choices array; always empty for Cohere events, so TTFT is never recorded
- src/extractors.rs: extract_openai_usage() looks for root usage.prompt_tokens; extract_anthropic_usage() looks for root usage.input_tokens; neither handles Cohere's message-end.delta.usage.billed_units / tokens
- src/lib.rs: public API exports; no Cohere references
- Full codebase grep for cohere, billed_units, content-delta, message-end, tool-call-start: zero results