Summary
BraintrustStream and wrap_stream_with_span only handle OpenAI Chat Completions streaming chunks (choices[].delta). The OpenAI Responses API (GA March 2025) uses a completely different streaming format: typed server-sent events such as response.output_text.delta, response.function_call_arguments.delta, and response.completed. All Responses API streaming events are silently discarded: each event parses as an empty StreamChunk (all fields default to None/[]), so no output text, tool call arguments, model name, usage metrics, or TTFT are captured.
This is distinct from #44 (async-openai client wrapper), which covers adding a wrapper around the async-openai library. This issue is specifically about the BraintrustStream aggregation path, which is a provider-format-agnostic surface that users call directly.
What is missing
The OpenAI Responses API streaming emits typed events, each with a type field. Key examples:
// Text delta (not choices[].delta.content)
{"type": "response.output_text.delta", "item_id": "msg_abc", "output_index": 0, "content_index": 0, "delta": "Hello, how can I help?"}
// Tool call arguments delta (not choices[].delta.tool_calls)
{"type": "response.function_call_arguments.delta", "item_id": "fc_xyz", "output_index": 1, "call_id": "call_001", "delta": "{\"location\": \"NYC\"}"}
// Reasoning summary delta (for o1/o3/o4 models)
{"type": "response.reasoning_summary_text.delta", "item_id": "rs_abc", "output_index": 0, "summary_index": 0, "delta": "Let me think through this..."}
// Final event — usage is nested under "response", not at root
{
  "type": "response.completed",
  "response": {
    "id": "resp_001",
    "model": "gpt-4o-2024-11-20",
    "output": [{"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Hello, how can I help?"}]}],
    "usage": {
      "input_tokens": 50,
      "output_tokens": 25,
      "total_tokens": 75,
      "output_tokens_details": {"reasoning_tokens": 0}
    }
  }
}
Key structural differences from Chat Completions streaming:
- No choices field: Text content is at delta (a string), not choices[].delta.content
- No root-level model: The model name is only present in the final response.completed event, nested under response.model
- No root-level usage: Token counts are in response.completed.response.usage, not at the top-level usage key
- Typed events: Each event has a type discriminant; content and tool calls are separate event types
- New tool types: Built-in tools (response.file_search_call_*, response.code_interpreter_call_*, response.mcp_call_*) have no equivalent in Chat Completions streaming
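To make the differences above concrete, here is a minimal sketch of how already-decoded Responses API events could be folded into one aggregate. It uses only the standard library. Every name in it (ResponsesEvent, ResponsesUsage, ResponsesAggregate, aggregate_responses_events) is hypothetical and not part of the SDK. It also assumes the JSON events have been decoded into the enum beforehand, for example with a serde internally tagged enum keyed on type; that decoding step is not shown.

```rust
use std::collections::BTreeMap;

/// Token counts as nested under `response.usage` in the final event.
#[derive(Debug, Clone, Default, PartialEq)]
struct ResponsesUsage {
    input_tokens: u64,
    output_tokens: u64,
    reasoning_tokens: u64,
}

/// Hypothetical decoded form of the events shown above, one variant per `type`.
#[derive(Debug, Clone, PartialEq)]
enum ResponsesEvent {
    /// Text delta: the payload is a plain string, not `choices[].delta.content`.
    OutputTextDelta { delta: String },
    /// Function call arguments delta, keyed by the call it belongs to.
    FunctionCallArgumentsDelta { call_id: String, delta: String },
    /// Reasoning summary text delta.
    ReasoningSummaryDelta { delta: String },
    /// Final event: model and usage live under the nested `response` object.
    Completed { model: String, usage: ResponsesUsage },
    /// Any event type this sketch does not handle.
    Other,
}

/// What a span would want to log once the stream has ended.
#[derive(Debug, Default, PartialEq)]
struct ResponsesAggregate {
    output_text: String,
    reasoning_summary: String,
    /// Concatenated argument fragments per call id.
    tool_call_arguments: BTreeMap<String, String>,
    model: Option<String>,
    usage: Option<ResponsesUsage>,
}

/// Folds events in arrival order into a single aggregate.
fn aggregate_responses_events<I>(events: I) -> ResponsesAggregate
where
    I: IntoIterator<Item = ResponsesEvent>,
{
    let mut agg = ResponsesAggregate::default();
    for event in events {
        match event {
            ResponsesEvent::OutputTextDelta { delta } => agg.output_text.push_str(&delta),
            ResponsesEvent::FunctionCallArgumentsDelta { call_id, delta } => {
                agg.tool_call_arguments
                    .entry(call_id)
                    .or_default()
                    .push_str(&delta);
            }
            ResponsesEvent::ReasoningSummaryDelta { delta } => {
                agg.reasoning_summary.push_str(&delta)
            }
            ResponsesEvent::Completed { model, usage } => {
                agg.model = Some(model);
                agg.usage = Some(usage);
            }
            ResponsesEvent::Other => {}
        }
    }
    agg
}
```

The catch-all Other variant is deliberate: unknown event types (such as the built-in tool events) are skipped explicitly instead of being parsed into an empty value that looks like a valid chunk.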
Failure mode in current SDK
StreamChunk (src/stream.rs:687-694) is defined with all #[serde(default)] fields:
struct StreamChunk {
    #[serde(default)]
    model: Option<String>,
    #[serde(default)]
    choices: Vec<StreamChoice>,
    #[serde(default)]
    usage: Option<StreamUsage>,
}
Because serde ignores unknown fields by default (no #[serde(deny_unknown_fields)]), serde_json::from_value on any Responses API event succeeds — but produces a StreamChunk with model: None, choices: [], and usage: None. The Err(_) => continue fallback at line 856 is never hit. Every chunk processes without error but all content and metrics are silently dropped:
- Text output from response.output_text.delta events is lost (no choices field)
- Tool call arguments from response.function_call_arguments.delta are lost
- Reasoning summary from response.reasoning_summary_text.delta is lost
- Model name from response.completed.response.model is lost (nested under response, not root)
- Usage metrics (input_tokens, output_tokens, output_tokens_details.reasoning_tokens) from response.completed.response.usage are never extracted (nested under response, not at root usage key)
- Finish reason is not captured
- TTFT metric is not recorded (value_has_content() at src/stream.rs:1117-1119 checks for non-empty choices, always empty for Responses API events)
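For the TTFT point in particular, a content check for Responses API events would have to look at the type discriminant rather than at choices. The sketch below is one possible shape, using only the standard library. The function names are hypothetical, the event type strings follow OpenAI's published streaming event names, and treating exactly these three delta events as "content" is an assumption of this sketch, not the SDK's behavior.

```rust
use std::time::Duration;

/// Returns true when an event `type` carries model output.
/// Assumption: only these three delta events count as content.
fn responses_event_has_content(event_type: &str) -> bool {
    matches!(
        event_type,
        "response.output_text.delta"
            | "response.function_call_arguments.delta"
            | "response.reasoning_summary_text.delta"
    )
}

/// Given (elapsed since request start, event type) pairs in arrival order,
/// returns the elapsed time of the first content-bearing event, if any.
fn time_to_first_token(events: &[(Duration, &str)]) -> Option<Duration> {
    events
        .iter()
        .find(|(_, event_type)| responses_event_has_content(event_type))
        .map(|(elapsed, _)| *elapsed)
}
```

Lifecycle events such as response.completed arrive without new output, so they are excluded from the check; a stream consisting only of such events yields None instead of a misleading TTFT value.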
Braintrust docs status
Unclear. Braintrust documents OpenAI instrumentation (wrapOpenAI in TypeScript, wrap_openai in Python) but does not explicitly mention Responses API support for the Rust SDK. The OpenAI integration page focuses on Chat Completions and does not address the Responses API. Rust is not listed as a supported language for automatic LLM call tracing on the Trace LLM calls page.
Upstream sources
- OpenAI migration guide (choices → output array): https://developers.openai.com/api/docs/guides/migrate-to-responses
- openai-node Responses types (ResponseOutputTextDeltaEvent, ResponseCompletedEvent, ResponseFunctionCallArgumentsDeltaEvent): https://github.com/openai/openai-node/blob/master/src/resources/responses/responses.ts
Relationship to existing issues
- #44 covers adding a wrap_openai-style wrapper around the async-openai library. This issue covers the BraintrustStream/wrap_stream_with_span streaming format aggregation gap, which affects any user calling the Responses API and feeding the resulting stream to wrap_stream_with_span directly, independent of client library.
- […] BraintrustStream) for different providers. This issue covers the OpenAI Responses API as the same class of problem from the same upstream vendor whose Chat Completions format is already supported.
Local files inspected
- src/stream.rs:687-694 - StreamChunk struct has only model, choices, usage with #[serde(default)]; Responses API events carry a type field and a delta string, neither of which matches
- src/stream.rs:840-857 - aggregate() calls serde_json::from_value; Responses API events silently deserialize to empty StreamChunk objects without hitting the Err(_) => continue fallback
- src/stream.rs:1117-1119 - value_has_content() checks the choices array; always empty for Responses API events, so TTFT is never recorded
- src/extractors.rs - extract_openai_usage() calls value.get("usage") at line 5; response.completed wraps usage under response.usage, not at root, so extraction would return UsageMetrics::default() even if the final event were parsed
- src/lib.rs - public API exports; no Responses API references
- Full codebase grep for response.output_text, response.completed, ResponseOutputText, output_index, summary_index - zero results