[bot] ChatMessage and StreamDelta drop OpenAI audio output (gpt-4o-audio models) #61

@braintrust-bot

Summary

When an OpenAI audio model (gpt-4o-audio-preview, gpt-4o-mini-audio-preview) returns an audio response, the choices[].message.audio object is silently dropped. Neither ChatMessage nor StreamDelta has an audio field, so both the audio data and the transcript are lost from logged span output. Users cannot see audio transcripts in Braintrust traces, and cannot distinguish an audio response from an empty one.

What is missing

OpenAI Chat Completions with modalities: ["text", "audio"] returns an audio object on the assistant message:

Non-streaming response:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "audio": {
        "id": "audio_abc123",
        "data": "<base64-encoded-audio>",
        "expires_at": 1729268602,
        "transcript": "Sure, here is a poem about the ocean..."
      }
    }
  }]
}

Streaming delta:

{"choices": [{"delta": {"audio": {"id": "audio_abc123"}}}]}
{"choices": [{"delta": {"audio": {"data": "base64chunk1..."}}}]}
{"choices": [{"delta": {"audio": {"transcript": "Sure, here"}}}]}
{"choices": [{"delta": {"audio": {"transcript": " is a poem"}}}]}
{"choices": [{"delta": {"audio": {"expires_at": 1729268602}}}]}

Currently in the SDK:

  1. ChatMessage (src/stream.rs) has only role, content, and tool_calls; there is no audio field. Serde ignores unknown fields by default during deserialization, so the audio object is silently dropped both when non-streaming responses are aggregated manually and when the output type is constructed.

  2. StreamDelta (src/stream.rs) has only role, content, and tool_calls — no audio field. Incremental delta.audio chunks are never accumulated.

  3. aggregate() builds ChatMessage from the accumulated content and tool_calls only. Even if StreamDelta were extended with an audio field, there is no accumulation logic for the transcript string or the chunked base64 data.
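
The missing accumulation step from item 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: AudioDelta, AudioOutput, and AudioAccumulator are hypothetical names that do not exist in the SDK, the sketch uses plain structs rather than the SDK's serde types, and it records only the length of the base64 payload instead of keeping it.

```rust
/// Hypothetical mirror of one `delta.audio` fragment. Every field is optional
/// because each streamed chunk carries only part of the audio object.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct AudioDelta {
    pub id: Option<String>,
    pub data: Option<String>,
    pub transcript: Option<String>,
    pub expires_at: Option<i64>,
}

/// Hypothetical accumulated result, shaped like `message.audio` minus the payload.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct AudioOutput {
    pub id: Option<String>,
    pub expires_at: Option<i64>,
    pub transcript: String,
    /// Total length of the base64 chunks seen; the chunks themselves are discarded.
    pub data_len: usize,
}

/// Hypothetical accumulator that an aggregation loop could feed with each delta.
#[derive(Debug, Default)]
pub struct AudioAccumulator {
    out: AudioOutput,
    seen: bool,
}

impl AudioAccumulator {
    /// Merge one streamed fragment into the running result.
    pub fn push(&mut self, delta: &AudioDelta) {
        self.seen = true;
        if let Some(id) = &delta.id {
            self.out.id = Some(id.clone());
        }
        if let Some(ts) = delta.expires_at {
            self.out.expires_at = Some(ts);
        }
        if let Some(text) = &delta.transcript {
            // Transcript fragments arrive in order and are concatenated.
            self.out.transcript.push_str(text);
        }
        if let Some(chunk) = &delta.data {
            self.out.data_len += chunk.len();
        }
    }

    /// Returns `None` when the stream never carried an audio delta.
    pub fn finish(self) -> Option<AudioOutput> {
        if self.seen {
            Some(self.out)
        } else {
            None
        }
    }
}
```

Fed with the five streaming deltas shown earlier, this would yield a transcript of "Sure, here is a poem" together with the id and expires_at values. Whether the real aggregate() should keep, truncate, or drop the base64 data is a design decision this sketch does not settle.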

This means for audio model responses:

  • The transcript (the only human-readable content in an audio-only response) is lost from the span output
  • The audio id and expires_at are lost
  • The encoded audio data is dropped (expected, since it's large binary data)
  • Spans for audio model calls appear as empty output with no indication of content
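
One possible shape for the logged output, given the points above, is sketched here. ChatAudio, LoggedAudio, and display_text are hypothetical names introduced for illustration, not existing SDK items; the sketch assumes the span should keep id, expires_at, and transcript while dropping the base64 payload.

```rust
/// Hypothetical mirror of `choices[].message.audio` as returned by the API.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatAudio {
    pub id: String,
    pub data: String,
    pub expires_at: i64,
    pub transcript: String,
}

/// Hypothetical span-output form: everything except the base64 payload.
#[derive(Debug, Clone, PartialEq)]
pub struct LoggedAudio {
    pub id: String,
    pub expires_at: i64,
    pub transcript: String,
}

impl From<ChatAudio> for LoggedAudio {
    fn from(audio: ChatAudio) -> Self {
        // `audio.data` is intentionally not copied into the logged form.
        LoggedAudio {
            id: audio.id,
            expires_at: audio.expires_at,
            transcript: audio.transcript,
        }
    }
}

/// Text a trace could show for a message: the plain content when present,
/// otherwise the audio transcript, otherwise nothing.
pub fn display_text<'a>(
    content: Option<&'a str>,
    audio: Option<&'a LoggedAudio>,
) -> Option<&'a str> {
    match content {
        Some(text) if !text.is_empty() => Some(text),
        _ => audio.map(|a| a.transcript.as_str()),
    }
}
```

With this shape, an audio-only response (content is null) would still surface its transcript, and a response with neither content nor audio would remain distinguishable as empty.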

Braintrust docs status

Unclear. Braintrust's OpenAI integration page documents Chat Completions tracing, including streaming, but does not explicitly mention audio output models. The Braintrust proxy can route audio model calls (it accepts any valid OpenAI API call), but SDK-level instrumentation of the audio response field is not documented.

Upstream sources

  • OpenAI audio output guide: https://platform.openai.com/docs/guides/audio — documents modalities: ["text", "audio"], the audio field on assistant messages, and streaming audio chunks
  • OpenAI Chat Completions message object reference: https://platform.openai.com/docs/api-reference/chat/object (choices[].message.audio field with id, data, expires_at, transcript)
  • OpenAI audio models: gpt-4o-audio-preview, gpt-4o-mini-audio-preview (publicly documented; the -preview suffix suggests preview rather than GA status)
  • OpenAI Python SDK ChatCompletionAudio type defines id, data, expires_at, transcript

Relationship to existing issues

Local files inspected

  • src/stream.rs:312-323: ChatMessage struct has role, content, tool_calls; no audio field
  • src/stream.rs:697-705: StreamDelta struct has role, content, tool_calls; no audio field
  • src/stream.rs:840-1009: aggregate() only accumulates delta.content (string) and delta.tool_calls; no audio transcript accumulation
  • src/stream.rs:1116-1140: value_has_content() checks the choices array and usage for TTFT detection; audio-only responses (where content is null) would not trigger TTFT recording
  • Full codebase grep for audio, transcript, modality, gpt-4o-audio — zero results
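
For the TTFT point, a content check that also counts audio could be sketched as follows. The real value_has_content() is not reproduced in this issue, so its signature and input type are unknown here; DeltaView and delta_has_content are hypothetical stand-ins that assume the relevant delta fields have already been extracted as string slices.

```rust
/// Hypothetical, simplified view of the fields of one streamed delta
/// that matter for time-to-first-token detection.
#[derive(Debug, Default)]
pub struct DeltaView<'a> {
    pub content: Option<&'a str>,
    pub audio_transcript: Option<&'a str>,
    pub audio_data: Option<&'a str>,
}

/// True when the optional string is present and not empty.
fn non_empty(value: Option<&str>) -> bool {
    value.map_or(false, |v| !v.is_empty())
}

/// Treats text content, an audio transcript fragment, or an audio data chunk
/// as the first sign of model output.
pub fn delta_has_content(delta: &DeltaView<'_>) -> bool {
    non_empty(delta.content)
        || non_empty(delta.audio_transcript)
        || non_empty(delta.audio_data)
}
```

Under this assumption, an audio-only stream would record TTFT at its first transcript or data chunk, while a delta carrying only an audio id or expires_at would not count as output.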
