[bot] OpenAI audio execution APIs (transcription, TTS) not instrumented in Go SDK #156

@braintrust-bot


What instrumentation is missing

The openai-go SDK (v1.12.0) exposes three audio execution API surfaces that are completely untraced. The openaiRouter in trace/contrib/openai/traceopenai.go only dispatches on three path suffixes:

if strings.HasSuffix(path, "/v1/chat/completions") { ... }
if strings.HasSuffix(path, "/v1/responses")        { ... }
if strings.HasSuffix(path, "/v1/embeddings")        { ... }
return nil  // all audio paths fall through here

The untraced generative execution paths are:

| Go SDK method | HTTP path | Description |
| --- | --- | --- |
| client.Audio.Transcriptions.New() | POST /v1/audio/transcriptions | Speech-to-text (Whisper, gpt-4o-transcribe, gpt-4o-mini-transcribe) |
| client.Audio.Transcriptions.NewStreaming() | POST /v1/audio/transcriptions | Streaming speech-to-text |
| client.Audio.Translations.New() | POST /v1/audio/translations | Audio-to-English translation |
| client.Audio.Speech.New() | POST /v1/audio/speech | Text-to-speech synthesis (tts-1, tts-1-hd, gpt-4o-mini-tts) |

None of these match the existing suffixes, so all audio calls are silently untraced.
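One way the dispatch could be extended is sketched below. It is only an illustration: `endpointKind`, `routes`, and `classifyPath` are hypothetical names, and the real `openaiRouter` signature and return type are not shown in this issue, so the sketch returns a plain label instead of a tracer.

```go
package main

import "strings"

// endpointKind is a hypothetical label for the endpoint a request targets.
// The real openaiRouter presumably returns a tracer; a label keeps the
// sketch independent of that type.
type endpointKind string

const (
	kindUnknown         endpointKind = ""
	kindChatCompletions endpointKind = "chat.completions"
	kindResponses       endpointKind = "responses"
	kindEmbeddings      endpointKind = "embeddings"
	kindTranscription   endpointKind = "audio.transcriptions"
	kindTranslation     endpointKind = "audio.translations"
	kindSpeech          endpointKind = "audio.speech"
)

// routes pairs each path suffix with its kind. The first three entries
// mirror the suffixes quoted above; the last three are the audio paths
// this issue proposes adding.
var routes = []struct {
	suffix string
	kind   endpointKind
}{
	{"/v1/chat/completions", kindChatCompletions},
	{"/v1/responses", kindResponses},
	{"/v1/embeddings", kindEmbeddings},
	{"/v1/audio/transcriptions", kindTranscription},
	{"/v1/audio/translations", kindTranslation},
	{"/v1/audio/speech", kindSpeech},
}

// classifyPath returns the kind for the first matching suffix, or
// kindUnknown when the path is not one the router would trace.
func classifyPath(path string) endpointKind {
	for _, r := range routes {
		if strings.HasSuffix(path, r.suffix) {
			return r.kind
		}
	}
	return kindUnknown
}
```

A table keeps the suffix list in one place, so adding an endpoint is a one-line change rather than another `if` branch.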

What could be traced

Transcription (POST /v1/audio/transcriptions): request captures model, language, prompt, response_format, temperature; response captures text (transcript) as braintrust.output_json, usage.total_tokens as braintrust.metrics.
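Unlike chat completions, the transcription request body is multipart/form-data, so the request fields cannot be read with a JSON decoder. The following is a minimal sketch of extracting the text fields listed above while skipping the audio file part. `parseTranscriptionRequest`, `transcriptionFields`, and `sampleTranscriptionRequest` are hypothetical helpers, not existing SDK functions, and the 64 KiB per-field cap is an arbitrary assumption.

```go
package main

import (
	"bytes"
	"errors"
	"io"
	"mime"
	"mime/multipart"
)

// transcriptionFields lists the form fields this issue proposes capturing.
var transcriptionFields = map[string]bool{
	"model": true, "language": true, "prompt": true,
	"response_format": true, "temperature": true,
}

// parseTranscriptionRequest (hypothetical) reads a buffered multipart body
// and returns the selected text fields. File parts are skipped.
func parseTranscriptionRequest(contentType string, body []byte) (map[string]string, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	if params["boundary"] == "" {
		return nil, errors.New("multipart boundary missing from Content-Type")
	}
	mr := multipart.NewReader(bytes.NewReader(body), params["boundary"])
	meta := make(map[string]string)
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			return meta, nil
		}
		if err != nil {
			return nil, err
		}
		name := part.FormName()
		if part.FileName() != "" || !transcriptionFields[name] {
			continue // skip the audio file and fields we do not log
		}
		// Cap each field at 64 KiB (assumed limit) to bound memory use.
		value, err := io.ReadAll(io.LimitReader(part, 64<<10))
		if err != nil {
			return nil, err
		}
		meta[name] = string(value)
	}
}

// sampleTranscriptionRequest builds a small in-memory request for demos.
func sampleTranscriptionRequest() (contentType string, body []byte, err error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	fw, err := w.CreateFormFile("file", "clip.mp3")
	if err != nil {
		return "", nil, err
	}
	if _, err = fw.Write([]byte("fake audio bytes")); err != nil {
		return "", nil, err
	}
	for _, kv := range [][2]string{{"model", "whisper-1"}, {"language", "en"}, {"stream", "true"}} {
		if err = w.WriteField(kv[0], kv[1]); err != nil {
			return "", nil, err
		}
	}
	if err = w.Close(); err != nil {
		return "", nil, err
	}
	return w.FormDataContentType(), buf.Bytes(), nil
}
```

Passing the output of `sampleTranscriptionRequest` to `parseTranscriptionRequest` yields only the `model` and `language` entries; the file part and the unlisted `stream` field are dropped.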

TTS (POST /v1/audio/speech): request captures model, voice, input (text), response_format, speed; the response is raw audio binary, so either an output summary (e.g. {"audio_format": "mp3"}) or the binary itself as an attachment could be logged.
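The summary-only option for TTS could look roughly like the sketch below. `speechRequest`, `speechSummary`, and `summarizeSpeech` are hypothetical names, and the `bytes` field in the summary is an addition for illustration that the issue does not propose. The fallback to `mp3` reflects the default `response_format` documented by OpenAI for this endpoint.

```go
package main

import "encoding/json"

// speechRequest models only the request fields named in this issue.
type speechRequest struct {
	Model          string  `json:"model"`
	Voice          string  `json:"voice"`
	Input          string  `json:"input"`
	ResponseFormat string  `json:"response_format,omitempty"`
	Speed          float64 `json:"speed,omitempty"`
}

// speechSummary is a hypothetical stand-in for the span output. Logging
// the byte count alongside the format is an assumption, not a requirement.
type speechSummary struct {
	AudioFormat string `json:"audio_format"`
	Bytes       int    `json:"bytes"`
}

// summarizeSpeech (hypothetical) decodes the JSON request body and builds
// a small summary of the binary response without retaining the audio.
func summarizeSpeech(reqBody, audio []byte) (speechRequest, speechSummary, error) {
	var req speechRequest
	if err := json.Unmarshal(reqBody, &req); err != nil {
		return req, speechSummary{}, err
	}
	format := req.ResponseFormat
	if format == "" {
		format = "mp3" // documented default when response_format is omitted
	}
	return req, speechSummary{AudioFormat: format, Bytes: len(audio)}, nil
}
```

Deriving the format from the request avoids sniffing the binary payload; an implementation that logs the audio as an attachment would need the bytes themselves instead.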

The middleware infrastructure already buffers and parses the response body for chat completions; the same approach works for transcriptions (JSON response) and TTS (binary, summary only).
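The buffer-and-parse approach for a JSON transcription response might be sketched as follows. This does not reproduce the existing middleware, whose internals are not shown here: `captureBody`, `parseTranscription`, and `transcriptionResponse` are hypothetical, and the sketch assumes a JSON response. A `response_format` of `text`, `srt`, or `vtt` returns a non-JSON body that would need separate handling.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
)

// transcriptionResponse models only the fields named in this issue.
type transcriptionResponse struct {
	Text  string `json:"text"`
	Usage *struct {
		TotalTokens int64 `json:"total_tokens"`
	} `json:"usage"`
}

// captureBody (hypothetical) drains the response body and returns both the
// bytes and a fresh reader, so the SDK caller can still read the response.
func captureBody(body io.ReadCloser) ([]byte, io.ReadCloser, error) {
	defer body.Close()
	data, err := io.ReadAll(body)
	if err != nil {
		return nil, nil, err
	}
	return data, io.NopCloser(bytes.NewReader(data)), nil
}

// parseTranscription (hypothetical) extracts the transcript for the span
// output and total_tokens for the metrics. Usage is optional, so the
// metrics map stays empty when the field is absent.
func parseTranscription(data []byte) (output string, metrics map[string]int64, err error) {
	var resp transcriptionResponse
	if err = json.Unmarshal(data, &resp); err != nil {
		return "", nil, err
	}
	metrics = map[string]int64{}
	if resp.Usage != nil {
		metrics["total_tokens"] = resp.Usage.TotalTokens
	}
	return resp.Text, metrics, nil
}
```

In a round tripper, the bytes from `captureBody` would feed `parseTranscription`, and the returned reader would replace `resp.Body` before the response is handed back to the caller.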

Braintrust docs status

Supported for transcription. The Braintrust OpenAI integration docs state:

"Braintrust traces streaming audio transcription calls for sync and async OpenAI clients. Each span captures the audio file as an attachment and the final transcript as the span output."

Unclear for TTS and translations. Neither is explicitly mentioned in the docs, but TTS is a generative model execution call (the model produces audio output from a text prompt) and would naturally belong in the same surface.

Source: https://www.braintrust.dev/docs/integrations/ai-providers/openai

Upstream sources

Local repo files inspected

  • trace/contrib/openai/traceopenai.go: openaiRouter (lines 104–124) only dispatches on chat completions, responses, and embeddings suffixes; audio paths return nil
  • trace/contrib/openai/chatcompletions.go — reference pattern for request/response parsing
  • trace/contrib/openai/embeddings.go — reference pattern for request metadata + output summary
  • trace/contrib/openai/go.mod: openai-go v1.12.0 (Audio services available in this version)
  • trace/contrib/openai/traceopenai_test.go — no tests for audio endpoints
  • examples/internal/openai-v2/main.go — no audio example
