composer.dictation exits recording UI on Safari before transcription #198

@kylewhirl

Description

composer.dictation is not usable in Safari for our custom-backend ChatKit integration. The microphone button renders in the composer and Safari requests microphone permission. After permission is granted, ChatKit briefly enters the recording UI, then the stop button/waveform disappear after about one second and the input returns to text mode. No usable transcription is produced.

This is different from #179, which covered hosted-backend transcription support. This integration uses a custom backend and implements input.transcribe.

Expected behavior

After tapping the microphone button, the composer should remain in dictation mode with the waveform and stop button visible until the user stops recording or the max duration is reached. The recorded audio should then be sent to backend.input.transcribe / input.transcribe.

Actual behavior

On Safari, the UI enters dictation mode briefly, then resets to text mode. From the user's perspective, the waveform and stop button disappear almost immediately and dictation is non-functional.

Environment

  • @openai/chatkit-react: 1.5.1
  • Transitive @openai/chatkit: 1.7.0
  • ChatKit web component script: https://cdn.platform.openai.com/deployments/chatkit/chatkit.js
  • Backend mode: custom backend using api.url + domainKey, not hosted Agent Builder backend
  • Dictation enabled with:
composer: {
  dictation: { enabled: true }
}

Backend support

The custom backend implements input.transcribe and accepts the audio payload shape:

{
  type: "input.transcribe",
  params: {
    audio_base64: string,
    mime_type: string
  }
}

The backend accepts common MediaRecorder/Safari MIME types, including audio/webm, audio/ogg, audio/mp4, audio/m4a, and audio/wav, then sends the file to OpenAI transcription.
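For reference, a minimal sketch of how a backend could match these types when the recorder reports a MIME string with parameters (for example `audio/webm;codecs=opus`, as in the bundle's candidate list). The helper names `baseMimeType` and `isAcceptedAudio` are hypothetical and only illustrate the idea; this is not our exact backend code.

```typescript
// Base types copied from the list in this report.
const ACCEPTED = new Set([
  "audio/webm",
  "audio/ogg",
  "audio/mp4",
  "audio/m4a",
  "audio/wav",
]);

// Payload shape from this report.
interface TranscribeRequest {
  type: "input.transcribe";
  params: { audio_base64: string; mime_type: string };
}

// Hypothetical helper: drop parameters such as ";codecs=opus",
// trim whitespace, and lowercase the base type.
function baseMimeType(mimeType: string): string {
  const [base = ""] = mimeType.split(";");
  return base.trim().toLowerCase();
}

// Hypothetical helper: true when the base type is in the accepted set.
function isAcceptedAudio(req: TranscribeRequest): boolean {
  return ACCEPTED.has(baseMimeType(req.params.mime_type));
}
```

Matching on the base type means a codec suffix alone would not cause a rejection on the backend side, which is one reason I suspect the failure happens before any request is sent.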

Bundle investigation

The current CDN bundle appears to hardcode dictation MIME selection roughly as:

["audio/webm;codecs=opus", "audio/mp4", "audio/ogg;codecs=opus"]
  .find(MediaRecorder.isTypeSupported)
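A more defensive version of that selection might look like the sketch below. It is only an illustration, not ChatKit's code: `pickMimeType` is a hypothetical helper, and the support check is passed in as a function so the logic does not depend on how `MediaRecorder.isTypeSupported` is invoked.

```typescript
// Candidate list as observed in the bundle (see above).
const CANDIDATES = [
  "audio/webm;codecs=opus",
  "audio/mp4",
  "audio/ogg;codecs=opus",
];

// Hypothetical helper: returns the first supported type, or undefined so the
// caller can construct the recorder without an explicit mimeType and let the
// browser choose its default container.
function pickMimeType(
  candidates: readonly string[],
  isSupported: (type: string) => boolean,
): string | undefined {
  for (const type of candidates) {
    try {
      if (isSupported(type)) return type;
    } catch {
      // Treat a throwing probe as "not supported" and keep looking.
    }
  }
  return undefined;
}
```

In the browser this would be called as `pickMimeType(CANDIDATES, (t) => MediaRecorder.isTypeSupported(t))`. When it returns `undefined`, omitting `mimeType` from the `MediaRecorder` options would be one possible fallback instead of aborting the recording.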

The recorder path also uses:

  • navigator.mediaDevices.getUserMedia({ audio: true })
  • new AudioContext() for waveform analysis
  • new MediaRecorder(stream, { mimeType, audioBitsPerSecond: 24000 })

There does not appear to be a public composer.dictation.mimeType or Safari-specific override in the typed ChatKit options.

Why this matters

Host apps cannot preserve the built-in ChatKit composer dictation UI while working around this externally, because the recorder/waveform UI runs inside ChatKit's iframe. Replacing it with a host-page mic button changes the product UI and loses the built-in waveform/stop behavior.

Requested help

Can ChatKit do one of the following:

  • handle Safari's MediaRecorder/AudioContext behavior more defensively;
  • expose a composer.dictation.mimeType or MIME preference override;
  • emit a structured chatkit.error/chatkit.log event with the underlying recorder failure;
  • or document Safari/iOS support limitations for composer.dictation?
