Description
composer.dictation is not usable in Safari for our custom-backend ChatKit integration. The microphone button renders in the composer and Safari requests microphone permission. After permission is granted, ChatKit briefly enters the recording UI, then the stop button/waveform disappear after about one second and the input returns to text mode. No usable transcription is produced.
This is different from #179, which covered hosted-backend transcription support. This integration uses a custom backend and implements input.transcribe.
Expected behavior
After tapping the microphone button, the composer should remain in dictation mode with the waveform and stop button visible until the user stops recording or the max duration is reached. The recorded audio should then be sent to backend.input.transcribe / input.transcribe.
Actual behavior
On Safari, the UI appears to enter dictation mode briefly, then resets back to text mode. From the user perspective, the waveform/stop UI disappears almost immediately and dictation is non-functional.
Environment
- @openai/chatkit-react: 1.5.1
- Transitive @openai/chatkit: 1.7.0
- ChatKit web component script: https://cdn.platform.openai.com/deployments/chatkit/chatkit.js
- Backend mode: custom backend using api.url + domainKey, not hosted Agent Builder backend
- Dictation enabled with:
    composer: {
      dictation: { enabled: true }
    }
Backend support
The custom backend implements input.transcribe and accepts the audio payload shape:
{
  type: "input.transcribe",
  params: {
    audio_base64: string,
    mime_type: string
  }
}
The backend accepts common MediaRecorder/Safari MIME types, including audio/webm, audio/ogg, audio/mp4, audio/m4a, and audio/wav, then sends the file to OpenAI transcription.
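For reference, the backend-side handling described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the names TranscribeParams, ALLOWED_AUDIO_TYPES, baseMimeType, and parseTranscribeParams are hypothetical, and the stripping of parameters such as ";codecs=opus" assumes the recorder may report a MIME type with a codec suffix.

```typescript
// Hypothetical sketch of validating an input.transcribe payload.
// All names here are illustrative; only the payload fields
// (audio_base64, mime_type) come from the shape shown above.
interface TranscribeParams {
  audio_base64: string;
  mime_type: string;
}

// MIME types the backend is described as accepting.
const ALLOWED_AUDIO_TYPES = new Set([
  "audio/webm",
  "audio/ogg",
  "audio/mp4",
  "audio/m4a",
  "audio/wav",
]);

// Assumption: the client may send "audio/webm;codecs=opus", so drop
// any parameters and lowercase before comparing.
function baseMimeType(mimeType: string): string {
  const [base = ""] = mimeType.split(";");
  return base.trim().toLowerCase();
}

function parseTranscribeParams(
  params: TranscribeParams,
): { mimeType: string; audio: Uint8Array } {
  const mimeType = baseMimeType(params.mime_type);
  if (!ALLOWED_AUDIO_TYPES.has(mimeType)) {
    throw new Error(`Unsupported audio MIME type: ${params.mime_type}`);
  }
  // Decode base64 into raw bytes before forwarding to transcription.
  const audio = Uint8Array.from(atob(params.audio_base64), (c) =>
    c.charCodeAt(0),
  );
  if (audio.length === 0) {
    throw new Error("Empty audio payload");
  }
  return { mimeType, audio };
}
```

With handling along these lines, a Safari recording reported as audio/mp4 would be accepted, so the backend does not appear to be what ends the recording early.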
Bundle investigation
The current CDN bundle appears to hardcode dictation MIME selection roughly as:
["audio/webm;codecs=opus", "audio/mp4", "audio/ogg;codecs=opus"]
.find(MediaRecorder.isTypeSupported)
The recorder path also uses:
- navigator.mediaDevices.getUserMedia({ audio: true })
- new AudioContext() for waveform analysis
- new MediaRecorder(stream, { mimeType, audioBitsPerSecond: 24000 })
There does not appear to be a public composer.dictation.mimeType or Safari-specific override in the typed ChatKit options.
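To illustrate what a more defensive selection with a MIME preference override might look like, here is a rough sketch. It is not ChatKit's actual code: pickRecorderMimeType, MimeProbe, and the preferred parameter are hypothetical, and the default candidate order only mirrors what the bundle appears to use. The support probe is injected so the logic can run outside a browser.

```typescript
// Hypothetical sketch of MIME selection with a host-supplied preference.
// In a browser the probe would wrap MediaRecorder.isTypeSupported.
type MimeProbe = (mimeType: string) => boolean;

// Order taken from what the CDN bundle appears to hardcode.
const DEFAULT_CANDIDATES: string[] = [
  "audio/webm;codecs=opus",
  "audio/mp4",
  "audio/ogg;codecs=opus",
];

function pickRecorderMimeType(
  isSupported: MimeProbe,
  preferred: string[] = [],
  candidates: string[] = DEFAULT_CANDIDATES,
): string | undefined {
  // Host preferences are tried first, then the built-in candidates.
  for (const mimeType of [...preferred, ...candidates]) {
    try {
      if (isSupported(mimeType)) {
        return mimeType;
      }
    } catch {
      // Treat a throwing probe as "not supported" and keep going.
    }
  }
  // Nothing matched: the caller can omit mimeType and let the
  // browser pick its own default container.
  return undefined;
}

// Browser usage (not executed here):
//   const mimeType = pickRecorderMimeType(
//     (t) => MediaRecorder.isTypeSupported(t),
//     ["audio/mp4"],
//   );
```

Returning undefined instead of failing when nothing matches would let the recorder fall back to the browser default, which is one way the Safari path could be made more tolerant.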
Why this matters
Host apps cannot preserve the built-in ChatKit composer dictation UI while working around this externally, because the recorder/waveform UI runs inside ChatKit's iframe. Replacing it with a host-page mic button changes the product UI and loses the built-in waveform/stop behavior.
Requested help
Can ChatKit do one of the following:
- handle Safari's MediaRecorder/AudioContext behavior more defensively;
- expose a composer.dictation.mimeType or MIME preference override;
- emit a structured chatkit.error/chatkit.log event with the underlying recorder failure;
- or document Safari/iOS support limitations for composer.dictation?