Summary
Add React hooks for standalone Speech-to-Text and Text-to-Speech usage: useTranscription() for streaming STT and useTextToSpeech() for TTS playback. These complement the existing useDeepgramAgent() hook proposal (#477), which covers voice agents only.
Problem it solves
Developers building React applications that use Deepgram for transcription or speech synthesis (not voice agents) must currently manage WebSocket connections, audio capture, playback queues, and component lifecycle cleanup by hand. Each team ends up rewriting the same boilerplate: getUserMedia() → WebSocket connection → event handlers → state updates → cleanup on unmount. Purpose-built hooks for STT and TTS would eliminate this repetitive code, tie connection and audio resources to the React component lifecycle, and provide a declarative API for the two most common Deepgram use cases in frontend applications.
Proposed API
// Streaming Speech-to-Text hook
const {
  transcript,        // Current final transcript text
  interimTranscript, // Current interim (in-progress) transcript
  isListening,       // Whether the microphone is active
  startListening,    // Start microphone capture + STT streaming
  stopListening,     // Stop capture and finalize
  error,             // Any connection/capture errors
  metadata,          // Speaker, confidence, language detection
} = useTranscription({
  model: 'nova-3',
  language: 'en',
  smart_format: true,
  diarize: true,
  interim_results: true,
});

// Text-to-Speech hook
const {
  speak,      // (text: string) => Promise<void>; synthesize and play
  isSpeaking, // Whether audio is currently playing
  stop,       // Stop current playback
  queue,      // Current playback queue length
  error,
} = useTextToSpeech({
  model: 'aura-asteria-en',
});
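To make the split between transcript and interimTranscript concrete, here is a minimal sketch of the state logic useTranscription() might use internally (for example via useReducer). The TranscriptEvent shape, applyTranscriptEvent, and initialTranscriptState are hypothetical names for illustration, not part of any existing SDK; the sketch assumes each streaming result has already been mapped to a text plus a final/interim flag.

```typescript
// Hypothetical, simplified event shape. Real streaming results carry more
// fields (speaker, confidence, timings) that would feed `metadata`.
interface TranscriptEvent {
  text: string;
  isFinal: boolean;
}

interface TranscriptState {
  transcript: string;        // accumulated finalized text
  interimTranscript: string; // latest in-progress text, replaced on each update
}

const initialTranscriptState: TranscriptState = {
  transcript: "",
  interimTranscript: "",
};

// Pure reducer: interim results overwrite each other, final results are
// appended to the transcript and clear the interim text.
function applyTranscriptEvent(
  state: TranscriptState,
  event: TranscriptEvent,
): TranscriptState {
  const text = event.text.trim();
  if (!event.isFinal) {
    return { ...state, interimTranscript: text };
  }
  if (text === "") {
    // An empty final result finalizes nothing; just clear the interim text.
    return { ...state, interimTranscript: "" };
  }
  const transcript =
    state.transcript === "" ? text : `${state.transcript} ${text}`;
  return { transcript, interimTranscript: "" };
}
```

Keeping this logic pure means it can be unit-tested without a microphone or WebSocket, and the hook itself only has to dispatch events into it.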
Both hooks should handle:
- Provider context via a <DeepgramProvider apiKey={key}> wrapper
- Automatic WebSocket cleanup on unmount
- Microphone permission handling (STT)
- Audio playback queue management (TTS)
- Reconnection on connection loss
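As an illustration of the playback queue item in the list above, the following is a rough sketch of a framework-agnostic queue that useTextToSpeech() could wrap. SpeechQueue and PlayFn are hypothetical names; the play function is assumed to synthesize and play one utterance and to resolve when playback ends. Interrupting audio that is already playing is deliberately left out of this sketch.

```typescript
// Hypothetical player: resolves when one utterance has finished playing.
type PlayFn = (text: string) => Promise<void>;

interface QueueItem {
  text: string;
  resolve: () => void;
  reject: (reason: unknown) => void;
}

class SpeechQueue {
  private pending: QueueItem[] = [];
  private draining = false;
  private readonly play: PlayFn;

  constructor(play: PlayFn) {
    this.play = play;
  }

  // Number of utterances waiting behind the one currently playing.
  get length(): number {
    return this.pending.length;
  }

  get isSpeaking(): boolean {
    return this.draining;
  }

  // Resolves once this utterance has played, or was dropped by stop().
  speak(text: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      this.pending.push({ text, resolve, reject });
      if (!this.draining) void this.drain();
    });
  }

  // Drops everything still waiting; does not interrupt in-flight playback.
  stop(): void {
    for (const item of this.pending) item.resolve();
    this.pending = [];
  }

  // Plays queued utterances strictly one after another.
  private async drain(): Promise<void> {
    this.draining = true;
    while (this.pending.length > 0) {
      const item = this.pending.shift();
      if (!item) break;
      try {
        await this.play(item.text);
        item.resolve();
      } catch (err) {
        item.reject(err); // surface the failure to that speak() caller only
      }
    }
    this.draining = false;
  }
}
```

In a hook, the queue instance would likely live in a ref, with stop() called from the effect cleanup so that unmounting the component discards pending utterances.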
Acceptance criteria
- useTranscription() manages microphone capture and streaming STT lifecycle
- useTextToSpeech() manages TTS synthesis and audio playback queue
- <DeepgramProvider> context for API key management
Raised by the DX intelligence system.