[Enhancement] Add useTranscription() and useTextToSpeech() React hooks for standalone STT and TTS usage #506

@deepgram-robot

Summary

Add React hooks for standalone Speech-to-Text and Text-to-Speech usage: useTranscription() for streaming STT and useTextToSpeech() for TTS playback. These complement the existing useDeepgramAgent() hook proposal (#477), which covers voice agents only.

Problem it solves

Developers building React applications that use Deepgram for transcription or speech synthesis (rather than voice agents) currently have to manage WebSocket connections, audio capture, playback queues, and component lifecycle cleanup by hand. Each application ends up with the same boilerplate: getUserMedia() → WebSocket connection → event handlers → state updates → cleanup on unmount. Purpose-built hooks for STT and TTS would eliminate this repetitive code, tie resource handling to the React lifecycle, and offer a declarative API for the two most common Deepgram use cases in frontend applications.

Proposed API

// Streaming Speech-to-Text hook
const {
  transcript,        // Current final transcript text
  interimTranscript, // Current interim (in-progress) transcript
  isListening,       // Whether the microphone is active
  startListening,    // Start microphone capture + STT streaming
  stopListening,     // Stop capture and finalize
  error,             // Any connection/capture errors
  metadata,          // Speaker, confidence, language detection
} = useTranscription({
  model: 'nova-3',
  language: 'en',
  smart_format: true,
  diarize: true,
  interim_results: true,
});

// Text-to-Speech hook
const {
  speak,             // (text: string) => Promise<void> — synthesize and play
  isSpeaking,        // Whether audio is currently playing
  stop,              // Stop current playback
  queue,             // Current playback queue length
  error,             // Any synthesis/playback errors (alias one `error` if both hooks share a component)
} = useTextToSpeech({
  model: 'aura-asteria-en',
});
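
To make the split between transcript and interimTranscript concrete, here is a minimal sketch of how the hook's state could be derived from streaming results. It is not part of the proposal: applyResult, TranscriptState, and the trimmed-down SttResult message shape are hypothetical names used for illustration, and the sketch assumes each message carries an is_final flag plus a list of alternatives.

```typescript
// Simplified streaming result message (assumption: only the fields used
// here are modeled; a real message carries more).
interface SttResult {
  is_final: boolean;
  channel: { alternatives: { transcript: string }[] };
}

// The two text fields the proposed hook would expose.
interface TranscriptState {
  transcript: string;        // accumulated final text
  interimTranscript: string; // latest in-progress text
}

const initialState: TranscriptState = { transcript: "", interimTranscript: "" };

// Hypothetical pure reducer: fold one result message into the state.
function applyResult(state: TranscriptState, msg: SttResult): TranscriptState {
  const first = msg.channel.alternatives[0];
  const text = first ? first.transcript.trim() : "";

  if (!msg.is_final) {
    // Interim results replace each other and never touch the final text.
    return { ...state, interimTranscript: text };
  }
  if (text === "") {
    // An empty final result only clears the interim text.
    return { ...state, interimTranscript: "" };
  }
  // A final result is appended and the interim text is cleared.
  const transcript = state.transcript ? `${state.transcript} ${text}` : text;
  return { transcript, interimTranscript: "" };
}
```

Keeping this logic in a pure function would let the hook feed it to useReducer and let the merging rules be unit-tested without a microphone or a socket.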

Both hooks should handle:

  • Provider context via <DeepgramProvider apiKey={key}> wrapper
  • Automatic WebSocket cleanup on unmount
  • Microphone permission handling (STT)
  • Audio playback queue management (TTS)
  • Reconnection on connection loss
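
As an illustration of the playback queue item above, the following sketch serializes speak() calls so that utterances play one at a time. SpeechQueue, PlayFn, and queueLength are hypothetical names, and the actual synthesis and audio playback are injected as a function, so nothing here reflects a real SDK call.

```typescript
// Function that synthesizes and plays one utterance (injected; assumption).
type PlayFn = (text: string) => Promise<void>;

interface QueuedItem {
  text: string;
  resolve: () => void;
  reject: (reason?: unknown) => void;
}

// Hypothetical queue that plays utterances strictly in order.
class SpeechQueue {
  private readonly play: PlayFn;
  private items: QueuedItem[] = [];
  private active = false;

  constructor(play: PlayFn) {
    this.play = play;
  }

  // Number of utterances waiting (excludes the one currently playing).
  get queueLength(): number {
    return this.items.length;
  }

  get isSpeaking(): boolean {
    return this.active;
  }

  // Resolves when this utterance finishes playing or is dropped by stop().
  speak(text: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      this.items.push({ text, resolve, reject });
      if (!this.active) void this.drain();
    });
  }

  // Drops utterances that have not started. Interrupting the utterance that
  // is already playing would need an abort mechanism, omitted here.
  stop(): void {
    for (const item of this.items.splice(0)) item.resolve();
  }

  private async drain(): Promise<void> {
    this.active = true;
    let item = this.items.shift();
    while (item !== undefined) {
      try {
        await this.play(item.text);
        item.resolve();
      } catch (err) {
        item.reject(err); // surface playback errors to the caller
      }
      item = this.items.shift();
    }
    this.active = false;
  }
}
```

Inside the hook, an instance could live in a ref, with stop() called from the effect cleanup so pending audio is discarded on unmount.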

Acceptance criteria

  • useTranscription() manages microphone capture and streaming STT lifecycle
  • useTextToSpeech() manages TTS synthesis and audio playback queue
  • Both hooks clean up resources on component unmount
  • Works with <DeepgramProvider> context for API key management
  • TypeScript types exported for all hook return values and options
  • Documented with usage examples in README
  • Compatible with existing SDK API — hooks are additive, not breaking
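
For the TypeScript types criterion, one possible shape is sketched below. The option and return field names come from the Proposed API section; the function signatures, the nullable error, and the TranscriptionMetadata field names are assumptions, since the issue does not specify them. In a published package these would carry export.

```typescript
// Options for useTranscription(), mirroring the proposal's example.
interface UseTranscriptionOptions {
  model?: string;
  language?: string;
  smart_format?: boolean;
  diarize?: boolean;
  interim_results?: boolean;
}

// Assumed field names for "speaker, confidence, language detection".
interface TranscriptionMetadata {
  speaker?: number;
  confidence?: number;
  detectedLanguage?: string;
}

interface UseTranscriptionResult {
  transcript: string;
  interimTranscript: string;
  isListening: boolean;
  startListening: () => Promise<void>; // assumed async: permission prompt
  stopListening: () => void;
  error: Error | null;
  metadata: TranscriptionMetadata | null;
}

// Options for useTextToSpeech().
interface UseTextToSpeechOptions {
  model?: string;
}

interface UseTextToSpeechResult {
  speak: (text: string) => Promise<void>;
  isSpeaking: boolean;
  stop: () => void;
  queue: number; // playback queue length, as described in the proposal
  error: Error | null;
}

// The option objects from the proposal type-check against these shapes.
const sttOptions: UseTranscriptionOptions = {
  model: "nova-3",
  language: "en",
  smart_format: true,
  diarize: true,
  interim_results: true,
};
const ttsOptions: UseTextToSpeechOptions = { model: "aura-asteria-en" };
```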

Raised by the DX intelligence system.
