Voice Input Setup Guide

Overview

Voice input provides the following features:

🎤 Mic button replaces the send button when the input is empty
🌊 Real-time waveform visualization during recording
⏱️ Recording timer display
🛑 Stop button in the bottom-right corner
🔄 Transcribing state with loading indicator
📝 Automatic text insertion into prompt box
🔐 Secure backend API with user authentication
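The recording timer and the waveform come down to two small calculations. The TypeScript sketch below illustrates them under stated assumptions and is not the code in src/hooks/use-voice-recorder.ts: it assumes the waveform is drawn from AnalyserNode.getByteTimeDomainData() frames (unsigned bytes centred on 128), and the helper names formatElapsed and waveformLevels are hypothetical.

```typescript
// Format elapsed recording time in whole seconds as m:ss (e.g. 65 -> "1:05").
function formatElapsed(totalSeconds: number): string {
  const safe = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(safe / 60);
  const seconds = safe % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// Reduce one frame of AnalyserNode.getByteTimeDomainData() output
// (unsigned bytes where 128 means silence) to `barCount` levels in 0..1.
function waveformLevels(samples: Uint8Array, barCount: number): number[] {
  const levels: number[] = [];
  if (barCount <= 0 || samples.length === 0) return levels;
  const perBar = Math.max(1, Math.floor(samples.length / barCount));
  for (let bar = 0; bar < barCount; bar++) {
    const start = bar * perBar;
    const end = Math.min(start + perBar, samples.length);
    let sumSquares = 0;
    for (let i = start; i < end; i++) {
      const centered = (samples[i] - 128) / 128; // map to roughly -1..1
      sumSquares += centered * centered;
    }
    const count = end - start;
    // RMS amplitude of this slice, clamped to 1; empty slices render as 0.
    levels.push(count > 0 ? Math.min(1, Math.sqrt(sumSquares / count)) : 0);
  }
  return levels;
}
```

Each bar is the RMS amplitude of its slice of samples, which tracks loudness more steadily than the peak value would.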

Required Configuration

1. Set up Groq API Key

The voice input uses Groq's Whisper API for speech-to-text transcription. You need to:

Get a Groq API Key:
- Visit Groq Console
- Create an account or sign in
- Generate an API key
Configure the API Key in Convex:

For Development:
```
npx convex env set GROQ_API_KEY your-actual-groq-api-key-here
```
For Production:
```
npx convex env set GROQ_API_KEY your-actual-groq-api-key-here --prod
```
Or via the Convex Dashboard:
- Go to your Convex Dashboard
- Navigate to your deployment's settings
- Add an environment variable named GROQ_API_KEY with your API key as its value
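On the backend, the key is read from the deployment's environment. A minimal sketch, assuming the HTTP action reads process.env.GROQ_API_KEY (Convex exposes deployment environment variables on process.env); requireGroqApiKey is a hypothetical helper that fails with an actionable message instead of sending an unauthenticated request to Groq.

```typescript
// Hypothetical helper: fail fast with a clear message when GROQ_API_KEY is missing.
// In a Convex function you would pass `process.env`.
function requireGroqApiKey(env: Record<string, string | undefined>): string {
  const key = env.GROQ_API_KEY?.trim(); // tolerate stray whitespace from copy/paste
  if (!key) {
    throw new Error(
      "GROQ_API_KEY is not set. Run `npx convex env set GROQ_API_KEY <key>` (add --prod for production).",
    );
  }
  return key;
}

// Usage inside the HTTP action (sketch):
// const apiKey = requireGroqApiKey(process.env);
```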

Testing the Feature

Start your development server (if not already running):
```
bun run dev
```
Test the voice input:
- Open the chat interface
- Make sure the input field is empty
- You should see a mic icon instead of the send button
- Click the mic icon to start recording
- Speak clearly for a few seconds
- Click the stop button to end recording
- The transcribed text should appear in the input field

Browser Requirements

Microphone permission: Users are prompted to allow microphone access
HTTPS required: Browsers only allow microphone access in secure contexts, so voice input works only over HTTPS (or on localhost during development). This is critical for iOS Safari.
Modern browser: The browser must support the MediaRecorder API and the Web Audio API
iOS support: Compatible with iOS Safari 14.3+ on iPhone and iPad. Open the app directly in the Safari browser, not as a PWA/home screen app
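These requirements can be checked up front so the UI can show a specific message before the user taps the mic. The sketch takes the relevant browser facts as a plain object (a hypothetical VoiceEnv shape) so the logic is testable; the commented lines show one way to fill it from real browser globals. checkVoiceSupport and its messages are illustrative, not the app's actual code.

```typescript
// Browser facts this check depends on, passed in so the logic runs anywhere.
interface VoiceEnv {
  isSecureContext: boolean;
  hasGetUserMedia: boolean;
  hasMediaRecorder: boolean;
  hasAudioContext: boolean;
}

type SupportResult = { ok: true } | { ok: false; reason: string };

function checkVoiceSupport(env: VoiceEnv): SupportResult {
  // Checked first: on plain HTTP the other APIs may be hidden entirely.
  if (!env.isSecureContext) {
    return { ok: false, reason: "Voice input requires HTTPS (or localhost)." };
  }
  if (!env.hasGetUserMedia || !env.hasMediaRecorder) {
    return { ok: false, reason: "Your browser doesn't support audio recording." };
  }
  if (!env.hasAudioContext) {
    return { ok: false, reason: "Your browser doesn't support the Web Audio API." };
  }
  return { ok: true };
}

// In the browser (sketch):
// checkVoiceSupport({
//   isSecureContext: window.isSecureContext,
//   hasGetUserMedia: !!navigator.mediaDevices?.getUserMedia,
//   hasMediaRecorder: typeof MediaRecorder !== "undefined",
//   hasAudioContext: typeof AudioContext !== "undefined" || "webkitAudioContext" in window,
// });
```

The secure-context check runs first because on plain HTTP navigator.mediaDevices is undefined, which would otherwise surface as the less helpful "doesn't support audio recording" message.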

Supported Audio Formats

The implementation automatically detects which formats the browser supports and uses the first match, in order of preference:

iOS Safari (preferred formats):

audio/mp4
audio/aac
audio/m4a

Other browsers:

audio/webm;codecs=opus (preferred)
audio/webm
audio/ogg;codecs=opus
Browser default (fallback)
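Format selection amounts to walking a preference list and taking the first type MediaRecorder.isTypeSupported() accepts. This sketch uses the lists above in the order shown. How the real implementation decides it is running on iOS isn't documented here, so it is a boolean parameter; pickMimeType is a hypothetical name.

```typescript
// Candidate MIME types, in the order listed in this guide.
const IOS_CANDIDATES = ["audio/mp4", "audio/aac", "audio/m4a"];
const OTHER_CANDIDATES = ["audio/webm;codecs=opus", "audio/webm", "audio/ogg;codecs=opus"];

// Returns the first supported type, or undefined to fall back to the browser default.
// `isTypeSupported` is injected so this runs outside a browser; in real code pass
// (type) => MediaRecorder.isTypeSupported(type).
function pickMimeType(
  isIOS: boolean,
  isTypeSupported: (type: string) => boolean,
): string | undefined {
  const candidates = isIOS ? IOS_CANDIDATES : OTHER_CANDIDATES;
  return candidates.find((type) => isTypeSupported(type));
}

// Usage (sketch):
// const mimeType = pickMimeType(isIOS, (t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

Returning undefined rather than an empty string lets the caller omit the mimeType option entirely, which is what the browser-default fallback means in practice.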

File Size Limits

Maximum audio file size: 25MB (Groq free tier limit)
Recordings are automatically chunked and optimized
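A size check before upload gives a clear error instead of a failed request. Assumptions: "25MB" is taken as 25 × 1024 × 1024 bytes (Groq's exact definition may differ), and checkUploadSize and maxDurationSeconds are hypothetical helpers. The second helper turns the limit into a rough recording-length budget for a given bitrate.

```typescript
// Assumption: 25 MB interpreted as 25 MiB; check Groq's documentation for the exact limit.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Returns an error message, or null when the recording can be uploaded.
function checkUploadSize(sizeBytes: number, limit = MAX_UPLOAD_BYTES): string | null {
  if (sizeBytes === 0) return "No audio was recorded.";
  if (sizeBytes > limit) {
    const mb = (sizeBytes / (1024 * 1024)).toFixed(1);
    return `Recording is ${mb} MB, above the ${limit / (1024 * 1024)} MB upload limit.`;
  }
  return null;
}

// Longest recording that fits under the limit at a given average bitrate.
function maxDurationSeconds(bitsPerSecond: number, limit = MAX_UPLOAD_BYTES): number {
  return (limit * 8) / bitsPerSecond;
}
```

For example, at an assumed 128 kbit/s the limit works out to (25 × 1024 × 1024 × 8) / 128,000 ≈ 1638 seconds, roughly 27 minutes.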

Troubleshooting

Common Issues:

"Your browser doesn't support audio recording"
- Update to a modern browser (Chrome, Firefox, Safari, Edge)
- Ensure the page is served over HTTPS (or localhost during development), not plain HTTP
"No speech detected"
- Check microphone permissions
- Ensure microphone is working
- Speak closer to the microphone
- Try speaking louder and more clearly
"Transcription service error"
- Verify GROQ_API_KEY is set correctly
- Check that the Groq API key is valid and has remaining credits
- Check network connectivity
"Unauthorized" error
- User must be logged in to use voice input
- Check authentication status
iOS Safari specific issues
- "Not supported" error: Update to iOS 14.3+ and use Safari directly (not PWA/home screen app)
- Silent recordings after first use: Refresh the page - iOS Safari requires fresh audio streams
- Fails after switching apps: This is a known iOS Safari bug - refresh the page to recover
- Red recording bar: Normal behavior - it clears when recording stops
- Home screen PWA: Launch from Safari directly, not from home screen shortcut
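Several of these problems first show up as a rejected getUserMedia() promise, whose DOMException name identifies the cause. A sketch mapping the standard names to user-facing messages; micErrorMessage and the message wording are hypothetical, not the strings the app actually shows.

```typescript
// Map a getUserMedia() rejection to a user-facing message using DOMException.name.
function micErrorMessage(error: unknown): string {
  const name =
    typeof error === "object" && error !== null && "name" in error
      ? String((error as { name: unknown }).name)
      : "";
  switch (name) {
    case "NotAllowedError": // user or browser settings refused permission
      return "Microphone access was denied. Allow it in the browser's site settings.";
    case "NotFoundError": // no audio input device available
      return "No microphone was found. Connect one and try again.";
    case "NotReadableError": // device busy or hardware/OS error
      return "The microphone is in use by another app or could not be started.";
    case "SecurityError": // media access disabled for this document
      return "Microphone access is blocked for this page by its security settings.";
    default:
      return "Could not start recording. Check the browser console for details.";
  }
}

// Usage (sketch):
// try { await navigator.mediaDevices.getUserMedia({ audio: true }); }
// catch (err) { showError(micErrorMessage(err)); } // showError is hypothetical
```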

Debug Steps:

Check environment variable:
```
npx convex env list
```
Check browser console for any error messages
Test microphone in other applications
Verify API key in Groq Console
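If the key looks right but transcription still fails, it can be tested outside the app. A minimal standalone sketch, assuming Bun or Node 18+ (global fetch) and Groq's OpenAI-compatible GET /openai/v1/models endpoint; check-key.ts, modelsRequest and checkGroqKey are hypothetical names. A 401 response means Groq rejected the key itself, which rules out bugs in the app.

```typescript
// Hypothetical standalone key check, e.g. saved as check-key.ts.
const GROQ_MODELS_URL = "https://api.groq.com/openai/v1/models";

interface SimpleRequest {
  url: string;
  init: { method: string; headers: Record<string, string> };
}

// Build the request separately so it can be inspected without network access.
function modelsRequest(apiKey: string): SimpleRequest {
  return {
    url: GROQ_MODELS_URL,
    init: { method: "GET", headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

async function checkGroqKey(apiKey: string): Promise<void> {
  const { url, init } = modelsRequest(apiKey);
  const res = await fetch(url, init);
  if (res.status === 401) {
    console.error("Groq rejected the key (HTTP 401). Generate a new one in the Groq Console.");
    return;
  }
  if (!res.ok) {
    console.error(`Groq returned HTTP ${res.status}: ${await res.text()}`);
    return;
  }
  const body = (await res.json()) as { data?: { id: string }[] };
  const hasWhisper = body.data?.some((model) => model.id === "whisper-large-v3-turbo") ?? false;
  console.log(`Key accepted. whisper-large-v3-turbo listed: ${hasWhisper}`);
}

// To run: add the line below, then `GROQ_API_KEY=... bun check-key.ts`
// await checkGroqKey(process.env.GROQ_API_KEY ?? "");
```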

API Usage & Costs

Model: Groq Whisper Large V3 Turbo (model ID whisper-large-v3-turbo)
Cost: Check Groq Pricing for current rates
Free tier: Includes free usage, subject to Groq's rate limits

Implementation Details

Backend Components:

convex/speech_to_text.ts - HTTP action for transcription
convex/http.ts - Route configuration with CORS
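The HTTP action ultimately forwards the recorded audio to Groq. A sketch of that request, assuming Groq's OpenAI-compatible /openai/v1/audio/transcriptions endpoint and the whisper-large-v3-turbo model ID; buildTranscriptionRequest is hypothetical, and the authentication and CORS handling done in convex/speech_to_text.ts and convex/http.ts are not shown.

```typescript
const GROQ_TRANSCRIPTIONS_URL = "https://api.groq.com/openai/v1/audio/transcriptions";
const WHISPER_MODEL = "whisper-large-v3-turbo";

interface TranscriptionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: FormData };
}

// Hypothetical helper: package the recorded audio as multipart form data.
// `filename` should end in an extension matching the recorded format
// (for example "recording.webm" or "recording.mp4").
function buildTranscriptionRequest(apiKey: string, audio: Blob, filename: string): TranscriptionRequest {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", WHISPER_MODEL);
  form.append("response_format", "json"); // JSON response with a "text" field
  return {
    url: GROQ_TRANSCRIPTIONS_URL,
    // No Content-Type header: fetch sets multipart/form-data with the boundary.
    init: { method: "POST", headers: { Authorization: `Bearer ${apiKey}` }, body: form },
  };
}

// Inside the HTTP action (sketch):
// const { url, init } = buildTranscriptionRequest(apiKey, audioBlob, "recording.webm");
// const res = await fetch(url, init);
// if (!res.ok) { /* report "Transcription service error" */ }
// const { text } = (await res.json()) as { text: string };
```

Leaving Content-Type unset lets fetch generate the multipart boundary; a hand-written header without the boundary breaks the upload.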

Frontend Components:

src/hooks/use-voice-recorder.ts - Recording logic and state management
src/components/voice-recorder.tsx - UI component with waveform visualization
src/components/multimodal-input.tsx - Integration with chat input
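The recording flow in this guide (idle, recording, transcribing, then back to idle with the text inserted, or an error) maps naturally onto a small state machine. A hedged sketch as a pure reducer, usable with React's useReducer; RecorderState, RecorderEvent and reduce are hypothetical names, not the hook's actual API.

```typescript
// Hypothetical recorder states matching the UI described in this guide.
type RecorderState =
  | { status: "idle" }
  | { status: "recording"; startedAt: number } // startedAt drives the timer
  | { status: "transcribing" } // loading indicator shown
  | { status: "error"; message: string };

type RecorderEvent =
  | { type: "start"; now: number }
  | { type: "stop" }
  | { type: "transcribed"; text: string } // caller inserts `text` into the prompt box
  | { type: "failed"; message: string }
  | { type: "reset" };

function reduce(state: RecorderState, event: RecorderEvent): RecorderState {
  switch (event.type) {
    case "start":
      // Only start from a resting state; ignore taps while busy.
      return state.status === "idle" || state.status === "error"
        ? { status: "recording", startedAt: event.now }
        : state;
    case "stop":
      return state.status === "recording" ? { status: "transcribing" } : state;
    case "transcribed":
      return state.status === "transcribing" ? { status: "idle" } : state;
    case "failed":
      return { status: "error", message: event.message };
    case "reset":
      return { status: "idle" };
  }
}

// In a React hook (sketch): const [state, dispatch] = useReducer(reduce, { status: "idle" });
```

Ignoring start while transcribing prevents a second recording from racing the first upload.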

The transcription endpoint requires an authenticated user, and failures are reported back to the user as error messages (see Troubleshooting).

iOS Safari Compatibility

The voice input implementation has been updated based on kaliatech's web-audio-recording-tests to ensure iOS Safari compatibility:

Audio Graph Architecture: MediaRecorder records the stream from createMediaStreamDestination() instead of the raw getUserMedia stream
Proper Audio Routing: Creates the gain and analyser nodes before calling getUserMedia
Enhanced Cleanup: Releases all audio resources after each recording to prevent iOS Safari stability issues
Stream Management: Always uses fresh audio streams for each recording session
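The four points above can be sketched as three steps: build the gain and analyser graph before requesting the microphone, connect the fresh getUserMedia stream into it, and tear everything down afterwards. This illustrates the technique with hypothetical names (createGraph, attachMic, teardown) and minimal structural types so it compiles without DOM typings; it is not code from kaliatech's web-audio-recording-tests or from use-voice-recorder.ts.

```typescript
// Minimal structural types; real AudioContext, GainNode, AnalyserNode and
// MediaStreamAudioDestinationNode objects satisfy them.
interface NodeLike {
  connect(target: NodeLike): unknown;
  disconnect(): void;
}
interface TrackLike {
  stop(): void;
}
interface StreamLike {
  getTracks(): TrackLike[];
}
interface GraphContext<S extends StreamLike> {
  createGain(): NodeLike;
  createAnalyser(): NodeLike;
  createMediaStreamDestination(): NodeLike & { stream: S };
  createMediaStreamSource(stream: S): NodeLike;
  close(): Promise<void>;
}
interface RecordingGraph<S extends StreamLike> {
  gain: NodeLike;
  analyser: NodeLike; // read by the waveform visualization
  destination: NodeLike & { stream: S }; // its stream is what MediaRecorder records
  source?: NodeLike; // set once the microphone stream is attached
}

// Step 1, before getUserMedia: gain -> analyser and gain -> destination.
function createGraph<S extends StreamLike>(ctx: GraphContext<S>): RecordingGraph<S> {
  const gain = ctx.createGain();
  const analyser = ctx.createAnalyser();
  const destination = ctx.createMediaStreamDestination();
  gain.connect(analyser);
  gain.connect(destination);
  return { gain, analyser, destination };
}

// Step 2, after getUserMedia resolves: route the fresh mic stream into the graph.
function attachMic<S extends StreamLike>(ctx: GraphContext<S>, graph: RecordingGraph<S>, mic: S): void {
  const source = ctx.createMediaStreamSource(mic);
  source.connect(graph.gain);
  graph.source = source;
}

// Step 3, after recording: stop all tracks, disconnect all nodes and close the
// context, so the next session starts from new streams.
async function teardown<S extends StreamLike>(ctx: GraphContext<S>, graph: RecordingGraph<S>, mic: S): Promise<void> {
  mic.getTracks().forEach((track) => track.stop()); // releases the microphone
  graph.destination.stream.getTracks().forEach((track) => track.stop());
  for (const node of [graph.source, graph.gain, graph.analyser, graph.destination]) {
    node?.disconnect();
  }
  await ctx.close();
}

// Browser usage (sketch):
// const ctx = new AudioContext(); // create inside the tap/click handler
// const graph = createGraph(ctx);
// const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
// attachMic(ctx, graph, mic);
// const recorder = new MediaRecorder(graph.destination.stream);
// ...when recording stops: await teardown(ctx, graph, mic);
```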
