The voice input functionality has been fully implemented with the following features:
- 🎤 Mic button appears when input is empty (replaces send button)
- 🌊 Real-time waveform visualization during recording
- ⏱️ Recording timer display
- 🛑 Stop button positioned in bottom right
- 🔄 Transcribing state with loading indicator
- 📝 Automatic text insertion into prompt box
- 🔐 Secure backend API with user authentication
The voice input uses Groq's Whisper API for speech-to-text transcription. You need to:
- Get a Groq API Key:
  - Visit Groq Console
  - Create an account or sign in
  - Generate an API key
- Configure the API Key in Convex:
  For Development:
  npx convex env set GROQ_API_KEY your-actual-groq-api-key-here
  For Production:
  npx convex env set GROQ_API_KEY your-actual-groq-api-key-here --prod
  Or via Convex Dashboard:
  - Go to your Convex Dashboard
  - Navigate to your project's Deployment Settings
  - Add environment variable: GROQ_API_KEY with your API key value
- Start your development server (if not already running):
  bun run dev
- Test the voice input:
  - Open the chat interface
  - Make sure the input field is empty
  - You should see a mic icon instead of the send button
  - Click the mic icon to start recording
  - Speak clearly for a few seconds
  - Click the stop button to end recording
  - The transcribed text should appear in the input field
- Microphone permissions: Users will be prompted to allow microphone access
- HTTPS required: Voice input only works on HTTPS (or localhost for development) - critical for iOS Safari
- Modern browser: Supports MediaRecorder API and Web Audio API
- iOS Support: Compatible with iOS Safari 14.3+ (iPad/iPhone) - must open directly in Safari browser, not PWA/home screen app
The implementation automatically detects and uses the best supported format:
iOS Safari (preferred formats):
- audio/mp4
- audio/aac
- audio/m4a
Other browsers:
- audio/webm;codecs=opus (preferred)
- audio/webm
- audio/ogg;codecs=opus
- Browser default (fallback)
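The format selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in use-voice-recorder.ts: the function name `pickMimeType`, the `isIOS` flag, and the injected `isSupported` predicate are assumptions made so the logic can run outside a browser. In a browser, the predicate would typically be `MediaRecorder.isTypeSupported`.

```typescript
// Candidate lists copied from the format lists above, in order of preference.
const IOS_CANDIDATES = ["audio/mp4", "audio/aac", "audio/m4a"];
const DEFAULT_CANDIDATES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/ogg;codecs=opus",
];

// Hypothetical helper: returns the first supported MIME type, or undefined
// to signal "let the browser choose its default format".
function pickMimeType(
  isIOS: boolean,
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  const candidates = isIOS ? IOS_CANDIDATES : DEFAULT_CANDIDATES;
  return candidates.find(isSupported);
}

// Browser usage (not executed here):
//   const mimeType = pickMimeType(isIOS, (t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

Returning `undefined` rather than a hard-coded string keeps the "browser default" fallback explicit: the recorder is then constructed without a `mimeType` option.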
- Maximum audio file size: 25MB (Groq free tier limit)
- Recordings are automatically chunked and optimized
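A client-side guard for the size limit might look like the sketch below. The names `MAX_AUDIO_BYTES` and `checkAudioSize` are hypothetical, and the sketch assumes "25MB" means 25 * 1024 * 1024 bytes; adjust the constant if the limit is defined in decimal megabytes.

```typescript
// Assumption: the 25MB limit is interpreted as 25 MiB.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;

interface SizeCheck {
  ok: boolean;
  message?: string;
}

// Hypothetical helper: validates a recording's size before upload.
function checkAudioSize(sizeInBytes: number): SizeCheck {
  if (sizeInBytes <= MAX_AUDIO_BYTES) {
    return { ok: true };
  }
  const sizeMb = (sizeInBytes / (1024 * 1024)).toFixed(1);
  return {
    ok: false,
    message: `Recording is ${sizeMb}MB, which exceeds the 25MB limit`,
  };
}

// Browser usage (not executed here): checkAudioSize(recordedBlob.size)
```

Checking the size before upload avoids sending a request that the transcription service would reject anyway.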
- "Your browser doesn't support audio recording"
  - Update to a modern browser (Chrome, Firefox, Safari, Edge)
  - Ensure you're on HTTPS (not HTTP)
- "No speech detected"
  - Check microphone permissions
  - Ensure microphone is working
  - Speak closer to the microphone
  - Try speaking louder and more clearly
- "Transcription service error"
  - Verify GROQ_API_KEY is set correctly
  - Check that the Groq API key is valid and has credits
  - Check network connectivity
- "Unauthorized" error
  - User must be logged in to use voice input
  - Check authentication status
- iOS Safari specific issues
  - "Not supported" error: Update to iOS 14.3+ and use Safari directly (not PWA/home screen app)
  - Silent recordings after first use: Refresh the page - iOS Safari requires fresh audio streams
  - Fails after switching apps: This is a known iOS Safari bug - refresh the page to recover
  - Red recording bar: Normal behavior - it clears when recording stops
  - Home screen PWA: Launch from Safari directly, not from home screen shortcut
- Check environment variable:
  npx convex env list
- Check browser console for any error messages
- Test microphone in other applications
- Verify API key in Groq Console
- Model: Groq Whisper Large V3 Turbo
- Cost: Check Groq Pricing for current rates
- Free tier: Includes generous free usage
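For orientation, the request the backend sends to Groq could be assembled roughly as below. This is a sketch, not the contents of convex/speech_to_text.ts: the helper name `buildTranscriptionRequest` and the default filename are hypothetical, and the endpoint URL and model ID reflect Groq's OpenAI-compatible API as I understand it, so verify both against the Groq documentation.

```typescript
// Assumptions: endpoint and model ID as documented by Groq at the time of writing.
const GROQ_TRANSCRIPTION_URL =
  "https://api.groq.com/openai/v1/audio/transcriptions";
const GROQ_MODEL = "whisper-large-v3-turbo";

// Hypothetical helper: builds the URL and fetch options for a transcription call.
function buildTranscriptionRequest(
  audio: Blob,
  apiKey: string,
  filename: string = "recording.webm",
) {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", GROQ_MODEL);
  return {
    url: GROQ_TRANSCRIPTION_URL,
    init: {
      method: "POST",
      // No Content-Type header here: fetch sets the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Server-side usage (not executed here), assuming the key is read from the
// GROQ_API_KEY environment variable configured earlier:
//   const { url, init } = buildTranscriptionRequest(blob, process.env.GROQ_API_KEY!);
//   const response = await fetch(url, init);
```

Keeping the key on the server and passing it only in the `Authorization` header means the browser never sees it.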
- convex/speech_to_text.ts - HTTP action for transcription
- convex/http.ts - Route configuration with CORS
- src/hooks/use-voice-recorder.ts - Recording logic and state management
- src/components/voice-recorder.tsx - UI component with waveform visualization
- src/components/multimodal-input.tsx - Integration with chat input
The implementation follows security best practices with proper authentication, error handling, and user feedback.
The voice input implementation has been updated based on kaliatech's web-audio-recording-tests to ensure iOS Safari compatibility:
- Audio Graph Architecture: Uses createMediaStreamDestination() instead of raw getUserMedia stream for MediaRecorder
- Proper Audio Routing: Creates gain nodes and audio analysis before getUserMedia call
- Enhanced Cleanup: Comprehensive resource cleanup to prevent iOS Safari stability issues
- Stream Management: Always uses fresh audio streams for each recording session
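The routing described in these points can be sketched as follows. This illustrates the pattern only and is not the project's hook: `buildRecordingGraph` and the `*Like` interfaces are hypothetical, introduced so the wiring order can be exercised without a browser. In a browser, the context would be a real `AudioContext` (a cast may be needed to satisfy the minimal types).

```typescript
// Minimal structural types standing in for the Web Audio API (assumptions).
interface AudioNodeLike {
  connect(target: AudioNodeLike): unknown;
}
interface DestinationLike extends AudioNodeLike {
  stream: unknown;
}
interface AudioContextLike {
  createGain(): AudioNodeLike;
  createAnalyser(): AudioNodeLike;
  createMediaStreamDestination(): DestinationLike;
  createMediaStreamSource(stream: unknown): AudioNodeLike;
}

async function buildRecordingGraph(
  ctx: AudioContextLike,
  getMicStream: () => Promise<unknown>,
) {
  // Create and wire the processing nodes before requesting the microphone.
  const gain = ctx.createGain();
  const analyser = ctx.createAnalyser();
  const destination = ctx.createMediaStreamDestination();
  gain.connect(analyser); // feeds the waveform visualization
  gain.connect(destination); // feeds the recorder

  // Request a fresh microphone stream for this recording session.
  const micStream = await getMicStream();
  const source = ctx.createMediaStreamSource(micStream);
  source.connect(gain);

  // MediaRecorder should consume recordStream, not the raw micStream.
  return { analyser, recordStream: destination.stream, micStream };
}

// Browser usage (not executed here):
//   const graph = await buildRecordingGraph(new AudioContext(), () =>
//     navigator.mediaDevices.getUserMedia({ audio: true }));
//   const recorder = new MediaRecorder(graph.recordStream as MediaStream);
```

Returning `micStream` alongside the graph lets the caller stop its tracks during cleanup, which fits the fresh-stream-per-session approach listed above.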