feat: add basic audio support with voice recording and TTS #20
add: Contributions welcome to readme
fix: forgotten description in root layout metadata
- Add voice recording functionality
- Add text-to-speech (TTS) playback for messages
Pull request overview
This PR adds comprehensive audio support to the chat application, enabling users to send voice messages that get automatically transcribed and receive text-to-speech playback of assistant responses. The implementation includes client-side voice recording, WebSocket-based audio streaming, Firebase persistence, and integrated UI controls.
Key changes:
- Voice recording with microphone input and automatic transcription via WebSocket
- Text-to-speech (TTS) playback system for assistant messages with play/pause controls
- Enhanced message ID generation to maintain consistency between client state and Firebase
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 23 comments.
| File | Description |
|---|---|
| lib/stores/chat-store.types.ts | Added types for TTS state, voice transcription status, and helper function for TTS key generation |
| lib/stores/chat-store.ts | Integrated new audio-related actions into the chat store |
| lib/stores/actions/voice-transcription-actions.ts | Implements state management for voice transcription lifecycle (pending, transcribed, error) |
| lib/stores/actions/tts-actions.ts | Manages TTS state transitions and WebSocket requests for audio generation |
| lib/stores/actions/send-voice-message.ts | Handles sending voice messages with Firebase persistence and WebSocket communication |
| lib/stores/actions/complete-streaming-message.ts | Updated to accept optional message ID from backend for message tracking |
| lib/stores/actions/chat-add-user-message.ts | Enhanced to use consistent message IDs between client and Firebase |
| lib/socket.types.ts | Added payload types for TTS requests/responses and voice transcription |
| lib/hooks/use-voice-recorder.ts | Custom hook for MediaRecorder API integration with permission handling |
| lib/hooks/use-tts-audio.ts | Custom hook managing TTS audio playback lifecycle and state |
| lib/firebase/firebase.ts | Added functions for voice transcription updates and message ID generation |
| lib/chat-socket.ts | Extended socket events for TTS requests and voice transcription |
| components/sticky-input.tsx | Integrated voice recording button with conditional rendering logic |
| components/providers/socket-provider.tsx | Added handlers for TTS and voice transcription WebSocket events |
| components/home/home-input.tsx | Implements voice message capture with sessionStorage handoff to chat page |
| components/dynamic-rate-limit-sticky-input.tsx | Propagates voice message callback through component hierarchy |
| components/chat/voice-record-button.tsx | UI components for voice recording button and recording indicator |
| components/chat/chat-view.tsx | Added voice message flag support for SSR |
| components/chat/chat-view-ssr.tsx | Passes voice message flag to client components |
| components/chat/chat-tts-button.tsx | TTS control button component with loading and playing states |
| components/chat/chat-single-user-message.tsx | Enhanced to display voice transcription status with pending/error/success states |
| components/chat/chat-single-message.tsx | Propagates voice transcription status to child components |
| components/chat/chat-single-message-actions.tsx | Integrates TTS button into message action bar |
| components/chat/chat-messages-view.tsx | Processes pending voice messages from sessionStorage on page load |
| components/chat/chat-grouped-messages.tsx | Passes voice transcription status to message renderer |
| components/chat/chat-input.tsx | Adds voice recording capability to main chat input |
| app/session/page.tsx | Handles the voice message URL parameter and fixes metadata generation |
| app/(with-header)/share/page.tsx | Fixed metadata generation to handle null/undefined cases |
| .eslintrc.json | Applied formatting to ESLint configuration |
```ts
    return;
  }

  const voiceMessagePlaceholder = '[Sprachnachricht]';
```
The placeholder message '[Sprachnachricht]' uses hardcoded German text with square brackets. This creates inconsistency with the voice transcription status UI which properly indicates pending status. Consider using a more consistent approach or internationalized text.
```diff
- const voiceMessagePlaceholder = '[Sprachnachricht]';
+ const voiceMessagePlaceholder = '';
```
```ts
useEffect(() => {
  if (
    !hasPendingVoiceMessage ||
    hasProcessedVoiceMessage.current ||
    !isSocketConnected
  )
    return;

  const pendingAudioBase64 = sessionStorage.getItem(
    PENDING_VOICE_MESSAGE_KEY,
  );
  if (pendingAudioBase64) {
    sessionStorage.removeItem(PENDING_VOICE_MESSAGE_KEY);
    hasProcessedVoiceMessage.current = true;
    // Convert base64 back to Uint8Array
    const binaryString = atob(pendingAudioBase64);
    const audioBytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      audioBytes[i] = binaryString.charCodeAt(i);
    }
    sendVoiceMessage(audioBytes);
  }
}, [hasPendingVoiceMessage, sendVoiceMessage, isSocketConnected]);
```
Potential race condition when processing the pending voice message. The useEffect checks hasProcessedVoiceMessage.current but another component or re-render could also process the same message if isSocketConnected changes multiple times. Consider using a more robust state management approach, such as moving this flag into the chat store state rather than a component-level ref.
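One way to read this suggestion is a shared "claim once" gate that lives outside any single component, so only the first caller processes the pending message. The following is a minimal sketch; `createVoiceMessageGate` and its methods are hypothetical names that do not appear in the PR, and the real chat store may be structured differently.

```typescript
// Hypothetical helper, not part of the PR: a shared gate that lets exactly
// one caller claim the pending voice message, however many effects run.
interface VoiceMessageGate {
  /** Returns true for the first caller only; false for every later call. */
  claim(): boolean;
  /** Re-arms the gate, e.g. after a new recording has been stored. */
  reset(): void;
}

function createVoiceMessageGate(): VoiceMessageGate {
  let claimed = false;
  return {
    claim(): boolean {
      if (claimed) return false;
      claimed = true;
      return true;
    },
    reset(): void {
      claimed = false;
    },
  };
}

// Usage sketch: two effects competing for the same pending message.
const gate = createVoiceMessageGate();
const processedBy: string[] = [];
for (const caller of ['effect-a', 'effect-b']) {
  if (gate.claim()) processedBy.push(caller);
}
```

Because `claim()` checks and sets the flag in one synchronous step, a second effect run cannot observe the unclaimed state after the first has started processing.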
```ts
} catch (err) {
  if (err instanceof DOMException && err.name === 'NotAllowedError') {
    setPermissionStatus('denied');
    setError('Mikrofonzugriff wurde verweigert.');
  } else {
    setError('Fehler beim Starten der Aufnahme.');
  }
}
```
The MediaStream tracks are not stopped when an error occurs during recording setup. If getUserMedia succeeds but MediaRecorder creation fails, the microphone will remain active. Consider adding stream.getTracks().forEach(track => track.stop()) in the error handler to properly release media resources.
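The cleanup described here can be sketched as a small wrapper around recorder construction. This is an illustration only: `createRecorderOrRelease` and the structural types are hypothetical, chosen so the sketch does not depend on DOM typings, and the hook in the PR may organize its setup differently.

```typescript
// Minimal structural types covering the MediaStream members used here.
interface StoppableTrack {
  stop(): void;
}
interface TrackSource {
  getTracks(): StoppableTrack[];
}

// Hypothetical helper: builds a recorder from an already-acquired stream and
// stops every track of that stream if construction throws.
function createRecorderOrRelease<S extends TrackSource, R>(
  stream: S,
  createRecorder: (stream: S) => R,
): R {
  try {
    return createRecorder(stream);
  } catch (err) {
    // Release the microphone before surfacing the error to the caller.
    stream.getTracks().forEach((track) => track.stop());
    throw err;
  }
}

// In the browser, usage might look like this (not executed here):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = createRecorderOrRelease(stream, (s) => new MediaRecorder(s));
```

The error is rethrown after cleanup, so an existing catch block that sets the permission status and error message would keep working unchanged.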
```ts
} else if (ttsState.status === 'ready' && ttsState.audioBase64) {
  const audio = new Audio(`data:audio/mp3;base64,${ttsState.audioBase64}`);
  audioRef.current = audio;
  audio.play().catch(console.error);
```
A rejected audio.play() promise is only logged with console.error (line 47), so users won't know if audio playback fails. Consider providing user feedback when audio fails to play, such as displaying a toast notification or error message.
```ts
} else if (ttsState.status === 'ready' && ttsState.audioBase64) {
  const audio = new Audio(`data:audio/mp3;base64,${ttsState.audioBase64}`);
  audioRef.current = audio;
  audio.play().catch(console.error);
  setTtsPlaying(partyId, messageId);
```
Multiple Audio instances can be created for the same message. Each time the play function is called with status 'ready', a new Audio object is created (line 45) without cleaning up the previous one. This can lead to:
- Memory leaks from unreleased Audio objects
- Multiple audio tracks playing simultaneously if play is called multiple times quickly
Consider checking if audioRef.current already exists before creating a new Audio instance, or cleaning up the previous instance first.
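A possible shape for the "clean up the previous instance first" option is a holder that owns at most one audio object at a time. The sketch below is hypothetical: `SingleAudioPlayer`, `createAudio`, and `onError` are illustrative names, and the audio factory is injected so the example does not depend on the browser's `Audio` constructor.

```typescript
// Minimal structural type covering the HTMLAudioElement members used here.
interface PlayableAudio {
  pause(): void;
  play(): Promise<void>;
}

// Hypothetical helper: holds at most one audio instance at a time.
class SingleAudioPlayer<A extends PlayableAudio> {
  private current: A | null = null;
  private readonly createAudio: (src: string) => A;
  private readonly onError: (err: unknown) => void;

  constructor(createAudio: (src: string) => A, onError: (err: unknown) => void) {
    this.createAudio = createAudio;
    this.onError = onError;
  }

  play(src: string): A {
    // Pause and drop the previous instance before creating a new one.
    this.stop();
    const audio = this.createAudio(src);
    this.current = audio;
    // Forward playback failures to the caller instead of only logging them.
    audio.play().catch(this.onError);
    return audio;
  }

  stop(): void {
    if (this.current) {
      this.current.pause();
      this.current = null;
    }
  }
}

// In the browser, usage might look like this (not executed here):
//   const player = new SingleAudioPlayer((src) => new Audio(src), (err) => showToast(err));
```

The `onError` callback is also a natural place for user-facing feedback when playback fails; `showToast` in the comment is a placeholder, not a function from the PR.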
```ts
const binaryString = atob(pendingAudioBase64);
const audioBytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
  audioBytes[i] = binaryString.charCodeAt(i);
}
sendVoiceMessage(audioBytes);
```
The base64 encoding/decoding of audio data could fail with large files or non-standard characters, but there's no error handling. If atob() fails (line 95) due to invalid base64, it will throw an uncaught exception. Consider wrapping the conversion in a try-catch block and showing an appropriate error message to the user.
```diff
- const binaryString = atob(pendingAudioBase64);
- const audioBytes = new Uint8Array(binaryString.length);
- for (let i = 0; i < binaryString.length; i++) {
-   audioBytes[i] = binaryString.charCodeAt(i);
- }
- sendVoiceMessage(audioBytes);
+ try {
+   const binaryString = atob(pendingAudioBase64);
+   const audioBytes = new Uint8Array(binaryString.length);
+   for (let i = 0; i < binaryString.length; i++) {
+     audioBytes[i] = binaryString.charCodeAt(i);
+   }
+   sendVoiceMessage(audioBytes);
+ } catch (error) {
+   // Handle invalid or corrupted base64 audio data gracefully
+   console.error('Failed to decode pending voice message audio data:', error);
+   if (typeof window !== 'undefined') {
+     window.alert?.('Unable to process the pending voice message. Please try recording again.');
+   }
+ }
```
```ts
const base64 = btoa(String.fromCharCode(...audioBytes));
sessionStorage.setItem(PENDING_VOICE_MESSAGE_KEY, base64);
```
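The base64 encoding of audio bytes using btoa(String.fromCharCode(...audioBytes)) spreads every byte into a separate function argument, which can exceed the JavaScript engine's argument limit and throw a RangeError for longer recordings. Byte values themselves are not the problem, assuming audioBytes is a Uint8Array: its elements are always in the range 0 to 255, which btoa accepts. Consider encoding in chunks or using a dedicated base64 encoding library. Note that Buffer.from(audioBytes).toString('base64') relies on a Node.js API and only works in the browser if the bundler supplies a polyfill.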
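A chunked encoder is one way to make this step more robust for longer recordings. The sketch below assumes the audio is held in a `Uint8Array`; `bytesToBase64`, `base64ToBytes`, and the chunk size are hypothetical choices, not code from the PR.

```typescript
// Assumed, conservative chunk size: String.fromCharCode never receives more
// than this many arguments in a single call.
const CHUNK_SIZE = 0x8000;

// Hypothetical helper: converts bytes to a binary string chunk by chunk,
// then base64-encodes the complete string once.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let offset = 0; offset < bytes.length; offset += CHUNK_SIZE) {
    const chunk = bytes.subarray(offset, offset + CHUNK_SIZE);
    binary += String.fromCharCode(...Array.from(chunk));
  }
  return btoa(binary);
}

// Hypothetical inverse, mirroring the decoding loop shown in the PR.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

Encoding the whole binary string in one `btoa` call, rather than encoding each chunk separately, avoids padding characters appearing in the middle of the output.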
```diff
  messages: [
    {
-     id: generateUuid(),
+     id: options?.messageId ?? generateMessageId(sessionId),
```
This code calls generateMessageId for the inner message ID (line 258), even though the function is meant to generate a grouped message ID. The function comment on lines 233-234 states that it generates a Firebase document ID for a message in a chat session, but it is used both for grouped messages and for the individual messages within those groups. This could lead to confusion or incorrect ID generation. Consider creating separate functions such as generateGroupedMessageId and generateInnerMessageId for clarity, or update the documentation to state that the function works for both levels.
```ts
const mediaRecorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm;codecs=opus',
  audioBitsPerSecond: 32000,
```
Hardcoded MIME type 'audio/webm;codecs=opus' may not be supported on all browsers, particularly Safari which has limited WebM support. Consider adding fallback MIME type detection or using a more universally supported format. You could check MediaRecorder.isTypeSupported() to verify browser compatibility before using this format.
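The fallback detection mentioned here could look roughly like the following. The helper name and the candidate list are assumptions for illustration; the support check is injected as a predicate so the sketch runs outside a browser, where `MediaRecorder.isTypeSupported` would normally be used.

```typescript
// Hypothetical helper: returns the first candidate the predicate accepts,
// or undefined so the caller can fall back to the browser's default format.
function pickSupportedMimeType(
  candidates: readonly string[],
  isTypeSupported: (mimeType: string) => boolean,
): string | undefined {
  return candidates.find((mimeType) => isTypeSupported(mimeType));
}

// Candidate list is an assumption; adjust it to the formats the backend accepts.
const RECORDING_MIME_CANDIDATES = [
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/mp4',
] as const;

// In the browser, the predicate would wrap the real API (not executed here):
//   const mimeType = pickSupportedMimeType(
//     RECORDING_MIME_CANDIDATES,
//     (t) => MediaRecorder.isTypeSupported(t),
//   );
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

If the recorder ends up using a format other than WebM/Opus, the transcription backend also needs to accept that format, which this sketch does not address.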
```ts
  toast.error('wahl.chat ist nicht verbunden.');
  return;
}

if (!userId) {
  toast.error('Benutzer nicht authentifiziert.');
```
The hardcoded error messages are in German. These should be internationalized or use a consistent language approach with the rest of the codebase. For consistency, consider using a key-based approach or ensure all user-facing messages follow the same internationalization pattern.