feat: add basic audio support with voice recording and TTS #20

Open
PaulLampe wants to merge 12 commits into develop from feat/basic-audio-support

Conversation

@PaulLampe
Contributor

Summary

  • Add voice recording to send audio messages that get transcribed
  • Add text-to-speech (TTS) playback for assistant messages
  • Integrate audio controls into chat input and message components

Additional changes

  • Applied formatting to all existing files
  • Added pre-commit hook via Husky for formatting and linting
  • Minor import organization fixes

@vercel

vercel bot commented Dec 29, 2025

The latest updates on your projects.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| web | Ready | Ready (Preview, Comment) | Dec 29, 2025 2:54pm |


Copilot AI left a comment


Pull request overview

This PR adds comprehensive audio support to the chat application, enabling users to send voice messages that get automatically transcribed and receive text-to-speech playback of assistant responses. The implementation includes client-side voice recording, WebSocket-based audio streaming, Firebase persistence, and integrated UI controls.

Key changes:

  • Voice recording with microphone input and automatic transcription via WebSocket
  • Text-to-speech (TTS) playback system for assistant messages with play/pause controls
  • Enhanced message ID generation to maintain consistency between client state and Firebase

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 23 comments.

Show a summary per file
File Description
lib/stores/chat-store.types.ts Added types for TTS state, voice transcription status, and helper function for TTS key generation
lib/stores/chat-store.ts Integrated new audio-related actions into the chat store
lib/stores/actions/voice-transcription-actions.ts Implements state management for voice transcription lifecycle (pending, transcribed, error)
lib/stores/actions/tts-actions.ts Manages TTS state transitions and WebSocket requests for audio generation
lib/stores/actions/send-voice-message.ts Handles sending voice messages with Firebase persistence and WebSocket communication
lib/stores/actions/complete-streaming-message.ts Updated to accept optional message ID from backend for message tracking
lib/stores/actions/chat-add-user-message.ts Enhanced to use consistent message IDs between client and Firebase
lib/socket.types.ts Added payload types for TTS requests/responses and voice transcription
lib/hooks/use-voice-recorder.ts Custom hook for MediaRecorder API integration with permission handling
lib/hooks/use-tts-audio.ts Custom hook managing TTS audio playback lifecycle and state
lib/firebase/firebase.ts Added functions for voice transcription updates and message ID generation
lib/chat-socket.ts Extended socket events for TTS requests and voice transcription
components/sticky-input.tsx Integrated voice recording button with conditional rendering logic
components/providers/socket-provider.tsx Added handlers for TTS and voice transcription WebSocket events
components/home/home-input.tsx Implements voice message capture with sessionStorage handoff to chat page
components/dynamic-rate-limit-sticky-input.tsx Propagates voice message callback through component hierarchy
components/chat/voice-record-button.tsx UI components for voice recording button and recording indicator
components/chat/chat-view.tsx Added voice message flag support for SSR
components/chat/chat-view-ssr.tsx Passes voice message flag to client components
components/chat/chat-tts-button.tsx TTS control button component with loading and playing states
components/chat/chat-single-user-message.tsx Enhanced to display voice transcription status with pending/error/success states
components/chat/chat-single-message.tsx Propagates voice transcription status to child components
components/chat/chat-single-message-actions.tsx Integrates TTS button into message action bar
components/chat/chat-messages-view.tsx Processes pending voice messages from sessionStorage on page load
components/chat/chat-grouped-messages.tsx Passes voice transcription status to message renderer
components/chat/chat-input.tsx Adds voice recording capability to main chat input
app/session/page.tsx Handles voice message URL parameter and metadata generation fix
app/(with-header)/share/page.tsx Fixed metadata generation to handle null/undefined cases
.eslintrc.json Applied formatting to ESLint configuration


return;
}

const voiceMessagePlaceholder = '[Sprachnachricht]';

Copilot AI Jan 6, 2026


The placeholder message '[Sprachnachricht]' uses hardcoded German text with square brackets. This creates inconsistency with the voice transcription status UI which properly indicates pending status. Consider using a more consistent approach or internationalized text.

Suggested change
const voiceMessagePlaceholder = '[Sprachnachricht]';
const voiceMessagePlaceholder = '';

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +102
useEffect(() => {
if (
!hasPendingVoiceMessage ||
hasProcessedVoiceMessage.current ||
!isSocketConnected
)
return;

const pendingAudioBase64 = sessionStorage.getItem(
PENDING_VOICE_MESSAGE_KEY,
);
if (pendingAudioBase64) {
sessionStorage.removeItem(PENDING_VOICE_MESSAGE_KEY);
hasProcessedVoiceMessage.current = true;
// Convert base64 back to Uint8Array
const binaryString = atob(pendingAudioBase64);
const audioBytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
audioBytes[i] = binaryString.charCodeAt(i);
}
sendVoiceMessage(audioBytes);
}
}, [hasPendingVoiceMessage, sendVoiceMessage, isSocketConnected]);

Copilot AI Jan 6, 2026


Potential race condition when processing the pending voice message. The useEffect checks hasProcessedVoiceMessage.current but another component or re-render could also process the same message if isSocketConnected changes multiple times. Consider using a more robust state management approach, such as moving this flag into the chat store state rather than a component-level ref.

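One way to make the guard atomic, as the comment suggests, is a check-and-claim action in the chat store rather than a component-level ref. The sketch below is hypothetical (the names `processedVoiceMessageKeys` and `tryClaimVoiceMessage` are not part of this PR); since store updates run synchronously, two effects racing on the same key cannot both win the claim.

```typescript
// Hypothetical slice of the chat store: a Set of already-claimed voice-message
// keys plus an action that claims a key exactly once.
interface VoiceMessageGuardState {
  processedVoiceMessageKeys: Set<string>;
}

function tryClaimVoiceMessage(
  state: VoiceMessageGuardState,
  key: string,
): boolean {
  if (state.processedVoiceMessageKeys.has(key)) return false; // already handled
  state.processedVoiceMessageKeys.add(key); // claim before any async work starts
  return true;
}
```

The effect would call `tryClaimVoiceMessage` before reading sessionStorage, and skip sending when it returns false, so re-renders and repeated `isSocketConnected` transitions cannot dispatch the same recording twice.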
Comment on lines +70 to +77
} catch (err) {
if (err instanceof DOMException && err.name === 'NotAllowedError') {
setPermissionStatus('denied');
setError('Mikrofonzugriff wurde verweigert.');
} else {
setError('Fehler beim Starten der Aufnahme.');
}
}

Copilot AI Jan 6, 2026


The MediaStream tracks are not stopped when an error occurs during recording setup. If getUserMedia succeeds but MediaRecorder creation fails, the camera/microphone will remain active. Consider adding stream.getTracks().forEach(track => track.stop()) in the error handler to properly release media resources.

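The cleanup the comment asks for can be factored into a small helper. This is a sketch, not code from the PR; the minimal structural types stand in for the browser's `MediaStreamTrack` and `MediaStream` so the logic is testable outside a browser.

```typescript
// Hypothetical helper: release every track on a stream so the browser's
// recording indicator turns off even when MediaRecorder setup fails midway.
interface TrackLike { stop(): void }
interface StreamLike { getTracks(): TrackLike[] }

function releaseStream(stream: StreamLike): void {
  for (const track of stream.getTracks()) {
    track.stop(); // stopping an already-stopped track is a harmless no-op
  }
}
```

In the catch block, calling `releaseStream(stream)` whenever `getUserMedia` has already resolved ensures the microphone is freed regardless of which later step threw.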
} else if (ttsState.status === 'ready' && ttsState.audioBase64) {
const audio = new Audio(`data:audio/mp3;base64,${ttsState.audioBase64}`);
audioRef.current = audio;
audio.play().catch(console.error);

Copilot AI Jan 6, 2026


The audio play error is silently caught with console.error (line 47). Users won't know if audio playback fails. Consider providing user feedback when audio fails to play, such as displaying a toast notification or error message.

Comment on lines +44 to +48
} else if (ttsState.status === 'ready' && ttsState.audioBase64) {
const audio = new Audio(`data:audio/mp3;base64,${ttsState.audioBase64}`);
audioRef.current = audio;
audio.play().catch(console.error);
setTtsPlaying(partyId, messageId);

Copilot AI Jan 6, 2026


Multiple Audio instances can be created for the same message. Each time the play function is called with status 'ready', a new Audio object is created (line 45) without cleaning up the previous one. This can lead to:

  1. Memory leaks from unreleased Audio objects
  2. Multiple audio tracks playing simultaneously if play is called multiple times quickly

Consider checking if audioRef.current already exists before creating a new Audio instance, or cleaning up the previous instance first.

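The "clean up the previous instance first" suggestion can be captured in a helper that swaps the ref's contents. This is a hypothetical sketch (`replaceAudio` and the structural types are not in the PR); the shapes mirror a React ref holding an `HTMLAudioElement`.

```typescript
// Hypothetical helper: tear down any previous playback before installing a
// new Audio element, so at most one instance per message is ever live.
interface AudioLike { pause(): void; src: string }
interface RefLike<T> { current: T | null }

function replaceAudio(ref: RefLike<AudioLike>, next: AudioLike): void {
  const previous = ref.current;
  if (previous) {
    previous.pause();  // stop any sound that is still playing
    previous.src = ""; // drop the data: URL so the element can be collected
  }
  ref.current = next;
}
```

The play handler would then build the new `Audio` and call `replaceAudio(audioRef, audio)` before `audio.play()`, instead of assigning `audioRef.current` directly.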
Comment on lines +95 to +100
const binaryString = atob(pendingAudioBase64);
const audioBytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
audioBytes[i] = binaryString.charCodeAt(i);
}
sendVoiceMessage(audioBytes);

Copilot AI Jan 6, 2026


The base64 encoding/decoding of audio data could fail with large files or non-standard characters, but there's no error handling. If atob() fails (line 95) due to invalid base64, it will throw an uncaught exception. Consider wrapping the conversion in a try-catch block and showing an appropriate error message to the user.

Suggested change
const binaryString = atob(pendingAudioBase64);
const audioBytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
audioBytes[i] = binaryString.charCodeAt(i);
}
sendVoiceMessage(audioBytes);
try {
const binaryString = atob(pendingAudioBase64);
const audioBytes = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
audioBytes[i] = binaryString.charCodeAt(i);
}
sendVoiceMessage(audioBytes);
} catch (error) {
// Handle invalid or corrupted base64 audio data gracefully
console.error('Failed to decode pending voice message audio data:', error);
if (typeof window !== 'undefined') {
window.alert?.('Unable to process the pending voice message. Please try recording again.');
}
}

Comment on lines +49 to +50
const base64 = btoa(String.fromCharCode(...audioBytes));
sessionStorage.setItem(PENDING_VOICE_MESSAGE_KEY, base64);

Copilot AI Jan 6, 2026


Encoding audio bytes with btoa(String.fromCharCode(...audioBytes)) can throw a RangeError for large recordings: spreading the Uint8Array passes one argument per byte, which can exceed the engine's maximum argument count. (Uint8Array values are always 0-255, so btoa itself handles them correctly once that limit is avoided.) Consider encoding in fixed-size chunks, or reading the original Blob with FileReader.readAsDataURL; note that Buffer.from(audioBytes).toString('base64') is Node-only and not available in the browser without a polyfill.

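A chunked encoder avoids the argument-count problem without changing the stored format. This is a sketch of the fix, not code from the PR; `uint8ToBase64` is a hypothetical name.

```typescript
// Hypothetical replacement for btoa(String.fromCharCode(...bytes)): build the
// binary string in fixed-size chunks so String.fromCharCode never receives
// more arguments than the engine allows, then base64-encode the result.
function uint8ToBase64(bytes: Uint8Array, chunkSize = 0x8000): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}
```

The sessionStorage write would then become `sessionStorage.setItem(PENDING_VOICE_MESSAGE_KEY, uint8ToBase64(audioBytes))`, and the existing atob-based decoder on the chat page keeps working unchanged.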
messages: [
{
id: generateUuid(),
id: options?.messageId ?? generateMessageId(sessionId),

Copilot AI Jan 6, 2026


The function is calling generateMessageId for the inner messageId (line 258) even though this is meant to generate a grouped message ID, not an inner message ID. The function comment on line 233-234 states it generates a Firebase document ID for a message in a chat session, but it's being used for both grouped messages and individual messages within those groups. This could lead to confusion or incorrect ID generation. Consider creating separate functions like generateGroupedMessageId and generateInnerMessageId for clarity, or update the documentation to clarify it works for both levels.

Comment on lines +53 to +55
const mediaRecorder = new MediaRecorder(stream, {
mimeType: 'audio/webm;codecs=opus',
audioBitsPerSecond: 32000,

Copilot AI Jan 6, 2026


Hardcoded MIME type 'audio/webm;codecs=opus' may not be supported on all browsers, particularly Safari which has limited WebM support. Consider adding fallback MIME type detection or using a more universally supported format. You could check MediaRecorder.isTypeSupported() to verify browser compatibility before using this format.

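The `MediaRecorder.isTypeSupported()` check the comment suggests can be isolated in a small picker. This is a hypothetical sketch (the candidate list and `pickRecordingMimeType` are assumptions, not PR code); the support predicate is injected so the logic runs outside a browser, where it would be `(t) => MediaRecorder.isTypeSupported(t)`.

```typescript
// Hypothetical fallback order: the preferred Opus-in-WebM first, then plain
// WebM, then MP4 for Safari, which has limited WebM support.
const CANDIDATE_MIME_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
];

function pickRecordingMimeType(
  isSupported: (type: string) => boolean,
  candidates: string[] = CANDIDATE_MIME_TYPES,
): string | undefined {
  // undefined means: omit mimeType and let MediaRecorder use its default
  return candidates.find(isSupported);
}
```

The recorder setup would pass the result into the `MediaRecorder` options when defined, and fall back to the constructor default otherwise.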
Comment on lines +26 to +31
toast.error('wahl.chat ist nicht verbunden.');
return;
}

if (!userId) {
toast.error('Benutzer nicht authentifiziert.');

Copilot AI Jan 6, 2026


The hardcoded error messages are in German. These should be internationalized or use a consistent language approach with the rest of the codebase. For consistency, consider using a key-based approach or ensure all user-facing messages follow the same internationalization pattern.


Labels

enhancement (New feature or request)


5 participants