
feat: add speech-to-text transcription with OpenAI Whisper #2

Merged: CarlosZiegler merged 2 commits into main from claude/tender-joliot on Feb 21, 2026

CarlosZiegler (Owner) commented Feb 21, 2026

Summary

  • Add voice input (speech-to-text) to AI Chat with dual-mode cross-browser support: the Web Speech API for Chrome (real-time, no app backend required) and MediaRecorder + OpenAI Whisper for Firefox/Safari
  • New POST /api/transcribe endpoint using the AI SDK's experimental_transcribe() with the whisper-1 model: it accepts audio as FormData and returns the transcribed text
  • Wire the onAudioRecorded callback in the chat page to connect the SpeechInput component to the backend transcription service
  • Add codebase mapping docs (.planning/codebase/), dev server configs (.claude/launch.json), and README documentation for the speech input feature
  • Remove the broken device-utils.test.ts, which imported non-existent exports
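For readers who want to see the shape of the new endpoint, here is a minimal, framework-agnostic sketch of a POST /api/transcribe handler built on standard Request/FormData/Response. It is not the PR's code. The multipart field name `audio`, the `{ text }` response shape and the helper names `createTranscribeHandler` and `jsonResponse` are all assumptions. The transcription call is injected so the sketch runs without dependencies; in the real route it would wrap the AI SDK's `experimental_transcribe` with `openai.transcription("whisper-1")`.

```typescript
// Hypothetical transcription backend. In the real route it might wrap the AI SDK:
//   const { text } = await experimental_transcribe({ model: openai.transcription("whisper-1"), audio });
type Transcriber = (audio: Uint8Array) => Promise<string>;

const AUDIO_FIELD = "audio"; // assumed multipart field name, not confirmed by the PR

function jsonResponse(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

function createTranscribeHandler(transcribe: Transcriber) {
  return async function handlePost(request: Request): Promise<Response> {
    let form: FormData;
    try {
      form = await request.formData();
    } catch {
      return jsonResponse({ error: "Expected a multipart/form-data body" }, 400);
    }

    // FormData.get returns a string for text fields and a File (a Blob) for uploads.
    const file = form.get(AUDIO_FIELD);
    if (!(file instanceof Blob) || file.size === 0) {
      return jsonResponse({ error: `Missing "${AUDIO_FIELD}" audio file` }, 400);
    }

    const audio = new Uint8Array(await file.arrayBuffer());
    try {
      return jsonResponse({ text: await transcribe(audio) }, 200);
    } catch {
      // Keep provider error details on the server; the client only needs a status.
      return jsonResponse({ error: "Transcription failed" }, 502);
    }
  };
}
```

Injecting the transcriber keeps request validation testable without an OpenAI key; the route file would pass in a function that calls the AI SDK.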

Test plan

  • 98 tests pass; the 2 failures are pre-existing infrastructure issues (an empty Drizzle config test and a Docker-dependent RLS test)
  • Dev server compiles cleanly with zero errors
  • Chat page renders mic button correctly (verified via screenshot)
  • Manual test: click mic button in Chrome → Web Speech API transcribes in real time
  • Manual test: click mic button in Firefox/Safari → records audio → sends to /api/transcribe → text appears in input
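The Firefox/Safari path in the last manual test (record, POST to /api/transcribe, put the text in the input) could be handled on the client by something like the following sketch of what an onAudioRecorded callback might call. The function name `transcribeRecording`, the `audio` field name and the `{ text }` response shape are assumptions, not taken from the PR. `fetch` is injectable so the helper can be exercised without a server.

```typescript
// Hypothetical client helper for the MediaRecorder fallback.
type FetchLike = (input: string, init: { method: string; body: FormData }) => Promise<Response>;

async function transcribeRecording(
  audio: Blob,
  fetchImpl: FetchLike = fetch,
  endpoint = "/api/transcribe",
): Promise<string> {
  const form = new FormData();
  // Assumed field name; the filename is arbitrary.
  form.append("audio", audio, "recording");

  const res = await fetchImpl(endpoint, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Transcription request failed with status ${res.status}`);
  }

  const data = (await res.json()) as { text?: unknown };
  if (typeof data.text !== "string") {
    throw new Error("Transcription response did not contain text");
  }
  return data.text;
}
```

In the chat page, the returned string would then be appended to the input state; that wiring depends on the SpeechInput component's props, which the PR does not show.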

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added Speech Input (voice-to-text) feature to AI Chat with cross-browser support for Chrome, Firefox, and Safari.

CarlosZiegler and others added 2 commits February 21, 2026 12:33
Add voice input support to AI Chat using a dual-mode approach:
- Chrome: Web Speech API (real-time, no backend)
- Firefox/Safari: MediaRecorder + OpenAI Whisper via /api/transcribe endpoint

Also adds dev server configs (.claude/launch.json), removes broken
device-utils test, and documents speech input in README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
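The dual-mode split described in the commit message can be sketched as a small selection function. This sketch uses feature detection (is a SpeechRecognition constructor available, else can we record with MediaRecorder); the PR describes the split by browser name and does not say which technique SpeechInput actually uses. `pickSpeechMode` and `SpeechCapabilities` are hypothetical names.

```typescript
type SpeechMode = "web-speech" | "media-recorder" | "unsupported";

// Structural view of the globals the check reads, so it can run outside a browser.
interface SpeechCapabilities {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
  MediaRecorder?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

function pickSpeechMode(env: SpeechCapabilities): SpeechMode {
  // Real-time recognition in the browser (Chrome exposes the prefixed constructor).
  if (env.SpeechRecognition || env.webkitSpeechRecognition) {
    return "web-speech";
  }
  // Fallback: record with MediaRecorder and send the audio to the server for Whisper.
  if (env.MediaRecorder && typeof env.navigator?.mediaDevices?.getUserMedia === "function") {
    return "media-recorder";
  }
  return "unsupported";
}
```

In a browser this would be called as `pickSpeechMode(globalThis as unknown as SpeechCapabilities)`. Detecting capabilities rather than parsing the user agent avoids hard-coding browser names, at the cost of possibly routing a browser differently from the Chrome vs Firefox/Safari split stated above.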

coderabbitai Bot commented Feb 21, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

This PR introduces a speech-to-text feature for AI Chat with dual-mode browser support, adds comprehensive codebase planning documentation (architecture, conventions, concerns, integrations, stack, structure and testing), establishes Bun-based development launch configurations, and removes the broken device-utils test file.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Speech Input Feature | apps/start-template/src/routes/api/transcribe/index.ts, apps/start-template/src/routes/(dashboard)/chat/index.tsx, apps/start-template/README.md | Adds POST /api/transcribe endpoint for audio transcription via OpenAI Whisper API, integrates SpeechInput component into chat with handleAudioRecorded callback, and documents voice input feature. Supports Web Speech API (Chrome) and MediaRecorder + backend transcription (Firefox/Safari). |
| Route Registration | apps/start-template/src/routeTree.gen.ts | Registers new /api/transcribe/ API route in TanStack Router with updated FileRoutesByFullPath, FileRoutesByTo, FileRoutesById, and RootRouteChildren type mappings. |
| Codebase Planning Documentation | .planning/codebase/ARCHITECTURE.md, .planning/codebase/CONCERNS.md, .planning/codebase/CONVENTIONS.md, .planning/codebase/INTEGRATIONS.md, .planning/codebase/STACK.md, .planning/codebase/STRUCTURE.md, .planning/codebase/TESTING.md | Comprehensive planning documents detailing full-stack architecture patterns, layered structure, cross-cutting concerns, coding standards and conventions, external integrations and services, technology stack and dependencies, codebase structure and organization, and testing frameworks and practices. |
| Development Configuration | .claude/launch.json | Adds Bun-based launch configurations for start-template, email-dev, db-studio, and vitest-ui with respective ports and working directories. |
| Test Cleanup | apps/start-template/src/lib/__tests__/device-utils.test.ts | Deletes entire device-utils test suite covering getDeviceIcon, getDeviceIconFromUserAgent, getDeviceInfo, and getDeviceType functionality. |
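Because the MediaRecorder fallback ships browser-recorded audio to Whisper, the recorder's container format matters. A minimal sketch of picking one follows; the candidate list and its ordering are assumptions, chosen because webm and mp4 are among the input formats OpenAI's transcription API accepts. `pickRecordingMimeType` is a hypothetical helper, and the support check is passed in so the function can be tested outside a browser.

```typescript
// Preferred recording formats, most preferred first (ordering is an assumption).
const CANDIDATE_MIME_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
] as const;

// Returns the first supported type, or undefined to let MediaRecorder use its default.
function pickRecordingMimeType(isTypeSupported: (type: string) => boolean): string | undefined {
  return CANDIDATE_MIME_TYPES.find((type) => isTypeSupported(type));
}
```

In the browser the check would be `(t) => MediaRecorder.isTypeSupported(t)`, and the result passed as `new MediaRecorder(stream, mimeType ? { mimeType } : undefined)`.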

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A voice echoes through the chat so bright,
Speech becomes text in browser flight,
Documentation guides with wisdom deep,
While Bun configurations help systems leap!
Plans laid clear for all to see,
The codebase blooms with clarity! 🌱

CarlosZiegler merged commit 036715f into main on Feb 21, 2026
1 of 2 checks passed
CarlosZiegler self-assigned this on Feb 21, 2026