Voice via mobile browser: ASR + TTS on iOS Safari + Android Chrome #896
Status: Open
Labels:
- audio: Audio (ASR/TTS) changes
- consumer: Blocks consumer adoption; must ship for the v0.20.0 consumer launch window
- domain:surfaces: Agent UI, Telegram, WhatsApp, Slack/Discord, mobile
- enhancement: New feature or request
- p1: medium priority
- spec-ready: Issue has implementation spec adequate for coding-agent assignment
- track:consumer-app: Hermes-competitor consumer product (mobile-first: voice + messaging + memory + skills)
Goal
Make voice input (ASR) and output (TTS) work reliably in iOS Safari and Android Chrome. This is the path users take when they reach GAIA through the tunnel from their phone; without it, the mobile-via-tunnel experience is text-only.
Why this matters for consumer adoption
The morning-brief, voice-research, and daily-companion use cases all assume voice. Telegram covers asynchronous voice (#889 plus voice notes), and the tunnel covers synchronous mobile access. Voice over the tunnel is the missing connector: "I'm holding my phone, I open GAIA, I tap the mic, I speak." Mobile browser voice, however, has known platform quirks.
Scope (single PR, v0.18.2 or v0.19.0)
A. Audit current state
B. Known mobile-browser pitfalls to address
… audio/mp4; codecs=mp4a.40.2; Android Chrome prefers audio/webm;codecs=opus. Need codec detection + fallback.
C. Mobile-specific UX improvements
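The codec detection + fallback item under B could be handled by probing a preference-ordered list of recording formats at runtime. This is a minimal sketch under stated assumptions, not the planned implementation: the candidate list and the `pickRecordingMimeType` name are hypothetical, and `isTypeSupported` is injected so the selection logic can be unit-tested outside a browser. In the page it would be `(t) => MediaRecorder.isTypeSupported(t)`, the real static `MediaRecorder` method.

```typescript
// Recording formats in order of preference. This list is an assumption drawn
// from the formats named in the issue: WebM/Opus (Android Chrome) first, then
// MP4/AAC (mp4a.40.2) for browsers without WebM recording, then bare
// containers as a last resort.
const CANDIDATE_MIME_TYPES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/mp4;codecs=mp4a.40.2",
  "audio/webm",
  "audio/mp4",
];

/**
 * Returns the first candidate that `isTypeSupported` accepts, or undefined if
 * none match (the caller then lets MediaRecorder use its default format).
 * In a browser, pass `(t) => MediaRecorder.isTypeSupported(t)`.
 */
function pickRecordingMimeType(
  isTypeSupported: (mimeType: string) => boolean,
  candidates: readonly string[] = CANDIDATE_MIME_TYPES,
): string | undefined {
  return candidates.find((mimeType) => isTypeSupported(mimeType));
}
```

In a browser, the result could be passed as `new MediaRecorder(stream, mimeType ? { mimeType } : undefined)`; `recorder.mimeType` then reports the format actually chosen, which could be sent alongside the upload so the backend knows how to decode it. Probing support at runtime, rather than sniffing the user agent, also keeps working when a browser release changes its MediaRecorder formats.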
D. Backend coordination
… <audio> element or Web Audio API)
E. Tests
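The playback item under D (<audio> element or Web Audio API) is truncated in this copy. One constraint any playback path likely has to respect (an assumption about the elided text, not something the issue states) is that mobile browsers, iOS Safari in particular, keep an `AudioContext` suspended, and reject `HTMLMediaElement.play()` calls not triggered by a user gesture. A minimal sketch: resume the context from the mic button's tap handler so TTS audio that arrives later can play. `ResumableAudioContext` and `unlockAudio` are hypothetical names; the interface mirrors the real `AudioContext.state` and `resume()` members so the logic can be tested with a fake.

```typescript
// Structural subset of the Web Audio AudioContext used here. In browsers,
// AudioContext.state is "suspended" | "running" | "closed".
interface ResumableAudioContext {
  readonly state: string;
  resume(): Promise<void>;
}

/**
 * Resumes a suspended context and reports whether audio can now play.
 * ctx.resume() is invoked before the first await, so calling unlockAudio
 * directly inside a click/touchend handler keeps the call within the gesture.
 */
async function unlockAudio(ctx: ResumableAudioContext): Promise<boolean> {
  if (ctx.state === "suspended") {
    await ctx.resume();
  }
  return ctx.state === "running";
}
```

In the page this might be wired as `micButton.addEventListener("click", () => { void unlockAudio(audioCtx); })`, where `micButton` and `audioCtx` are placeholders. Unlocking on the same tap that starts recording avoids a separate "enable sound" step for the user.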
… play event
… docs/guides/voice-on-mobile.mdx with troubleshooting for permission issues
What this is NOT
Acceptance criteria
Attribution / prior art
Dependencies
… src/gaia/audio/whisper_asr.py) and Kokoro TTS (src/gaia/audio/kokoro_tts.py)