Labels: question (Further information is requested)
Description
Start Date
No response
Implementation PR
No response
Reference Issues
No response
Summary
I would like to propose two features for Voice sessions that keep the convenience of voice input while optimizing responses for text quality rather than for TTS.
The first is a one-tap “Skip / Stop reading” button that is available while the AI is speaking. The second is a “user voice, AI text-only” mode in which the user speaks but the AI responds only in text, with no automatic voice output.
Basic Example
Skipping TTS while the AI is speaking
- When the AI is reading out its answer, a “Skip / Stop reading” button stops TTS immediately so the user can move on to the next action or prompt without waiting for the audio to finish.
- In many real use cases, users can understand the response just by glancing at the text, so being forced to listen to the entire TTS playback slows down the interaction unnecessarily.
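A minimal sketch of the skip behavior, assuming the client wraps its speech output behind a small engine interface. `TtsEngine`, `ReadingController`, `startReading`, `skip`, and `onFinished` are hypothetical names, not taken from any existing codebase. In a browser, `cancel()` could delegate to the Web Speech API's `window.speechSynthesis.cancel()`, which empties the utterance queue and stops the utterance currently being spoken.

```typescript
// Hypothetical abstraction over whatever TTS backend the client uses.
interface TtsEngine {
  speak(text: string): void;
  cancel(): void; // e.g. window.speechSynthesis.cancel() in a browser
}

type ReadingState = "idle" | "speaking";

// Tracks whether the answer is being read aloud and lets the user skip it.
class ReadingController {
  private state: ReadingState = "idle";
  private readonly engine: TtsEngine;

  constructor(engine: TtsEngine) {
    this.engine = engine;
  }

  // The "Skip / Stop reading" button only needs to be visible while this is true.
  get isSpeaking(): boolean {
    return this.state === "speaking";
  }

  startReading(answer: string): void {
    this.engine.speak(answer);
    this.state = "speaking";
  }

  // Bound to the skip button: stops audio only; the text answer stays on screen.
  skip(): void {
    if (this.state !== "speaking") return; // nothing to stop
    this.engine.cancel();
    this.state = "idle";
  }

  // Called by the engine when playback ends on its own.
  onFinished(): void {
    this.state = "idle";
  }
}
```

Because `skip()` touches only the audio, the user can keep reading the text answer and issue the next prompt immediately, which is the interaction the bullets above describe.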
“Voice input only” mode (user voice, AI text)
- Provide a mode where the user gives input only via the microphone, but the AI’s response is always shown as text, with no automatic audio output.
- This allows Perplexity/Qwen to always use the best text model for responses (long, structured answers, tables, code, etc.) without being constrained by a voice model, maximizing answer quality.
- It is also very useful in environments where the user can quietly speak into the mic but does not want any sound to come out of the device (libraries, shared offices, late-night use, etc.).
- This “user voice, AI text-only” pattern keeps hands-free input, reduces cognitive load from listening to long TTS, and focuses the system on delivering high-quality text responses.
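A minimal sketch of how a client might branch on the proposed mode, assuming each response is planned from the current mode. The mode names, `ResponsePlan`, `planResponse`, and the `"voice"`/`"text"` model categories are hypothetical. Which concrete model backs each category is one of the open questions below.

```typescript
// Hypothetical names for the two Voice behaviors discussed in this proposal.
type VoiceMode = "voice" | "voiceInputTextOutput";

interface ResponsePlan {
  modelKind: "voice" | "text"; // which class of model answers the prompt
  autoSpeak: boolean;          // whether TTS starts automatically
}

// In both modes the prompt arrives via the microphone; only the reply differs.
function planResponse(mode: VoiceMode): ResponsePlan {
  if (mode === "voiceInputTextOutput") {
    // Proposed: the best text model answers and nothing is played back.
    return { modelKind: "text", autoSpeak: false };
  }
  // Current behavior: a voice-oriented model whose answer is read aloud.
  return { modelKind: "voice", autoSpeak: true };
}
```

Keeping the decision in one function means the rest of the session code only reads `autoSpeak` and `modelKind`, which may help contain the per-mode differences.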
Drawbacks
Increased UI and settings complexity
- Adding a new “skip / stop reading” button and a “voice input only” mode toggle may make the Voice settings and in-session UI more complex.
Diverging expectations and support cost
- Some users may implicitly expect “Voice = always TTS”, so clear explanations and help text will be needed to distinguish normal Voice mode from the “text-only response” mode.
- Mode-specific behavior can increase the number of cases to consider when triaging bug reports and handling user support.
Unresolved Questions
Concrete UI design and placement
- Where and how to place the “skip / stop reading” button (position, icon, size) across mobile and web UIs.
Mode switching flow
- How users should enable the “user voice, AI text-only” mode: a global setting, a per-session toggle, or a step in an initial Voice onboarding flow.
Model selection behavior
- Which model should be the default in text-only response mode, and whether users should be allowed to select models manually in this mode.
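If both a global setting and a per-session toggle were offered (two of the options listed above), one simple resolution rule is that an explicit per-session choice overrides the global default. The sketch below assumes that rule; `VoiceSettings`, `SessionOverrides`, and `effectiveMode` are hypothetical names.

```typescript
type VoiceMode = "voice" | "voiceInputTextOutput";

// Hypothetical user-level preference, e.g. stored in account settings.
interface VoiceSettings {
  defaultMode: VoiceMode;
}

// Hypothetical per-session state; an absent mode means the toggle was not used.
interface SessionOverrides {
  mode?: VoiceMode;
}

// A per-session toggle, when set, wins over the global default.
function effectiveMode(settings: VoiceSettings, session: SessionOverrides): VoiceMode {
  return session.mode ?? settings.defaultMode;
}
```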