
💡 [REQUEST] - “user voice, AI text-only” mode #2054

@natu123


Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

I would like to propose two features for Voice sessions that keep the convenience of voice input while optimizing responses for text quality instead of TTS.

The first is a one-tap “skip / stop reading” button that is available while the AI is speaking. The second is a “user voice, AI text-only” mode in which the user speaks but the AI responds only in text, with no automatic voice output.

Basic Example

Skip TTS while it’s speaking

  • When the AI is reading out its answer, a “Skip / Stop reading” button stops TTS immediately so the user can move on to the next action or prompt without waiting for the audio to finish.
    - In many real use cases, users can understand the response just by glancing at the text, so being forced to listen to the entire TTS slows down the interaction unnecessarily.
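A minimal sketch of how the skip behavior could work, assuming the answer is spoken sentence by sentence. `SpeechPlayback` and the `speak` callback are hypothetical names used for illustration only; they are not part of any existing Voice API. This sketch stops at the next sentence boundary, so a real implementation would also need to cancel audio that is already playing to stop "immediately".

```python
import threading
from typing import Callable, Iterable, List


class SpeechPlayback:
    """Reads an answer aloud sentence by sentence and can be skipped.

    `speak` is a hypothetical stand-in for a real TTS engine call.
    """

    def __init__(self, speak: Callable[[str], None]) -> None:
        self._speak = speak
        self._skip = threading.Event()

    def skip(self) -> None:
        """Handler for the "Skip / Stop reading" button."""
        self._skip.set()

    def play(self, sentences: Iterable[str]) -> List[str]:
        """Speak until finished or skipped; return what was actually spoken."""
        self._skip.clear()
        spoken: List[str] = []
        for sentence in sentences:
            if self._skip.is_set():
                break  # user tapped skip: stop reading, text stays on screen
            self._speak(sentence)
            spoken.append(sentence)
        return spoken
```

The full text of the answer stays visible regardless of how much was spoken, so skipping loses no information.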

“Voice input only” mode (user voice, AI text)

  • Provide a mode where the user’s input is via microphone only, but the AI’s response is always shown as text only, with no automatic audio output.
  • This allows Perplexity/Qwen to always use the best text model for responses (long, structured answers, tables, code, etc.) without being constrained by a voice model, maximizing answer quality.
  • It is also very useful in environments where the user can quietly speak into the mic but does not want any sound to come out of the device (libraries, shared offices, late-night use, etc.).
  • This “user voice, AI text-only” pattern keeps hands-free input, reduces cognitive load from listening to long TTS, and focuses the system on delivering high-quality text responses.
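The difference between the two modes can be sketched as a single switch on the output side. All names here (`ResponseMode`, `Reply`, `build_reply`) are hypothetical and only illustrate the proposal; voice input is handled the same way in both modes and is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResponseMode(Enum):
    VOICE = "voice"          # normal Voice mode: text plus automatic TTS
    TEXT_ONLY = "text_only"  # proposed mode: user speaks, AI answers in text only


@dataclass
class Reply:
    text: str                  # always shown on screen
    audio_text: Optional[str]  # text handed to TTS, or None for no audio


def build_reply(answer: str, mode: ResponseMode) -> Reply:
    """Decide what is displayed and what, if anything, is spoken."""
    if mode is ResponseMode.TEXT_ONLY:
        return Reply(text=answer, audio_text=None)
    return Reply(text=answer, audio_text=answer)
```

In text-only mode nothing is passed to TTS, so the device stays silent while the answer is still displayed in full.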

Drawbacks

Increased UI and setting complexity

  • Adding a new “skip / stop reading” button and a “voice input only” mode toggle may make the Voice settings and in-session UI more complex.

Diverging expectations and support cost

  • Some users may implicitly expect “Voice = always TTS”, so clear explanations and help text will be needed to differentiate behavior between normal Voice mode and “text-only response” mode.
    - Different behaviors per mode can increase the number of patterns to consider when handling bug reports and user support.

Unresolved questions

Concrete UI design and placement

  • Where and how to place the “skip / stop reading” button (position, icon, size) across mobile and web UIs.

Mode switching flow

  • How users should enable “user voice, AI text-only” mode: global setting, per-session toggle, or part of an initial Voice onboarding flow.
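One possible way to combine the options above is to let a per-session toggle override a global setting. This is only an illustration of that one combination, not a decision on the open question; `resolve_text_only` and its parameters are hypothetical names.

```python
from typing import Optional


def resolve_text_only(global_default: bool,
                      session_override: Optional[bool] = None) -> bool:
    """Return True if the current session should respond in text only.

    Assumption: a per-session toggle, when the user has set it, wins over
    the global setting; None means the user has not touched the toggle.
    """
    if session_override is not None:
        return session_override
    return global_default
```

An onboarding flow could then simply set the initial value of `global_default`.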

Model selection behavior

  • Which model should be the default in text-only response mode, and whether users should be allowed to manually select models when using this mode.
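To make the two sub-questions concrete, here is a small sketch in which both the default model and the "manual selection allowed" policy are explicit inputs. The model names are placeholders, not real model identifiers, and `pick_model` is a hypothetical helper.

```python
from typing import Optional

DEFAULT_TEXT_MODEL = "default-text-model"    # placeholder, not a real model name
DEFAULT_VOICE_MODEL = "default-voice-model"  # placeholder, not a real model name


def pick_model(text_only: bool,
               user_choice: Optional[str] = None,
               allow_manual: bool = True) -> str:
    """Choose the model for a session under the assumed policy."""
    if not text_only:
        return DEFAULT_VOICE_MODEL
    if allow_manual and user_choice:
        return user_choice  # the user's manual selection is honored
    return DEFAULT_TEXT_MODEL
```

Setting `allow_manual=False` models the stricter answer to the second question, where text-only mode always uses the default text model.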

Metadata

Labels: question (Further information is requested)
