💡 [REQUEST] - <title>Add Built-in Text-to-Speech (TTS) and Screenshot Input Support to Qwen Windows Client

### 起始日期 | Start Date

2025/01/25

### 实现PR | Implementation PR

_No response_

### 相关Issues | Reference Issues

_No response_

### 摘要 | Summary

Title: [Feature Request] Add Built-in Text-to-Speech (TTS) and Screenshot Input Support to Qwen Windows Client

Dear Qwen Team,

Thank you for building such a powerful and user-friendly desktop client! As an active Windows user who relies on Qwen for English learning and STEM studies, I’d like to propose two highly valuable features that would significantly enhance the desktop experience:

### 1. **Built-in Text-to-Speech (TTS) / Read-Aloud Function**
- **Why it matters**: Many users (including students, language learners, and visually impaired individuals) benefit greatly from hearing responses spoken aloud—especially for pronunciation practice, proofreading, or multitasking.
- **Current workaround**: Users must copy text and use Windows’ “Read Aloud” (Win + Ctrl + Enter) or Edge browser, which breaks workflow.
- **Suggestion**:  
  Add a speaker icon 🔊 next to each AI response. Clicking it should read the text using high-quality neural TTS (e.g., Microsoft Azure Neural TTS or Alibaba’s own speech synthesis). Support both Chinese and English voices.

### 2. **Screenshot Input with OCR & Vision Understanding**
- **Why it matters**: When studying from PDFs, textbooks, or problem sets, users often encounter content as images or scanned documents. Being able to **take a screenshot and ask Qwen about it** would be transformative.
- **Current limitation**: The Windows client lacks any image input capability, forcing users to switch to mobile or web versions.
- **Suggestion**:  
  Add a “📸” button in the input bar that:
  - Captures screen region (like Snip & Sketch);
  - Extracts text via OCR (for equations, paragraphs, code);
  - Optionally sends the image to Qwen-VL for multimodal understanding (e.g., “Explain this physics diagram”).

### Why These Features Belong on Desktop
The Windows client is where users spend hours studying, coding, and writing. Adding TTS and screenshot input would:
- Close the gap with mobile apps;
- Enable true multimodal interaction on PC;
- Make Qwen a complete AI study companion—not just a chatbox.

These features are technically feasible using existing Windows APIs (e.g., Windows.Media.SpeechSynthesis, WinRT OCR) or Alibaba Cloud services, and would set Qwen apart from other desktop AI tools.

Thank you for considering this request. I believe these additions would greatly empower students, educators, and professionals alike!

Best regards,  
A Dedicated Qwen Windows User

### 基本示例 | Basic Example

.

### 缺陷 | Drawbacks

.

### 未解决问题 | Unresolved questions

.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

💡 [REQUEST] - <title>Add Built-in Text-to-Speech (TTS) and Screenshot Input Support to Qwen Windows Client #2059

起始日期 | Start Date

实现PR | Implementation PR

相关Issues | Reference Issues

摘要 | Summary

1. Built-in Text-to-Speech (TTS) / Read-Aloud Function

2. Screenshot Input with OCR & Vision Understanding

Why These Features Belong on Desktop

基本示例 | Basic Example

缺陷 | Drawbacks

未解决问题 | Unresolved questions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

💡 [REQUEST] - <title>Add Built-in Text-to-Speech (TTS) and Screenshot Input Support to Qwen Windows Client #2059

Description

起始日期 | Start Date

实现PR | Implementation PR

相关Issues | Reference Issues

摘要 | Summary

1. Built-in Text-to-Speech (TTS) / Read-Aloud Function

2. Screenshot Input with OCR & Vision Understanding

Why These Features Belong on Desktop

基本示例 | Basic Example

缺陷 | Drawbacks

未解决问题 | Unresolved questions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions