-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
起始日期 | Start Date
2025/01/25
实现PR | Implementation PR
No response
相关Issues | Reference Issues
No response
摘要 | Summary
Title: [Feature Request] Add Built-in Text-to-Speech (TTS) and Screenshot Input Support to Qwen Windows Client
Dear Qwen Team,
Thank you for building such a powerful and user-friendly desktop client! As an active Windows user who relies on Qwen for English learning and STEM studies, I’d like to propose two highly valuable features that would significantly enhance the desktop experience:
1. Built-in Text-to-Speech (TTS) / Read-Aloud Function
- Why it matters: Many users (including students, language learners, and visually impaired individuals) benefit greatly from hearing responses spoken aloud—especially for pronunciation practice, proofreading, or multitasking.
- Current workaround: Users must copy text and use Windows’ “Read Aloud” (Win + Ctrl + Enter) or Edge browser, which breaks workflow.
- Suggestion:
Add a speaker icon 🔊 next to each AI response. Clicking it should read the text using high-quality neural TTS (e.g., Microsoft Azure Neural TTS or Alibaba’s own speech synthesis). Support both Chinese and English voices.
2. Screenshot Input with OCR & Vision Understanding
- Why it matters: When studying from PDFs, textbooks, or problem sets, users often encounter content as images or scanned documents. Being able to take a screenshot and ask Qwen about it would be transformative.
- Current limitation: The Windows client lacks any image input capability, forcing users to switch to mobile or web versions.
- Suggestion:
Add a “📸” button in the input bar that:- Captures screen region (like Snip & Sketch);
- Extracts text via OCR (for equations, paragraphs, code);
- Optionally sends the image to Qwen-VL for multimodal understanding (e.g., “Explain this physics diagram”).
Why These Features Belong on Desktop
The Windows client is where users spend hours studying, coding, and writing. Adding TTS and screenshot input would:
- Close the gap with mobile apps;
- Enable true multimodal interaction on PC;
- Make Qwen a complete AI study companion—not just a chatbox.
These features are technically feasible using existing Windows APIs (e.g., Windows.Media.SpeechSynthesis, WinRT OCR) or Alibaba Cloud services, and would set Qwen apart from other desktop AI tools.
Thank you for considering this request. I believe these additions would greatly empower students, educators, and professionals alike!
Best regards,
A Dedicated Qwen Windows User
基本示例 | Basic Example
.
缺陷 | Drawbacks
.
未解决问题 | Unresolved questions
.