
Commit c0db449

LyaQanYi and claude authored
voice (#1336)
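* feat(voice): add Gemini and Grok provider support; load voice configuration dynamically

* fix(voice): fall back to DEFAULT_VOICE on the degraded path so an empty voice is never sent upstream

  When api_providers.json fails to load, GEMINI_PROVIDER / GROK_PROVIDER are None. On this degraded path, normalize_*_tts_voice used to return ('', False) for empty input, so gemini_tts_worker sent an empty voiceName upstream and grok_streaming_tts_worker put an empty voice into the query string. Both requests then failed instead of falling back to Leda / eve, the two stable built-in default voices.

  In the None branch, empty input now falls back to the module-level GEMINI_TTS_DEFAULT_VOICE / GROK_TTS_DEFAULT_VOICE (these constants already cover the missing-config case), which matches the happy-path behavior of NativeVoiceProvider.normalize().

  Codex review (PR #1336):
  - utils/gemini_tts_voices.py:85
  - utils/grok_tts_voices.py:88

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(voice): keep the in-code fallback catalog when config loading fails, to avoid a routing regression

  In the first version of PR #1336, _create_provider returned None when api_providers.json loaded as {}, which dropped GEMINI_PROVIDER / GROK_PROVIDER from native_voice_registry. On this degraded path:
  - resolve_native_voice_for_routing("gemini", "Puck", ...) returns (Puck, False) because the provider is not in the registry
  - core._has_custom_tts() treats the non-empty voice_id as custom
  - get_tts_worker() routes to cosyvoice_vc_tts_worker instead of gemini_tts_worker / grok_streaming_tts_worker

  In other words, built-in voices such as Puck / Leda / eve / leo are silently routed to CosyVoice when config loading fails, and both authentication and synthesis fail. This regression is harder to spot than "catalog metadata is lost", and the path did not exist before PR #1336, when the catalog was hard-coded in Python.

  The full 30 + 5 voice catalog from before PR #1290 is kept in the Python modules as _FALLBACK_*_VOICE_GENDERS, with the aliases mirrored as well. Priority chain:

      GEMINI_TTS_VOICE_GENDERS = _CFG.get('voices') or _FALLBACK_GEMINI_TTS_VOICE_GENDERS

  As a result:
  - Normal case: config is the source of truth; adding or adjusting voices only touches the JSON
  - Corrupted JSON: the fallback takes over, _create_provider no longer returns None, the registry keeps Gemini/Grok, and routing matches the behavior before PR #1336
  - The only cost of the two copies drifting is that newly added voices are invisible when the JSON is missing, which is manageable

  Also fixes the dead `or DEFAULT_VOICE` tail on GEMINI/GROK_TTS_DEFAULT_MALE_VOICE flagged by CodeRabbit (the FALLBACK constants are non-empty literals, so that branch is never reached). Because the provider is now always-on, the GEMINI_PROVIDER is None branch in normalize_*_tts_voice is dead code as well and is removed.

  Codex review (PR #1336):
  - utils/gemini_tts_voices.py:61
  - utils/grok_tts_voices.py:57

  Skipped: CodeRabbit's suggestion to add a defensive `default_voice in catalog` check in _create_provider. The fallback chain guarantees that default_voice is always in the catalog (Leda/Puck/eve/leo are all in the in-code fallback), so no runtime guard is needed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>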
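
The config-then-fallback priority chain from the commit message can be illustrated with a minimal sketch. The names `resolve_catalog`, `_FALLBACK_VOICES`, and `_FALLBACK_DEFAULT_VOICE` are hypothetical stand-ins (the real modules use module-level constants and the full catalog); only the `cfg.get(...) or fallback` pattern is taken from the commit.

```python
# Hypothetical, trimmed stand-ins for the in-code fallback constants.
_FALLBACK_VOICES: dict[str, str] = {"Leda": "Female", "Puck": "Male"}
_FALLBACK_DEFAULT_VOICE = "Leda"


def resolve_catalog(cfg: dict) -> tuple[dict[str, str], str]:
    """Return (catalog, default_voice), preferring config over the fallback.

    `or` treats a missing key, None, an empty dict, and an empty string
    alike, so a config that loaded as {} still yields a usable catalog.
    """
    voices = cfg.get("voices") or _FALLBACK_VOICES
    default_voice = cfg.get("default_voice") or _FALLBACK_DEFAULT_VOICE
    return voices, default_voice


# Config failed to load: everything comes from the fallback.
degraded = resolve_catalog({})

# Config loaded: the JSON values win.
healthy = resolve_catalog({"voices": {"Kore": "Female"}, "default_voice": "Kore"})
```

One consequence of using `or` rather than an explicit `is None` check is that an intentionally empty `voices` object in the JSON is also replaced by the fallback, which appears to be the intended behavior here.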
1 parent 396de3e commit c0db449

4 files changed

Lines changed: 235 additions & 46 deletions


config/api_providers.json

Lines changed: 73 additions & 0 deletions
@@ -432,6 +432,79 @@
     "free": {
       "inherits": "step",
       "catalog_prefix": "免费 API"
+    },
+    "gemini": {
+      "catalog_prefix": "Gemini",
+      "default_voice": "Leda",
+      "default_male_voice": "Puck",
+      "voices": {
+        "Achernar": "Female",
+        "Achird": "Male",
+        "Algenib": "Male",
+        "Algieba": "Male",
+        "Alnilam": "Male",
+        "Aoede": "Female",
+        "Autonoe": "Female",
+        "Callirrhoe": "Female",
+        "Charon": "Male",
+        "Despina": "Female",
+        "Enceladus": "Male",
+        "Erinome": "Female",
+        "Fenrir": "Male",
+        "Gacrux": "Female",
+        "Iapetus": "Male",
+        "Kore": "Female",
+        "Laomedeia": "Female",
+        "Leda": "Female",
+        "Orus": "Male",
+        "Pulcherrima": "Female",
+        "Puck": "Male",
+        "Rasalgethi": "Male",
+        "Sadachbia": "Male",
+        "Sadaltager": "Male",
+        "Schedar": "Male",
+        "Sulafat": "Female",
+        "Umbriel": "Male",
+        "Vindemiatrix": "Female",
+        "Zephyr": "Female",
+        "Zubenelgenubi": "Male"
+      },
+      "aliases": {
+        "male": "Puck",
+        "man": "Puck",
+        "masculine": "Puck",
+        "男": "Puck",
+        "男声": "Puck",
+        "中文男": "Puck",
+        "female": "Leda",
+        "woman": "Leda",
+        "feminine": "Leda",
+        "女": "Leda",
+        "女声": "Leda",
+        "中文女": "Leda"
+      }
+    },
+    "grok": {
+      "catalog_prefix": "Grok",
+      "default_voice": "eve",
+      "default_male_voice": "leo",
+      "voices": {
+        "eve": "Female",
+        "ara": "Female",
+        "leo": "Male",
+        "rex": "Male",
+        "sal": "Male"
+      },
+      "aliases": {
+        "male": "leo",
+        "man": "leo",
+        "男": "leo",
+        "男声": "leo",
+        "female": "eve",
+        "woman": "eve",
+        "女": "eve",
+        "女声": "eve"
+      }
     }
   }
 }

docs/api_providers_fields.md

Lines changed: 17 additions & 0 deletions
@@ -41,6 +41,23 @@
 Some official HTTP TTS voices may require extra permissions or may not be supported on the free route, so they should not be exposed directly on
 the character card and clone pages; otherwise preview/apply returns voice not found.

+### Provider adapter notes
+
+- **`step` / `free`**: `catalog_value_is_display_name=true`, so each value in `voices` is the
+  display name shown in the frontend (a Chinese voice label). `free` reuses the same catalog via `inherits: "step"`
+  and overrides only `catalog_prefix`. See `utils/stepfun_tts_voices.py`.
+- **`gemini`**: `catalog_value_is_display_name=false` (the default), so each value in `voices`
+  is a gender label, `Female`/`Male`, and the frontend displays the voice_id (Leda/Puck/…). `aliases`
+  maps user input such as `male`/`woman`/`中文女` back to canonical IDs such as Puck/Leda. See
+  `utils/gemini_tts_voices.py`.
+- **`grok`**: same shape as Gemini, with the 5 built-in xAI voices (eve/ara/leo/rex/sal);
+  the upstream expects lowercase voice_ids. See `utils/grok_tts_voices.py`.
+
+To add a provider, add its config under `native_tts_voice_providers`, then write an
+adapter module of about 50 lines (modeled on one of the three) that calls `register_provider`, and append the module name to
+`utils/native_voice_registry.py::_BUILTIN_PROVIDER_MODULES`. `config_manager`,
+`characters_router`, and `tts_client` need no changes.
+
 ---

 ## 1. Alibaba Cloud (DashScope / Qwen)

utils/gemini_tts_voices.py

Lines changed: 76 additions & 23 deletions
@@ -7,19 +7,33 @@
 Gemini-bound by virtue of speaking Gemini's wire format (the
 `gemini_tts_worker` HTTP call and the Gemini Live `speech_config` setup).

+Voice IDs, display genders, and defaults are read first from
+native_tts_voice_providers.gemini in config/api_providers.json, so changing the voice list needs no Python edits.
+The fallback constants copy the hard-coded catalog from before PR #1290 and apply only when the JSON fails to load.
+Even then the provider must stay in the registry; otherwise
+`resolve_native_voice_for_routing("gemini", ...)` reports native=False,
+`core._has_custom_tts()` treats built-in voices as custom, and Puck/Leda are routed to
+cosyvoice_vc_tts_worker, a routing regression harder to spot than "catalog metadata is lost".
+
 Voice list reference: https://ai.google.dev/gemini-api/docs/speech-generation
 """

+from utils.api_config_loader import get_native_tts_voice_provider_config
 from utils.native_voice_registry import (
     NativeVoiceProvider,
     register_provider,
 )

 GEMINI_TTS_MODEL = "gemini-2.5-flash-preview-tts"
-GEMINI_TTS_DEFAULT_VOICE = "Leda"
-GEMINI_TTS_DEFAULT_MALE_VOICE = "Puck"

-GEMINI_TTS_VOICE_GENDERS: dict[str, str] = {
+FALLBACK_GEMINI_TTS_DEFAULT_VOICE = "Leda"
+FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE = "Puck"
+
+# Keep the same shape as native_tts_voice_providers.gemini.voices in api_providers.json.
+# Config is the source of truth; this copy is the fallback when the JSON fails to load, so the provider
+# always registers and routing does not degrade to cosyvoice. The only cost of drift is that voices added
+# in a newer JSON are invisible when config is missing, which is milder than misrouting.
+_FALLBACK_GEMINI_TTS_VOICE_GENDERS: dict[str, str] = {
     "Achernar": "Female",
     "Achird": "Male",
     "Algenib": "Male",
@@ -52,30 +66,69 @@
     "Zubenelgenubi": "Male",
 }

-_GEMINI_TTS_VOICE_ALIASES: dict[str, str] = {
-    "male": GEMINI_TTS_DEFAULT_MALE_VOICE,
-    "man": GEMINI_TTS_DEFAULT_MALE_VOICE,
-    "masculine": GEMINI_TTS_DEFAULT_MALE_VOICE,
-    "男": GEMINI_TTS_DEFAULT_MALE_VOICE,
-    "男声": GEMINI_TTS_DEFAULT_MALE_VOICE,
-    "中文男": GEMINI_TTS_DEFAULT_MALE_VOICE,
-    "female": GEMINI_TTS_DEFAULT_VOICE,
-    "woman": GEMINI_TTS_DEFAULT_VOICE,
-    "feminine": GEMINI_TTS_DEFAULT_VOICE,
-    "女": GEMINI_TTS_DEFAULT_VOICE,
-    "女声": GEMINI_TTS_DEFAULT_VOICE,
-    "中文女": GEMINI_TTS_DEFAULT_VOICE,
+_FALLBACK_GEMINI_TTS_VOICE_ALIASES: dict[str, str] = {
+    "male": FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE,
+    "man": FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE,
+    "masculine": FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE,
+    "男": FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE,
+    "男声": FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE,
+    "中文男": FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE,
+    "female": FALLBACK_GEMINI_TTS_DEFAULT_VOICE,
+    "woman": FALLBACK_GEMINI_TTS_DEFAULT_VOICE,
+    "feminine": FALLBACK_GEMINI_TTS_DEFAULT_VOICE,
+    "女": FALLBACK_GEMINI_TTS_DEFAULT_VOICE,
+    "女声": FALLBACK_GEMINI_TTS_DEFAULT_VOICE,
+    "中文女": FALLBACK_GEMINI_TTS_DEFAULT_VOICE,
 }

-GEMINI_PROVIDER = NativeVoiceProvider(
-    key="gemini",
-    catalog=GEMINI_TTS_VOICE_GENDERS,
-    aliases=_GEMINI_TTS_VOICE_ALIASES,
-    default_voice=GEMINI_TTS_DEFAULT_VOICE,
-    default_male_voice=GEMINI_TTS_DEFAULT_MALE_VOICE,
-    catalog_prefix="Gemini",
+
+def _load_provider_config() -> dict:
+    return get_native_tts_voice_provider_config("gemini")
+
+
+_CFG = _load_provider_config()
+
+GEMINI_TTS_VOICE_GENDERS: dict[str, str] = (
+    _CFG.get("voices") or _FALLBACK_GEMINI_TTS_VOICE_GENDERS
+)
+GEMINI_TTS_DEFAULT_VOICE = (
+    _CFG.get("default_voice") or FALLBACK_GEMINI_TTS_DEFAULT_VOICE
 )
+GEMINI_TTS_DEFAULT_MALE_VOICE = (
+    _CFG.get("default_male_voice") or FALLBACK_GEMINI_TTS_DEFAULT_MALE_VOICE
+)
+
+
+def _build_aliases(configured: dict[str, str]) -> dict[str, str]:
+    """Casefold alias keys so the casefold lookup in NativeVoiceProvider.normalize can match.
+    Unlike stepfun_tts_voices._build_aliases, Gemini's catalog values are gender labels
+    (Female/Male) rather than display names, so they must not be injected back as aliases."""
+    return {
+        alias.casefold(): voice_id
+        for alias, voice_id in configured.items()
+        if alias and voice_id
+    }
+
+
+def _create_provider() -> NativeVoiceProvider:
+    """Always succeed: the provider must stay in the registry, otherwise downstream routing
+    misclassifies built-in Gemini voices as custom. The catalog and defaults have already gone
+    through the config → fallback OR chain above, so they are guaranteed non-empty here."""
+    aliases_source = _CFG.get("aliases") or _FALLBACK_GEMINI_TTS_VOICE_ALIASES
+    return NativeVoiceProvider(
+        key="gemini",
+        catalog=GEMINI_TTS_VOICE_GENDERS,
+        aliases=_build_aliases(aliases_source),
+        default_voice=GEMINI_TTS_DEFAULT_VOICE,
+        default_male_voice=GEMINI_TTS_DEFAULT_MALE_VOICE,
+        catalog_prefix=_CFG.get("catalog_prefix") or "Gemini",
+        catalog_value_is_display_name=bool(
+            _CFG.get("catalog_value_is_display_name", False)
+        ),
+    )
+

+GEMINI_PROVIDER = _create_provider()
 register_provider(GEMINI_PROVIDER)

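
The effect of casefolding alias keys in `_build_aliases` can be shown with a small standalone sketch. `build_aliases` reproduces the comprehension from the diff; `lookup_voice` is a hypothetical helper written for this illustration and is not the real `NativeVoiceProvider.normalize`, whose implementation is not part of this commit.

```python
def build_aliases(configured: dict[str, str]) -> dict[str, str]:
    """Casefold alias keys and drop entries with an empty key or value."""
    return {
        alias.casefold(): voice_id
        for alias, voice_id in configured.items()
        if alias and voice_id
    }


def lookup_voice(
    voice: str,
    catalog: dict[str, str],
    aliases: dict[str, str],
    default_voice: str,
) -> str:
    """Hypothetical lookup: exact catalog hit, then casefolded alias, then default."""
    candidate = (voice or "").strip()
    if candidate in catalog:
        return candidate
    return aliases.get(candidate.casefold(), default_voice)


# Trimmed catalog and a deliberately messy alias config for illustration.
catalog = {"Leda": "Female", "Puck": "Male"}
aliases = build_aliases({"Male": "Puck", "中文女": "Leda", "": "Puck", "broken": ""})

resolved = [
    lookup_voice(v, catalog, aliases, "Leda")
    for v in ("Puck", "MALE", "中文女", "", "unknown")
]
```

Because the keys are casefolded once at build time, a mixed-case alias in the JSON such as `Male` still matches user input such as `MALE`, provided the lookup side casefolds as well.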

utils/grok_tts_voices.py

Lines changed: 69 additions & 23 deletions
@@ -6,50 +6,96 @@
 native (not custom), and `get_tts_worker` dispatches to
 `grok_streaming_tts_worker` instead of falling through to `cosyvoice_vc_tts_worker`.

+Voice IDs, gender labels, and defaults are read first from
+native_tts_voice_providers.grok in config/api_providers.json. The fallback constants copy the
+hard-coded catalog from before PR #1336 and apply only when the JSON fails to load. Even then the provider must stay in the registry;
+otherwise `is_native_voice("leo", "grok")` returns False, `core._has_custom_tts()` treats
+built-in voices such as eve/leo as custom, and `get_tts_worker` routes to
+`cosyvoice_vc_tts_worker` instead of `grok_streaming_tts_worker`, a routing regression
+harder to spot than "the catalog is lost".
+
 Voice list reference: xAI `GET /v1/tts/voices` (eve / ara / leo / rex / sal).
 The upstream API expects lowercase voice ids; we mirror that in the catalog.
 """

+from utils.api_config_loader import get_native_tts_voice_provider_config
 from utils.native_voice_registry import (
     NativeVoiceProvider,
     register_provider,
 )

-GROK_TTS_DEFAULT_VOICE = "eve"
-GROK_TTS_DEFAULT_MALE_VOICE = "leo"
+FALLBACK_GROK_TTS_DEFAULT_VOICE = "eve"
+FALLBACK_GROK_TTS_DEFAULT_MALE_VOICE = "leo"

-# xAI's published voice catalog. Gender labels are best-effort inferences from
-# the canonical given-name associations — xAI's docs only list voice_id + name
-# + language, not gender. The labels feed the UI display only; routing /
-# dispatch only consult the keys.
-GROK_TTS_VOICE_GENDERS: dict[str, str] = {
+# Keep the same shape as native_tts_voice_providers.grok.voices in api_providers.json.
+# Config is the source of truth; this copy is the fallback when the JSON fails to load, so the provider always registers.
+# Gender labels are best-effort inferences (xAI's docs list only voice_id + name + language);
+# they are used for UI display only, and routing/dispatch consult only the keys.
+_FALLBACK_GROK_TTS_VOICE_GENDERS: dict[str, str] = {
     "eve": "Female",
     "ara": "Female",
     "leo": "Male",
     "rex": "Male",
     "sal": "Male",
 }

-_GROK_TTS_VOICE_ALIASES: dict[str, str] = {
-    "male": GROK_TTS_DEFAULT_MALE_VOICE,
-    "man": GROK_TTS_DEFAULT_MALE_VOICE,
-    "男": GROK_TTS_DEFAULT_MALE_VOICE,
-    "男声": GROK_TTS_DEFAULT_MALE_VOICE,
-    "female": GROK_TTS_DEFAULT_VOICE,
-    "woman": GROK_TTS_DEFAULT_VOICE,
-    "女": GROK_TTS_DEFAULT_VOICE,
-    "女声": GROK_TTS_DEFAULT_VOICE,
+_FALLBACK_GROK_TTS_VOICE_ALIASES: dict[str, str] = {
+    "male": FALLBACK_GROK_TTS_DEFAULT_MALE_VOICE,
+    "man": FALLBACK_GROK_TTS_DEFAULT_MALE_VOICE,
+    "男": FALLBACK_GROK_TTS_DEFAULT_MALE_VOICE,
+    "男声": FALLBACK_GROK_TTS_DEFAULT_MALE_VOICE,
+    "female": FALLBACK_GROK_TTS_DEFAULT_VOICE,
+    "woman": FALLBACK_GROK_TTS_DEFAULT_VOICE,
+    "女": FALLBACK_GROK_TTS_DEFAULT_VOICE,
+    "女声": FALLBACK_GROK_TTS_DEFAULT_VOICE,
 }

-GROK_PROVIDER = NativeVoiceProvider(
-    key="grok",
-    catalog=GROK_TTS_VOICE_GENDERS,
-    aliases=_GROK_TTS_VOICE_ALIASES,
-    default_voice=GROK_TTS_DEFAULT_VOICE,
-    default_male_voice=GROK_TTS_DEFAULT_MALE_VOICE,
-    catalog_prefix="Grok",
+
+def _load_provider_config() -> dict:
+    return get_native_tts_voice_provider_config("grok")
+
+
+_CFG = _load_provider_config()
+
+GROK_TTS_VOICE_GENDERS: dict[str, str] = (
+    _CFG.get("voices") or _FALLBACK_GROK_TTS_VOICE_GENDERS
+)
+GROK_TTS_DEFAULT_VOICE = (
+    _CFG.get("default_voice") or FALLBACK_GROK_TTS_DEFAULT_VOICE
 )
+GROK_TTS_DEFAULT_MALE_VOICE = (
+    _CFG.get("default_male_voice") or FALLBACK_GROK_TTS_DEFAULT_MALE_VOICE
+)
+
+
+def _build_aliases(configured: dict[str, str]) -> dict[str, str]:
+    """Same as gemini_tts_voices: only casefold the configured aliases; do not inject the catalog's
+    Female/Male labels as aliases."""
+    return {
+        alias.casefold(): voice_id
+        for alias, voice_id in configured.items()
+        if alias and voice_id
+    }
+
+
+def _create_provider() -> NativeVoiceProvider:
+    """Always succeed: the provider must stay in the registry, otherwise downstream routing
+    treats built-in voices such as eve/leo as custom and sends them to cosyvoice."""
+    aliases_source = _CFG.get("aliases") or _FALLBACK_GROK_TTS_VOICE_ALIASES
+    return NativeVoiceProvider(
+        key="grok",
+        catalog=GROK_TTS_VOICE_GENDERS,
+        aliases=_build_aliases(aliases_source),
+        default_voice=GROK_TTS_DEFAULT_VOICE,
+        default_male_voice=GROK_TTS_DEFAULT_MALE_VOICE,
+        catalog_prefix=_CFG.get("catalog_prefix") or "Grok",
+        catalog_value_is_display_name=bool(
+            _CFG.get("catalog_value_is_display_name", False)
+        ),
+    )
+

+GROK_PROVIDER = _create_provider()
 register_provider(GROK_PROVIDER)

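
The routing regression that the docstrings warn about can be reproduced with a toy model. Everything below is a simplified assumption made for illustration: the registry is a plain dict of voice-id sets, and `is_native_voice` and `pick_worker` are local stand-ins with made-up signatures, not the project's real `is_native_voice`, `core._has_custom_tts()`, or `get_tts_worker`. Only the worker names are taken from the commit.

```python
# Worker names as they appear in the commit; the mapping itself is assumed.
NATIVE_WORKERS = {
    "gemini": "gemini_tts_worker",
    "grok": "grok_streaming_tts_worker",
}
CUSTOM_WORKER = "cosyvoice_vc_tts_worker"


def is_native_voice(voice_id: str, provider_key: str, registry: dict) -> bool:
    """A voice is native only if its provider is registered and lists it."""
    return voice_id in registry.get(provider_key, set())


def pick_worker(voice_id: str, provider_key: str, registry: dict) -> str:
    """Toy routing rule: a non-empty, non-native voice is treated as custom."""
    if voice_id and not is_native_voice(voice_id, provider_key, registry):
        return CUSTOM_WORKER
    return NATIVE_WORKERS[provider_key]


# Provider registered (config loaded, or fallback catalog in place).
healthy_registry = {"grok": {"eve", "ara", "leo", "rex", "sal"}}
# Provider missing, as when _create_provider returned None in the first version.
degraded_registry: dict = {}

healthy_route = pick_worker("leo", "grok", healthy_registry)
degraded_route = pick_worker("leo", "grok", degraded_registry)
```

Under these assumptions the same built-in voice id lands on two different workers depending only on whether the provider was registered, which is why the commit keeps an in-code fallback catalog rather than letting registration fail.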
