feat(mimo-tts): support voiceclone model with reference audio#9106
feat(mimo-tts): support voiceclone model with reference audio#9106lingyun14beta wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The
_is_voiceclone_modelcheck relies on a substring match inmodel_name; consider tightening this logic (e.g., explicit allowed model list or regex) to avoid accidentally treating future models with similar names as voiceclone-capable. - The error message in
_resolve_voiceclone_voicehardcodesmimo-v2.5-tts-voicecloneeven though the check is generic; consider interpolatingself.model_nameinstead so the message stays accurate if other voiceclone models are used.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_is_voiceclone_model` check relies on a substring match in `model_name`; consider tightening this logic (e.g., explicit allowed model list or regex) to avoid accidentally treating future models with similar names as voiceclone-capable.
- The error message in `_resolve_voiceclone_voice` hardcodes `mimo-v2.5-tts-voiceclone` even though the check is generic; consider interpolating `self.model_name` instead so the message stays accurate if other voiceclone models are used.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/mimo_tts_api_source.py" line_range="102-111" />
<code_context>
+ async with self._voiceclone_lock:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider reusing the lock in terminate() to avoid potential races with voiceclone cleanup.
terminate() calls cleanup_files(self._voiceclone_cleanup_paths) without holding _voiceclone_lock, while _resolve_voiceclone_voice() accesses and mutates _voiceclone_cleanup_paths under that lock. If terminate() runs while voiceclone resolution is in progress, this can cause races or inconsistent cleanup. Please guard terminate()’s access to _voiceclone_cleanup_paths with the same lock (or otherwise prevent concurrent access).
Suggested implementation:
```python
async with self._voiceclone_lock:
cleanup_files(self._voiceclone_cleanup_paths)
```
To safely reuse `_voiceclone_lock` in `terminate()`:
1. Ensure `terminate()` is an `async def` so that `async with self._voiceclone_lock:` is valid. If `terminate()` must remain synchronous, instead move the cleanup into an `async` helper (e.g. `_async_terminate_cleanup`) that uses the lock, and have `terminate()` schedule/await that helper where appropriate.
2. Verify that all other accesses and mutations of `self._voiceclone_cleanup_paths` (if any) are also guarded by `_voiceclone_lock` to fully prevent races.
</issue_to_address>
### Comment 2
<location path="tests/test_mimo_api_sources.py" line_range="321-224" />
<code_context>
+async def test_mimo_tts_voiceclone_preserves_mp3_instead_of_forcing_wav(monkeypatch):
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test that verifies temporary files for voiceclone are cleaned up on terminate
The current tests for `_resolve_voiceclone_voice` and `preserve_mp3` cover caching/conversion, but don’t exercise the new `self._voiceclone_cleanup_paths` tracking or the `cleanup_files` calls on refresh/`terminate()`. Please add a test that monkeypatches `cleanup_files` in `mimo_tts_api_source`, performs a voiceclone conversion to populate `_voiceclone_cleanup_paths`, calls `terminate()`, and asserts that `cleanup_files` is called with the expected paths and that the cleanup list is cleared. This will directly validate the new temp-file cleanup behavior for voiceclone.
Suggested implementation:
```python
captured_kwargs: dict = {}
@pytest.mark.asyncio
async def test_mimo_tts_voiceclone_temp_files_cleaned_on_terminate(monkeypatch):
"""voiceclone 的临时文件应在 terminate() 时被清理"""
provider = _make_tts_provider(
{
"model": "mimo-v2.5-tts-voiceclone",
"mimo-tts-voiceclone-audio": "/tmp/reference_voice.mp3",
"mimo-tts-seed-text": "",
}
)
# monkeypatch cleanup_files 以便观察调用情况
cleanup_calls = []
# 注意:mimo_tts_api_source 的导入路径可能需要根据项目结构调整
import mimo_tts_api_source # type: ignore
async def fake_cleanup_files(paths):
# 记录被请求清理的路径
cleanup_calls.append(list(paths))
monkeypatch.setattr(mimo_tts_api_source, "cleanup_files", fake_cleanup_files)
# 触发一次 voiceclone 转换以填充 _voiceclone_cleanup_paths
provider.voiceclone_audio_source = "/tmp/voice_a.mp3"
await provider._resolve_voiceclone_voice()
# 确认有待清理的临时文件被记录
assert getattr(provider, "_voiceclone_cleanup_paths", []), "_voiceclone_cleanup_paths 应在转换后包含临时文件路径"
# 记录当前待清理路径,用于后续断言
paths_to_cleanup = list(provider._voiceclone_cleanup_paths)
# 调用 terminate,应触发 cleanup_files 并清空 _voiceclone_cleanup_paths
await provider.terminate()
# 验证 cleanup_files 被调用且传入的路径与记录的一致
assert cleanup_calls == [paths_to_cleanup]
# 验证清理列表已被清空
assert provider._voiceclone_cleanup_paths == []
```
1. 如果 `mimo_tts_api_source` 在测试文件中已有导入(例如 `from src.mimo_tts_api_source import cleanup_files` 或类似),请删除该测试中的局部 `import mimo_tts_api_source` 并改为使用正确的模块引用路径,例如:
- `import src.mimo_tts_api_source as mimo_tts_api_source`,或
- `from src import mimo_tts_api_source`。
2. 确保 `provider` 实例在项目中确实存在私有属性 `self._voiceclone_cleanup_paths`,且 `terminate()` 会调用 `cleanup_files(self._voiceclone_cleanup_paths)` 并在完成后清空该列表。如果实现略有不同(例如属性名或清空逻辑不一致),请相应调整测试中的属性访问和断言。
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| async with self._voiceclone_lock: | ||
| if ( | ||
| self._voiceclone_cache_data_url is not None | ||
| and self._voiceclone_cache_source == self.voiceclone_audio_source | ||
| ): | ||
| return self._voiceclone_cache_data_url | ||
|
|
||
| try: | ||
| data_url, cleanup_paths = await prepare_audio_input( | ||
| self.voiceclone_audio_source, |
There was a problem hiding this comment.
suggestion (bug_risk): Consider reusing the lock in terminate() to avoid potential races with voiceclone cleanup.
terminate() calls cleanup_files(self._voiceclone_cleanup_paths) without holding _voiceclone_lock, while _resolve_voiceclone_voice() accesses and mutates _voiceclone_cleanup_paths under that lock. If terminate() runs while voiceclone resolution is in progress, this can cause races or inconsistent cleanup. Please guard terminate()’s access to _voiceclone_cleanup_paths with the same lock (or otherwise prevent concurrent access).
Suggested implementation:
async with self._voiceclone_lock:
cleanup_files(self._voiceclone_cleanup_paths)To safely reuse _voiceclone_lock in terminate():
- Ensure
terminate()is anasync defso thatasync with self._voiceclone_lock:is valid. Ifterminate()must remain synchronous, instead move the cleanup into anasynchelper (e.g._async_terminate_cleanup) that uses the lock, and haveterminate()schedule/await that helper where appropriate. - Verify that all other accesses and mutations of
self._voiceclone_cleanup_paths(if any) are also guarded by_voiceclone_lockto fully prevent races.
| with pytest.raises(MiMoAPIError, match="mimo-tts-voiceclone-audio"): | ||
| await provider.get_audio("hello") | ||
| finally: | ||
| await provider.terminate() |
There was a problem hiding this comment.
suggestion (testing): Consider adding a test that verifies temporary files for voiceclone are cleaned up on terminate
The current tests for _resolve_voiceclone_voice and preserve_mp3 cover caching/conversion, but don’t exercise the new self._voiceclone_cleanup_paths tracking or the cleanup_files calls on refresh/terminate(). Please add a test that monkeypatches cleanup_files in mimo_tts_api_source, performs a voiceclone conversion to populate _voiceclone_cleanup_paths, calls terminate(), and asserts that cleanup_files is called with the expected paths and that the cleanup list is cleared. This will directly validate the new temp-file cleanup behavior for voiceclone.
Suggested implementation:
captured_kwargs: dict = {}
@pytest.mark.asyncio
async def test_mimo_tts_voiceclone_temp_files_cleaned_on_terminate(monkeypatch):
"""voiceclone 的临时文件应在 terminate() 时被清理"""
provider = _make_tts_provider(
{
"model": "mimo-v2.5-tts-voiceclone",
"mimo-tts-voiceclone-audio": "/tmp/reference_voice.mp3",
"mimo-tts-seed-text": "",
}
)
# monkeypatch cleanup_files 以便观察调用情况
cleanup_calls = []
# 注意:mimo_tts_api_source 的导入路径可能需要根据项目结构调整
import mimo_tts_api_source # type: ignore
async def fake_cleanup_files(paths):
# 记录被请求清理的路径
cleanup_calls.append(list(paths))
monkeypatch.setattr(mimo_tts_api_source, "cleanup_files", fake_cleanup_files)
# 触发一次 voiceclone 转换以填充 _voiceclone_cleanup_paths
provider.voiceclone_audio_source = "/tmp/voice_a.mp3"
await provider._resolve_voiceclone_voice()
# 确认有待清理的临时文件被记录
assert getattr(provider, "_voiceclone_cleanup_paths", []), "_voiceclone_cleanup_paths 应在转换后包含临时文件路径"
# 记录当前待清理路径,用于后续断言
paths_to_cleanup = list(provider._voiceclone_cleanup_paths)
# 调用 terminate,应触发 cleanup_files 并清空 _voiceclone_cleanup_paths
await provider.terminate()
# 验证 cleanup_files 被调用且传入的路径与记录的一致
assert cleanup_calls == [paths_to_cleanup]
# 验证清理列表已被清空
assert provider._voiceclone_cleanup_paths == []- 如果
mimo_tts_api_source在测试文件中已有导入(例如from src.mimo_tts_api_source import cleanup_files或类似),请删除该测试中的局部import mimo_tts_api_source并改为使用正确的模块引用路径,例如:import src.mimo_tts_api_source as mimo_tts_api_source,或from src import mimo_tts_api_source。
- 确保
provider实例在项目中确实存在私有属性self._voiceclone_cleanup_paths,且terminate()会调用cleanup_files(self._voiceclone_cleanup_paths)并在完成后清空该列表。如果实现略有不同(例如属性名或清空逻辑不一致),请相应调整测试中的属性访问和断言。
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
|
/gemini review |
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
fix #9105
Modifications / 改动点
astrbot/core/config/default.py:新增配置项mimo-tts-voiceclone-audio(参考音频路径/URL/base64),并更新mimo-tts-voice的 hint,说明其在 voiceclone 模型下会被忽略astrbot/core/provider/sources/mimo_api_common.py:prepare_audio_input()增加target_format/preserve_mp3可选参数(默认值保持原行为,向后兼容)astrbot/core/provider/sources/mimo_tts_api_source.py:_is_voiceclone_model()判断当前模型是否为 voiceclone_resolve_voiceclone_voice(),将参考音频转换为 data URL 并按来源缓存,使用asyncio.Lock防止并发请求重复转码MiMoAPIError_build_payload()支持通过voice_value覆盖音色字段,写入转换后的 data URLpreserve_mp3=True),避免转 wav 后体积膨胀超过官方 10MB 限制terminate()中补充清理转码产生的临时文件dashboard/src/i18n/locales/{en-US,ru-RU,zh-CN}/features/config-metadata.json:同步补充新字段的多语言文案tests/test_mimo_api_sources.py:新增 7 个测试用例,覆盖模型判定、未配置报错、data URL 写入 payload、缓存生效/换源刷新、mp3 保留、并发调用只转码一次Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Add support for MiMo TTS voiceclone model using a configurable reference audio source and integrate it into the existing TTS provider pipeline.
New Features:
Enhancements:
Documentation:
Tests: