
feat(mimo-tts): support voiceclone model with reference audio #9106

Open
lingyun14beta wants to merge 1 commit into AstrBotDevs:master from lingyun14beta:mimo


@lingyun14beta (Contributor) commented Jul 1, 2026

fix #9105

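Modifications

  • astrbot/core/config/default.py: adds the config option mimo-tts-voiceclone-audio (a reference audio path, URL, or base64 string) and updates the mimo-tts-voice hint to state that it is ignored under the voiceclone model
  • astrbot/core/provider/sources/mimo_api_common.py: prepare_audio_input() gains optional target_format / preserve_mp3 parameters (the defaults keep the existing behavior, so the change is backward compatible)
  • astrbot/core/provider/sources/mimo_tts_api_source.py:
    • Adds _is_voiceclone_model() to determine whether the current model is a voiceclone model
    • Adds _resolve_voiceclone_voice(), which converts the reference audio to a data URL and caches it per source, using asyncio.Lock so that concurrent requests do not transcode the same audio repeatedly
    • Raises a clear MiMoAPIError when no reference audio is configured
    • _build_payload() supports overriding the voice field via voice_value, which carries the converted data URL
    • Prefers keeping the original mp3 when transcoding (preserve_mp3=True), so that converting to wav does not inflate the file past the official 10MB limit
    • terminate() now also cleans up the temporary files produced by transcoding
  • dashboard/src/i18n/locales/{en-US,ru-RU,zh-CN}/features/config-metadata.json: adds the localized strings for the new field
  • tests/test_mimo_api_sources.py: adds 7 test cases covering model detection, the error when no reference audio is configured, the data URL being written to the payload, cache hits and refresh on source change, mp3 preservation, and concurrent calls transcoding only once
  • This is NOT a breaking change.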
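
To illustrate why keeping the original mp3 helps with the 10MB limit mentioned above, here is a minimal sketch of building a base64 data URL and estimating its encoded size. The helper names (`to_data_url`, `encoded_size`, `fits_limit`), the MIME table, and the assumption that the limit applies to the encoded payload are all hypothetical; the PR's actual prepare_audio_input() may work differently.

```python
import base64

# Assumed cap in bytes; the PR description only says "official 10MB limit".
MAX_BYTES = 10 * 1024 * 1024

# Hypothetical MIME mapping, for illustration only.
MIME_BY_FORMAT = {"mp3": "audio/mpeg", "wav": "audio/wav"}


def to_data_url(raw: bytes, audio_format: str) -> str:
    """Encode raw audio bytes as a base64 data URL."""
    mime = MIME_BY_FORMAT[audio_format]
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def encoded_size(raw_len: int) -> int:
    """Padded base64 emits 4 output bytes for every 3 input bytes."""
    return 4 * ((raw_len + 2) // 3)


def fits_limit(raw_len: int, limit: int = MAX_BYTES) -> bool:
    """Check whether the base64 form of raw_len bytes stays within the limit."""
    return encoded_size(raw_len) <= limit
```

Because base64 grows the payload by roughly a third, a 9 MiB file already exceeds a 10 MiB cap once encoded, which is one reason an uncompressed wav is more likely to be rejected than the mp3 it came from.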

Screenshots or Test Results


Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add support for MiMo TTS voiceclone model using a configurable reference audio source and integrate it into the existing TTS provider pipeline.

New Features:

  • Introduce the mimo-tts-voiceclone-audio configuration option to supply a reference audio sample for MiMo voiceclone models.
  • Support passing a voice value (e.g., data URL from reference audio) into the TTS payload, overriding the standard voice setting when needed.

Enhancements:

  • Cache converted voiceclone reference audio with concurrency control to avoid redundant conversions and temporary file leaks.
  • Extend audio input preparation to accept optional target format and mp3-preservation flags for more flexible handling of input audio.
  • Ensure cleanup of temporary files generated during voiceclone audio preparation when the provider is terminated.
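
The caching behavior described above can be sketched as follows. This is a simplified illustration, not the PR's code: the class `ReferenceAudioCache`, the injected `convert` callable, and `demo()` are hypothetical names, and only the idea (one cached value per source, with an asyncio.Lock serializing conversion) comes from the PR description.

```python
import asyncio


class ReferenceAudioCache:
    """Caches one converted value per source; a lock serializes conversion."""

    def __init__(self, convert):
        self._convert = convert  # async callable: source -> data URL
        self._lock = asyncio.Lock()
        self._source = None
        self._data_url = None

    async def get(self, source: str) -> str:
        async with self._lock:
            # Cache hit: same source as the last successful conversion.
            if self._data_url is not None and self._source == source:
                return self._data_url
            # Cache miss or changed source: convert once while holding the lock.
            data_url = await self._convert(source)
            self._source, self._data_url = source, data_url
            return data_url


async def demo() -> tuple[int, list[str]]:
    calls = 0

    async def fake_convert(source: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulate slow transcoding
        return f"data:audio/mpeg;base64,{source}"

    cache = ReferenceAudioCache(fake_convert)
    # Five concurrent callers ask for the same source.
    results = await asyncio.gather(*(cache.get("a.mp3") for _ in range(5)))
    return calls, list(results)
```

Running `asyncio.run(demo())` exercises five concurrent callers; the first performs the conversion while the others wait on the lock and then read the cached value.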

Documentation:

  • Update configuration metadata and dashboard i18n strings to document the new voiceclone reference audio option and clarify when mimo-tts-voice is ignored.

Tests:

  • Add tests covering voiceclone model detection, error handling when reference audio is missing, payload voice field behavior, caching semantics, mp3 preservation, and concurrent conversion behavior.

@dosubot (bot) added the size:M and area:provider labels Jul 1, 2026

@sourcery-ai (bot, Contributor) left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The _is_voiceclone_model check relies on a substring match in model_name; consider tightening this logic (e.g., explicit allowed model list or regex) to avoid accidentally treating future models with similar names as voiceclone-capable.
  • The error message in _resolve_voiceclone_voice hardcodes mimo-v2.5-tts-voiceclone even though the check is generic; consider interpolating self.model_name instead so the message stays accurate if other voiceclone models are used.
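
One possible shape for both suggestions is sketched below. It is an assumption-laden illustration: the allowlist contents, the stand-in `MiMoAPIError` class, and the function names are hypothetical, and only the model name mimo-v2.5-tts-voiceclone and the config key mimo-tts-voiceclone-audio come from this PR.

```python
# Hypothetical allowlist; only this one model name appears in the PR.
VOICECLONE_MODELS = frozenset({"mimo-v2.5-tts-voiceclone"})


class MiMoAPIError(Exception):
    """Stand-in for the project's error type so the sketch runs on its own."""


def is_voiceclone_model(model_name: str) -> bool:
    """Exact match against the allowlist instead of a substring test."""
    return model_name.strip().lower() in VOICECLONE_MODELS


def require_reference_audio(model_name: str, audio_source: str) -> str:
    """Return the configured source, or raise an error naming the actual model."""
    if not audio_source.strip():
        raise MiMoAPIError(
            f"Model '{model_name}' requires 'mimo-tts-voiceclone-audio' to be set."
        )
    return audio_source
```

With an exact match, a future model whose name merely contains "voiceclone" is not treated as voiceclone-capable until it is added to the allowlist, and the error message always reports the model that is actually configured.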


Comment on lines +102 to +111

        async with self._voiceclone_lock:
            if (
                self._voiceclone_cache_data_url is not None
                and self._voiceclone_cache_source == self.voiceclone_audio_source
            ):
                return self._voiceclone_cache_data_url

            try:
                data_url, cleanup_paths = await prepare_audio_input(
                    self.voiceclone_audio_source,

suggestion (bug_risk): Consider reusing the lock in terminate() to avoid potential races with voiceclone cleanup.

terminate() calls cleanup_files(self._voiceclone_cleanup_paths) without holding _voiceclone_lock, while _resolve_voiceclone_voice() accesses and mutates _voiceclone_cleanup_paths under that lock. If terminate() runs while voiceclone resolution is in progress, this can cause races or inconsistent cleanup. Please guard terminate()’s access to _voiceclone_cleanup_paths with the same lock (or otherwise prevent concurrent access).

Suggested implementation:

        async with self._voiceclone_lock:
            cleanup_files(self._voiceclone_cleanup_paths)

To safely reuse _voiceclone_lock in terminate():

  1. Ensure terminate() is an async def so that async with self._voiceclone_lock: is valid. If terminate() must remain synchronous, instead move the cleanup into an async helper (e.g. _async_terminate_cleanup) that uses the lock, and have terminate() schedule/await that helper where appropriate.
  2. Verify that all other accesses and mutations of self._voiceclone_cleanup_paths (if any) are also guarded by _voiceclone_lock to fully prevent races.
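
A fuller sketch of the pattern suggested here, assuming terminate() can be async. The class `VoicecloneTempFiles` and `demo()` are hypothetical; the attribute names follow the review comment. The list is swapped out while holding the lock and the files are deleted afterwards, so a conversion that is still in progress cannot add a path that terminate() then misses.

```python
import asyncio
import os
import tempfile


class VoicecloneTempFiles:
    """Tracks temp files under a lock shared by conversion and shutdown."""

    def __init__(self):
        self._voiceclone_lock = asyncio.Lock()
        self._voiceclone_cleanup_paths: list[str] = []

    async def resolve(self) -> None:
        async with self._voiceclone_lock:
            fd, path = tempfile.mkstemp(suffix=".mp3")
            os.close(fd)
            await asyncio.sleep(0.01)  # simulate transcoding
            self._voiceclone_cleanup_paths.append(path)

    async def terminate(self) -> list[str]:
        # Take ownership of the list under the lock, then delete outside it.
        async with self._voiceclone_lock:
            paths = self._voiceclone_cleanup_paths
            self._voiceclone_cleanup_paths = []
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
        return paths


async def demo() -> tuple[list[str], list[str]]:
    files = VoicecloneTempFiles()
    task = asyncio.create_task(files.resolve())
    await asyncio.sleep(0)  # let resolve() acquire the lock first
    removed = await files.terminate()  # waits until resolve() releases the lock
    await task
    return removed, files._voiceclone_cleanup_paths
```

In `demo()`, terminate() is called while a conversion is still running; because it waits for the lock, it sees the path the conversion registered and removes that file.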

        with pytest.raises(MiMoAPIError, match="mimo-tts-voiceclone-audio"):
            await provider.get_audio("hello")
    finally:
        await provider.terminate()


suggestion (testing): Consider adding a test that verifies temporary files for voiceclone are cleaned up on terminate

The current tests for _resolve_voiceclone_voice and preserve_mp3 cover caching/conversion, but don’t exercise the new self._voiceclone_cleanup_paths tracking or the cleanup_files calls on refresh/terminate(). Please add a test that monkeypatches cleanup_files in mimo_tts_api_source, performs a voiceclone conversion to populate _voiceclone_cleanup_paths, calls terminate(), and asserts that cleanup_files is called with the expected paths and that the cleanup list is cleared. This will directly validate the new temp-file cleanup behavior for voiceclone.

Suggested implementation:

# Insert after the preceding test, which ends with `captured_kwargs: dict = {}`.


@pytest.mark.asyncio
async def test_mimo_tts_voiceclone_temp_files_cleaned_on_terminate(monkeypatch):
    """Temporary voiceclone files should be cleaned up on terminate()."""
    provider = _make_tts_provider(
        {
            "model": "mimo-v2.5-tts-voiceclone",
            "mimo-tts-voiceclone-audio": "/tmp/reference_voice.mp3",
            "mimo-tts-seed-text": "",
        }
    )

    # Monkeypatch cleanup_files so its calls can be observed.
    cleanup_calls = []

    # Module path taken from the file location in this PR; adjust if the layout differs.
    from astrbot.core.provider.sources import mimo_tts_api_source

    # Assumes the provider awaits cleanup_files; use a plain def if it is synchronous.
    async def fake_cleanup_files(paths):
        # Record the paths requested for cleanup.
        cleanup_calls.append(list(paths))

    monkeypatch.setattr(mimo_tts_api_source, "cleanup_files", fake_cleanup_files)

    # Trigger one voiceclone conversion to populate _voiceclone_cleanup_paths.
    provider.voiceclone_audio_source = "/tmp/voice_a.mp3"
    await provider._resolve_voiceclone_voice()

    # Confirm that temporary files pending cleanup were recorded.
    assert getattr(provider, "_voiceclone_cleanup_paths", []), "_voiceclone_cleanup_paths should contain temp file paths after conversion"

    # Snapshot the pending paths for the assertions below.
    paths_to_cleanup = list(provider._voiceclone_cleanup_paths)

    # terminate() should call cleanup_files and clear _voiceclone_cleanup_paths.
    await provider.terminate()

    # cleanup_files was called once with the recorded paths.
    assert cleanup_calls == [paths_to_cleanup]

    # The cleanup list has been cleared.
    assert provider._voiceclone_cleanup_paths == []

  1. If the test file already imports mimo_tts_api_source at module level, drop the local import inside this test and reuse the existing module reference.
  2. Make sure the provider instance really has the private attribute self._voiceclone_cleanup_paths, and that terminate() calls cleanup_files(self._voiceclone_cleanup_paths) and clears the list afterwards. If the implementation differs (for example in the attribute name or the clearing logic), adjust the attribute access and assertions in the test accordingly.

@gemini-code-assist (Contributor) commented

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@lingyun14beta (Contributor, Author) commented

/gemini review

@gemini-code-assist (Contributor) commented

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.


Labels

  • area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.)
  • size:M (This PR changes 30-99 lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Support the MiMo voice cloning model

1 participant