Commit af3ec1f

Fix Hebrew Dicta loading for MLX Chatterbox
1 parent 0acacbf commit af3ec1f

5 files changed: +90 -56 lines changed

RELEASE_NOTES.md

Lines changed: 23 additions & 49 deletions
@@ -1,69 +1,51 @@
-# MimikaStudio v2026.03.5 Release Notes
+# MimikaStudio v2026.03.8 Release Notes
 
-**Release Date:** March 9, 2026
+**Release Date:** March 18, 2026
 **Platform:** macOS (Apple Silicon)
 
 ---
 
-## What's New In v2026.03.5
+## What's Fixed In v2026.03.8
 
-- Added unified **Voice Prompt Management** workflows to create user voices by:
-  - Uploading local WAV voice files, and
-  - Importing from **YouTube URL** via `yt-dlp`, then extracting a 20-second speech segment for cloning prompts.
-- Added explicit voice-name uniqueness validation in backend and UI to prevent collisions with existing voice prompts and default voices.
-- Voice prompt lists now refresh immediately after add/edit/delete/import so newly introduced voices are instantly available in clone workflows.
+- Restored **Hebrew Dicta diacritization** for **Chatterbox Multilingual** after the MLX backend migration.
+- Fixed the regression where Hebrew Chatterbox generation could silently fall back to a broken tokenizer path and log:
+  `Dicta.__init__() missing 1 required positional argument: 'model_path'`
+- Rewired MimikaStudio's Chatterbox engine to preload the Dicta ONNX model into the tokenizer module actually used by the MLX runtime before model load.
 
 ---
 
-## Voice Clone Workflow Changes
+## User Impact
 
-- Moved user-voice creation surfaces out of clone screens and centralized them in **Voice Prompts**.
-- Updated clone screens to point users to Voice Prompt management and refresh voice choices when returning to clone tabs.
-- Verified compatibility of custom voice prompts for both **Qwen3 Clone** and **Chatterbox** workflows.
-- Moved **MCP**, **Pro**, and **About** into **Settings** as sub-tabs to reduce top-level navigation clutter.
+- **Hebrew Chatterbox smoke tests** now run without the Dicta initialization warning.
+- **Full Hebrew voice-clone renders** complete successfully again on the patched MLX path.
+- Existing Hebrew Dicta installs are now picked up from:
+  - `DICTA_ONNX_MODEL_PATH`
+  - bundled app model path
+  - Mimika runtime data model path
 
 ---
 
-## Reliability and Pre-Production Hardening
+## Technical Notes
 
-- Added YouTube source URL allowlist checks (`youtube.com` / `youtu.be`) for safer import handling.
-- Added serialized voice mutation guards to reduce race-condition collisions on upload/update/delete/import endpoints.
-- Added failure cleanup paths to avoid orphaned audio/transcript/meta files when import/upload writes fail.
-- Added pre-production regression tests for voice prompt import validation and file-cleanup failure paths.
-- Added additional `OsxSkills` pre-production guardrail tests for:
-  - skill metadata/front matter integrity,
-  - required release script presence,
-  - shell script syntax checks.
+- Updated the backend Chatterbox engine to resolve an explicit Dicta ONNX path instead of relying on a zero-argument `Dicta()` constructor.
+- Patched both supported tokenizer import paths so Hebrew preprocessing is applied consistently across the MLX-backed Chatterbox runtime.
+- Verified generation with:
+  - a short Hebrew smoke test
+  - a full Hebrew `parashat_hashavua.txt` render using a custom uploaded reference voice
 
 ---
 
-## Previous Release: v2026.03.4
+## Previous Release: v2026.03.7
 
-- Added a new copy-text action in **Qwen3 Clone -> Audio Library** so users can copy the source text used for a generated voice clone directly from each item.
-- Added a new copy-text action in the **Jobs** tab so users can copy the input text for completed/active generation jobs.
-- Extended job payload handling to retain source text for newly created jobs (with truncation safeguards), enabling copy behavior across Jobs and voice-clone audio list responses.
-
----
-
-## Supertonic UI Update
-
-- Added automatic **British** badge support in Supertonic voice cards when backend voice metadata indicates UK/British variants.
-- Current Supertonic runtime still exposes generic style IDs (`F1..F5`, `M1..M5`), so badge display activates when metadata becomes available.
-
----
-
-## Reliability Improvements
-
-- Added bounded text retention controls for job records via `MIMIKA_MAX_JOB_TEXT_CHARS` (default `20000`) to avoid unbounded in-memory text growth.
-- Avoided storing full audiobook source text in job history to keep long-form flows lightweight.
+- Sandboxed release entitlement update and release-build hardening.
 
 ---
 
 ## Distribution Notes
 
 ### Unsigned DMG (Apple Gatekeeper)
 
-As of March 2, 2026, the MimikaStudio DMG is not yet signed/notarized by Apple.
+As of March 18, 2026, the MimikaStudio DMG is not yet signed/notarized by Apple.
 macOS may block first launch until you explicitly allow it in security settings.
 
 1. Open the DMG and drag MimikaStudio.app to Applications.
@@ -72,11 +54,3 @@ macOS may block first launch until you explicitly allow it in security settings.
 4. If macOS still blocks launch, go to: System Settings -> Privacy & Security -> Open Anyway (for MimikaStudio), then confirm with password/Touch ID.
 5. On first launch, wait for the bundled backend to start.
 6. On first use, click Download for required models in-app.
-
----
-
-## System Requirements
-
-- macOS 13.0 or later
-- Apple Silicon (M1/M2/M3/M4)
-- 8 GB RAM minimum (16 GB recommended)
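
The lookup order listed under "User Impact" (environment override, then the bundled model, then the runtime data directory) can be sketched in isolation. This is a minimal sketch and not the project's code: `resolve_model_path`, `MODEL_RELATIVE`, and the temporary directories in `demo` are hypothetical names chosen for illustration. Only the `models/dicta-onnx/dicta-1.0.onnx` layout and the first-existing-candidate rule are taken from the `chatterbox_engine.py` change in this commit.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional

# Relative location of the ONNX file under each root (layout taken from the diff).
MODEL_RELATIVE = Path("models") / "dicta-onnx" / "dicta-1.0.onnx"


def resolve_model_path(env_value: Optional[str], roots: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None.

    The environment override is tried first, then each root in order.
    """
    candidates = []
    cleaned = (env_value or "").strip()
    if cleaned:
        candidates.append(Path(cleaned).expanduser())
    candidates.extend(root / MODEL_RELATIVE for root in roots)

    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


def demo() -> tuple:
    """Exercise the lookup against throwaway directories (all hypothetical)."""
    with tempfile.TemporaryDirectory() as tmp:
        bundled_root = Path(tmp) / "bundled"   # stands in for the app bundle
        runtime_root = Path(tmp) / "runtime"   # stands in for the runtime data dir
        runtime_model = runtime_root / MODEL_RELATIVE
        runtime_model.parent.mkdir(parents=True)
        runtime_model.touch()
        override = Path(tmp) / "override.onnx"
        override.touch()
        roots = [bundled_root, runtime_root]

        return (
            # No override, bundled copy missing: the runtime copy is used.
            resolve_model_path(None, roots) == runtime_model,
            # A valid override wins over every other location.
            resolve_model_path(str(override), roots) == override,
            # A stale override is skipped rather than treated as fatal.
            resolve_model_path(str(Path(tmp) / "missing.onnx"), roots) == runtime_model,
            # Nothing installed anywhere: the caller gets None.
            resolve_model_path("  ", [bundled_root]) is None,
        )
```

Returning `None` instead of raising lets the caller skip Hebrew diacritization when no model is installed, which matches the early `return` in `_configure_hebrew_diacritizer`.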

backend/tts/chatterbox_engine.py

Lines changed: 60 additions & 0 deletions
@@ -1,6 +1,8 @@
 """Chatterbox Multilingual TTS engine wrapper for voice cloning."""
 from __future__ import annotations
 
+import os
+import importlib
 import re
 import uuid
 from dataclasses import dataclass
@@ -15,6 +17,7 @@
 from .runtime_paths import (
     ensure_valid_cwd,
     get_cloner_user_voices_dir,
+    get_runtime_data_dir,
     get_runtime_output_dir,
 )
 from .text_chunking import smart_chunk_text
@@ -74,6 +77,62 @@ def _get_device(self) -> str:
             return "mlx"
         return "cpu"
 
+    def _resolve_dicta_model_path(self) -> Optional[Path]:
+        env_path = (os.getenv("DICTA_ONNX_MODEL_PATH") or "").strip()
+        candidates = []
+        if env_path:
+            candidates.append(Path(env_path).expanduser())
+        candidates.extend(
+            [
+                Path(__file__).parent.parent / "models" / "dicta-onnx" / "dicta-1.0.onnx",
+                get_runtime_data_dir() / "models" / "dicta-onnx" / "dicta-1.0.onnx",
+            ]
+        )
+
+        for candidate in candidates:
+            if candidate.exists():
+                return candidate
+        return None
+
+    def _configure_hebrew_diacritizer(self) -> None:
+        """Preload Dicta into the upstream Chatterbox tokenizer for Hebrew."""
+        try:
+            from dicta_onnx import Dicta
+        except Exception:
+            return
+
+        model_path = self._resolve_dicta_model_path()
+        if model_path is None:
+            return
+
+        module_names = [
+            "mlx_audio.tts.models.chatterbox.tokenizer",
+            "chatterbox.models.tokenizers.tokenizer",
+        ]
+
+        def _add_hebrew_diacritics(text: str, *, _tok) -> str:
+            try:
+                if getattr(_tok, "_dicta", None) is None:
+                    _tok._dicta = Dicta(str(model_path))
+                return _tok._dicta.add_diacritics(text)
+            except Exception:
+                return text
+
+        for module_name in module_names:
+            try:
+                tok = importlib.import_module(module_name)
+            except Exception:
+                continue
+
+            try:
+                if getattr(tok, "_dicta", None) is None:
+                    tok._dicta = Dicta(str(model_path))
+                tok.add_hebrew_diacritics = (
+                    lambda text, _tok=tok: _add_hebrew_diacritics(text, _tok=_tok)
+                )
+            except Exception:
+                continue
+
     def load_model(self):
         """Load the Chatterbox model."""
         if self.model is not None:
@@ -89,6 +148,7 @@ def load_model(self):
                 "mlx-audio not installed. Install with: pip install -U mlx-audio"
             ) from exc
 
+        self._configure_hebrew_diacritizer()
         self.device = self._get_device()
         self.model = load_tts_model("mlx-community/chatterbox-fp16")
         return self.model
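
The preload-and-patch approach in `_configure_hebrew_diacritizer` can be illustrated without `mlx-audio` or `dicta_onnx` installed. The sketch below is a stand-in under stated assumptions, not the upstream code: `FakeDicta`, `make_tokenizer_module`, and `preload_diacritizer` are hypothetical names, and the fake module's lazy zero-argument constructor call is an assumption inferred from the error message quoted in the release notes.

```python
import types


class FakeDicta:
    """Hypothetical stand-in for a diacritizer that requires a model path."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path

    def add_diacritics(self, text: str) -> str:
        # Wrap the text so the effect is visible; a real model would add niqqud.
        return f"<{text}>"


def make_tokenizer_module() -> types.ModuleType:
    """Build a fake tokenizer module that lazily calls FakeDicta() with no args.

    Assumption: this mirrors the failure mode described in the release notes.
    """
    tok = types.ModuleType("fake_tokenizer")
    tok._dicta = None

    def add_hebrew_diacritics(text: str) -> str:
        try:
            if tok._dicta is None:
                tok._dicta = FakeDicta()  # raises TypeError: missing 'model_path'
            return tok._dicta.add_diacritics(text)
        except Exception:
            return text  # silent fallback: the text passes through unchanged

    def prepare(text: str) -> str:
        # Attribute lookup happens at call time, so a later patch takes effect.
        return tok.add_hebrew_diacritics(text)

    tok.add_hebrew_diacritics = add_hebrew_diacritics
    tok.prepare = prepare
    return tok


def preload_diacritizer(tok: types.ModuleType, model_path: str) -> None:
    """Install a ready instance and replace the module-level hook."""
    if getattr(tok, "_dicta", None) is None:
        tok._dicta = FakeDicta(model_path)

    def patched(text: str, _tok: types.ModuleType = tok) -> str:
        try:
            return _tok._dicta.add_diacritics(text)
        except Exception:
            return text

    tok.add_hebrew_diacritics = patched
```

Binding the module as a default argument (`_tok=tok`) pins each patched hook to its own module, which is the same reason the commit's lambda uses `_tok=tok` inside the loop over `module_names`.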

backend/version.py

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
 """MimikaStudio version information."""
 
-VERSION = "2026.03.7"
-BUILD_NUMBER = 13
-VERSION_NAME = "Fix Library Validation for Sandboxed Release Builds"
+VERSION = "2026.03.8"
+BUILD_NUMBER = 14
+VERSION_NAME = "Restore Hebrew Dicta Support for MLX Chatterbox"
 
 def get_version_string() -> str:
     """Return formatted version string."""

flutter_app/lib/version.dart

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 /// MimikaStudio version information.
-const String appVersion = "2026.03.5";
-const int buildNumber = 11;
-const String versionName = "Voice Prompt Import Workflow and Pre-Production Hardening";
+const String appVersion = "2026.03.8";
+const int buildNumber = 14;
+const String versionName = "Restore Hebrew Dicta Support for MLX Chatterbox";
 
 String get versionString => "$appVersion (build $buildNumber)";

flutter_app/pubspec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 name: mimika_studio
 description: "MimikaStudio - Local-first Voice Cloning with Qwen3-TTS"
 publish_to: 'none'
-version: 2026.03.7+13
+version: 2026.03.8+14
 
 environment:
   sdk: ^3.10.7
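
Before this commit, `version.dart` was at `2026.03.5` (build 11) while `version.py` and `pubspec.yaml` were at `2026.03.7` (build 13); the commit brings all three to `2026.03.8` (build 14). A small check along the following lines could flag that kind of drift. It is a hypothetical sketch, not part of the repository: the function names and regular expressions are assumptions, and the sample strings are copied from the hunks above.

```python
import re


def _search(pattern: str, text: str) -> "re.Match[str]":
    """Find a pattern at any line start, raising if the field is missing."""
    match = re.search(pattern, text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return match


def parse_version_py(text: str) -> tuple:
    version = _search(r'^VERSION\s*=\s*"([^"]+)"', text).group(1)
    build = int(_search(r"^BUILD_NUMBER\s*=\s*(\d+)", text).group(1))
    return version, build


def parse_version_dart(text: str) -> tuple:
    version = _search(r'appVersion\s*=\s*"([^"]+)"', text).group(1)
    build = int(_search(r"buildNumber\s*=\s*(\d+)", text).group(1))
    return version, build


def parse_pubspec(text: str) -> tuple:
    # pubspec uses "<version>+<build>" on a single line.
    match = _search(r"^version:\s*([^+\s]+)\+(\d+)", text)
    return match.group(1), int(match.group(2))


def versions_agree(version_py: str, version_dart: str, pubspec: str) -> bool:
    """True when all three files carry the same (version, build) pair."""
    found = {
        parse_version_py(version_py),
        parse_version_dart(version_dart),
        parse_pubspec(pubspec),
    }
    return len(found) == 1


# Sample contents copied from the removed (-) and added (+) lines in this commit.
OLD_PY = 'VERSION = "2026.03.7"\nBUILD_NUMBER = 13\n'
OLD_DART = 'const String appVersion = "2026.03.5";\nconst int buildNumber = 11;\n'
OLD_PUBSPEC = "name: mimika_studio\nversion: 2026.03.7+13\n"

NEW_PY = 'VERSION = "2026.03.8"\nBUILD_NUMBER = 14\n'
NEW_DART = 'const String appVersion = "2026.03.8";\nconst int buildNumber = 14;\n'
NEW_PUBSPEC = "name: mimika_studio\nversion: 2026.03.8+14\n"
```

With these samples, `versions_agree(OLD_PY, OLD_DART, OLD_PUBSPEC)` is false because of the stale Dart constants, and the post-commit strings agree.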
