Fix Hebrew Dicta loading for MLX Chatterbox
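
The model lookup order this commit adds in `_resolve_dicta_model_path` can be sketched in isolation as below. This is a minimal sketch, not the engine code: `first_existing` and `dicta_candidates` are hypothetical helper names, and the two root directories are passed in as parameters instead of being derived from `__file__` and `get_runtime_data_dir()` as the diff does.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def first_existing(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists on disk, or None."""
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


def dicta_candidates(bundled_root: Path, runtime_root: Path) -> List[Path]:
    """Build the lookup order: env override, bundled model, runtime data.

    bundled_root and runtime_root are illustrative parameters (assumptions).
    """
    relative = Path("models") / "dicta-onnx" / "dicta-1.0.onnx"
    env_path = (os.getenv("DICTA_ONNX_MODEL_PATH") or "").strip()
    # An explicit override, when set, is checked before the default locations.
    candidates = [Path(env_path).expanduser()] if env_path else []
    candidates += [bundled_root / relative, runtime_root / relative]
    return candidates
```

Returning `None` when nothing is found lets the caller skip the tokenizer patch entirely, which matches how `_configure_hebrew_diacritizer` returns early in the diff.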

BoltzmannEntropy · BoltzmannEntropy · commit af3ec1f4f7fa · 2026-03-18T16:53:46.000+02:00
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
@@ -1,69 +1,51 @@
-# MimikaStudio v2026.03.5 Release Notes
+# MimikaStudio v2026.03.8 Release Notes
 
-**Release Date:** March 9, 2026  
+**Release Date:** March 18, 2026  
 **Platform:** macOS (Apple Silicon)
 
 ---
 
-## What's New In v2026.03.5
+## What's Fixed In v2026.03.8
 
-- Added unified **Voice Prompt Management** workflows to create user voices by:
-  - Uploading local WAV voice files, and
-  - Importing from **YouTube URL** via `yt-dlp`, then extracting a 20-second speech segment for cloning prompts.
-- Added explicit voice-name uniqueness validation in backend and UI to prevent collisions with existing voice prompts and default voices.
-- Voice prompt lists now refresh immediately after add/edit/delete/import so newly introduced voices are instantly available in clone workflows.
+- Restored **Hebrew Dicta diacritization** for **Chatterbox Multilingual** after the MLX backend migration.
+- Fixed the regression where Hebrew Chatterbox generation could silently fall back to a broken tokenizer path and log:
+  `Dicta.__init__() missing 1 required positional argument: 'model_path'`
+- Rewired MimikaStudio's Chatterbox engine to preload the Dicta ONNX model into the tokenizer module actually used by the MLX runtime before model load.
 
 ---
 
-## Voice Clone Workflow Changes
+## User Impact
 
-- Moved user-voice creation surfaces out of clone screens and centralized them in **Voice Prompts**.
-- Updated clone screens to point users to Voice Prompt management and refresh voice choices when returning to clone tabs.
-- Verified compatibility of custom voice prompts for both **Qwen3 Clone** and **Chatterbox** workflows.
-- Moved **MCP**, **Pro**, and **About** into **Settings** as sub-tabs to reduce top-level navigation clutter.
+- **Hebrew Chatterbox smoke tests** now run without the Dicta initialization warning.
+- **Full Hebrew voice-clone renders** complete successfully again on the patched MLX path.
+- Existing Hebrew Dicta installs are now picked up from the first of these locations that exists:
+  - the `DICTA_ONNX_MODEL_PATH` environment variable
+  - the bundled app model path
+  - the Mimika runtime data model path
 
 ---
 
-## Reliability and Pre-Production Hardening
+## Technical Notes
 
-- Added YouTube source URL allowlist checks (`youtube.com` / `youtu.be`) for safer import handling.
-- Added serialized voice mutation guards to reduce race-condition collisions on upload/update/delete/import endpoints.
-- Added failure cleanup paths to avoid orphaned audio/transcript/meta files when import/upload writes fail.
-- Added pre-production regression tests for voice prompt import validation and file-cleanup failure paths.
-- Added additional `OsxSkills` pre-production guardrail tests for:
-  - skill metadata/front matter integrity,
-  - required release script presence,
-  - shell script syntax checks.
+- Updated the backend Chatterbox engine to resolve an explicit Dicta ONNX path instead of relying on a zero-argument `Dicta()` constructor.
+- Patched both supported tokenizer import paths so Hebrew preprocessing is applied consistently across the MLX-backed Chatterbox runtime.
+- Verified generation with:
+  - a short Hebrew smoke test
+  - a full Hebrew `parashat_hashavua.txt` render using a custom uploaded reference voice
 
 ---
 
-## Previous Release: v2026.03.4
+## Previous Release: v2026.03.7
 
-- Added a new copy-text action in **Qwen3 Clone -> Audio Library** so users can copy the source text used for a generated voice clone directly from each item.
-- Added a new copy-text action in the **Jobs** tab so users can copy the input text for completed/active generation jobs.
-- Extended job payload handling to retain source text for newly created jobs (with truncation safeguards), enabling copy behavior across Jobs and voice-clone audio list responses.
-
----
-
-## Supertonic UI Update
-
-- Added automatic **British** badge support in Supertonic voice cards when backend voice metadata indicates UK/British variants.
-- Current Supertonic runtime still exposes generic style IDs (`F1..F5`, `M1..M5`), so badge display activates when metadata becomes available.
-
----
-
-## Reliability Improvements
-
-- Added bounded text retention controls for job records via `MIMIKA_MAX_JOB_TEXT_CHARS` (default `20000`) to avoid unbounded in-memory text growth.
-- Avoided storing full audiobook source text in job history to keep long-form flows lightweight.
+- Sandboxed release entitlement update and release-build hardening.
 
 ---
 
 ## Distribution Notes
 
 ### Unsigned DMG (Apple Gatekeeper)
 
-As of March 2, 2026, the MimikaStudio DMG is not yet signed/notarized by Apple.  
+As of March 18, 2026, the MimikaStudio DMG is not yet signed/notarized by Apple.  
 macOS may block first launch until you explicitly allow it in security settings.
 
 1. Open the DMG and drag MimikaStudio.app to Applications.
@@ -72,11 +54,3 @@ macOS may block first launch until you explicitly allow it in security settings.
 4. If macOS still blocks launch, go to: System Settings -> Privacy & Security -> Open Anyway (for MimikaStudio), then confirm with password/Touch ID.
 5. On first launch, wait for the bundled backend to start.
 6. On first use, click Download for required models in-app.
-
----
-
-## System Requirements
-
-- macOS 13.0 or later
-- Apple Silicon (M1/M2/M3/M4)
-- 8 GB RAM minimum (16 GB recommended)
diff --git a/backend/tts/chatterbox_engine.py b/backend/tts/chatterbox_engine.py
@@ -1,6 +1,8 @@
 """Chatterbox Multilingual TTS engine wrapper for voice cloning."""
 from __future__ import annotations
 
+import importlib
+import os
 import re
 import uuid
 from dataclasses import dataclass
@@ -15,6 +17,7 @@
 from .runtime_paths import (
     ensure_valid_cwd,
     get_cloner_user_voices_dir,
+    get_runtime_data_dir,
     get_runtime_output_dir,
 )
 from .text_chunking import smart_chunk_text
@@ -74,6 +77,62 @@ def _get_device(self) -> str:
             return "mlx"
         return "cpu"
 
+    def _resolve_dicta_model_path(self) -> Optional[Path]:
+        env_path = (os.getenv("DICTA_ONNX_MODEL_PATH") or "").strip()
+        candidates = []  # checked in order; the first existing path wins
+        if env_path:
+            candidates.append(Path(env_path).expanduser())
+        candidates.extend(
+            [
+                Path(__file__).parent.parent / "models" / "dicta-onnx" / "dicta-1.0.onnx",
+                get_runtime_data_dir() / "models" / "dicta-onnx" / "dicta-1.0.onnx",
+            ]
+        )
+
+        for candidate in candidates:
+            if candidate.exists():
+                return candidate
+        return None
+
+    def _configure_hebrew_diacritizer(self) -> None:
+        """Preload Dicta into the upstream Chatterbox tokenizer for Hebrew."""
+        try:
+            from dicta_onnx import Dicta
+        except Exception:
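+            return  # dicta_onnx is optional; leave the tokenizer unpatched
+
+        model_path = self._resolve_dicta_model_path()
+        if model_path is None:
+            return  # no Dicta model on disk; leave the tokenizer unpatched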
+
+        module_names = [
+            "mlx_audio.tts.models.chatterbox.tokenizer",
+            "chatterbox.models.tokenizers.tokenizer",
+        ]
+
+        def _add_hebrew_diacritics(text: str, *, _tok) -> str:
+            try:
+                if getattr(_tok, "_dicta", None) is None:
+                    _tok._dicta = Dicta(str(model_path))
+                return _tok._dicta.add_diacritics(text)
+            except Exception:
+                return text  # fall back to undiacritized text instead of failing
+
+        for module_name in module_names:
+            try:
+                tok = importlib.import_module(module_name)
+            except Exception:
+                continue
+
+            try:
+                if getattr(tok, "_dicta", None) is None:
+                    tok._dicta = Dicta(str(model_path))
+                tok.add_hebrew_diacritics = (  # default arg binds this loop's tok
+                    lambda text, _tok=tok: _add_hebrew_diacritics(text, _tok=_tok)
+                )
+            except Exception:
+                continue
+
     def load_model(self):
         """Load the Chatterbox model."""
         if self.model is not None:
@@ -89,6 +148,7 @@ def load_model(self):
                 "mlx-audio not installed. Install with: pip install -U mlx-audio"
             ) from exc
 
+        self._configure_hebrew_diacritizer()
         self.device = self._get_device()
         self.model = load_tts_model("mlx-community/chatterbox-fp16")
         return self.model
diff --git a/backend/version.py b/backend/version.py
@@ -1,8 +1,8 @@
 """MimikaStudio version information."""
 
-VERSION = "2026.03.7"
-BUILD_NUMBER = 13
-VERSION_NAME = "Fix Library Validation for Sandboxed Release Builds"
+VERSION = "2026.03.8"
+BUILD_NUMBER = 14
+VERSION_NAME = "Restore Hebrew Dicta Support for MLX Chatterbox"
 
 def get_version_string() -> str:
     """Return formatted version string."""
diff --git a/flutter_app/lib/version.dart b/flutter_app/lib/version.dart
@@ -1,6 +1,6 @@
 /// MimikaStudio version information.
-const String appVersion = "2026.03.5";
-const int buildNumber = 11;
-const String versionName = "Voice Prompt Import Workflow and Pre-Production Hardening";
+const String appVersion = "2026.03.8";
+const int buildNumber = 14;
+const String versionName = "Restore Hebrew Dicta Support for MLX Chatterbox";
 
 String get versionString => "$appVersion (build $buildNumber)";
diff --git a/flutter_app/pubspec.yaml b/flutter_app/pubspec.yaml
@@ -1,7 +1,7 @@
 name: mimika_studio
 description: "MimikaStudio - Local-first Voice Cloning with Qwen3-TTS"
 publish_to: 'none'
-version: 2026.03.7+13
+version: 2026.03.8+14
 
 environment:
   sdk: ^3.10.7