Reduce TTS model boilerplate: ~1,000 lines of duplicated loading infrastructure (#120)

## Code Duplication Report: TTS Models

### Overview

Analysis of all 9 TTS model implementations on `main` (22,478 lines across 50 files) reveals significant infrastructure boilerplate repeated across every model.

### Codebase Inventory

| Model | Files | Lines | Largest File |
|-------|-------|-------|-------------|
| Chatterbox | 15 | 7,512 | ChatterboxModel (1,374) |
| Qwen3TTS | 6 | 3,987 | SpeechTokenizer (1,455) |
| Soprano | 4 | 1,994 | Soprano (1,045) |
| EchoTTS | 6 | 1,841 | EchoDiT (710) |
| FishSpeech | 4 | 1,721 | FishSpeechModel (1,033) |
| Marvis | 3 | 1,684 | MarvisTTSModel (672) |
| PocketTTS | 8 | 1,534 | PocketTTSModel (375) |
| Llama | 2 | 1,148 | LlamaTTS (982) |
| Qwen3 | 2 | 1,057 | Qwen3 (943) |
| **Total** | **50** | **22,478** | |

### Infrastructure Boilerplate

Every model reimplements the same loading pipeline with minor variations. The Models column shows how many of the 9 models contain each pattern:

| Pattern | Models | Avg Lines | Similarity |
|---------|--------|-----------|-----------|
| `fromPretrained()` | 9/9 | ~63 | 95% identical |
| Quantization setup | 7/9 | ~15 | 90% identical |
| `sanitize()` weights | 8/9 | ~50 | 70% similar |
| `generate()`/`generateStream()` | 9/9 | ~35 | 85% identical |
| Config structs (Decodable) | 9/9 | ~235 | 60% similar |

The `fromPretrained` flow is nearly copy-pasted across all models:

1. Resolve/download the HF repo
2. Decode `config.json`
3. Load `.safetensors`
4. Sanitize keys
5. Apply quantization
6. Update parameters
7. Run post-load hooks

**Estimated duplicated lines**: roughly 1,000 or more across all models combined.

### Examples

**`fromPretrained`**: compare any of the models (e.g. `Soprano.swift:894-961` vs `FishSpeechModel.swift:994-1032` vs `LlamaTTS.swift:917-966`). The structure is identical; only the config type and post-load hooks differ.

**Quantization**: 7 of 9 models check `config.quantization`, extract `(groupSize, bits)`, and call `quantize()` with the same 3-line pattern.
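As a rough illustration of how that 3-line pattern could be shared, here is a minimal sketch. `QuantizationConfig`, `QuantizableConfig`, and `applyQuantizationIfNeeded` are hypothetical names, the `group_size` JSON key is an assumption about the config layout, and the `apply` closure stands in for whatever `quantize()` call each model currently makes.

```swift
import Foundation

// Hypothetical stand-in for the `quantization` section of a model config.
// The "group_size" key name is an assumption, not taken from the codebase.
struct QuantizationConfig: Decodable, Equatable {
    let groupSize: Int
    let bits: Int

    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

// Hypothetical protocol: any model config that may carry quantization settings.
protocol QuantizableConfig {
    var quantization: QuantizationConfig? { get }
}

/// Shared replacement for the repeated check / extract / call pattern.
/// `apply` stands in for the model's real quantize call and runs only
/// when the config requests quantization. Returns whether it ran.
@discardableResult
func applyQuantizationIfNeeded<C: QuantizableConfig>(
    _ config: C,
    apply: (_ groupSize: Int, _ bits: Int) throws -> Void
) rethrows -> Bool {
    guard let q = config.quantization else { return false }
    try apply(q.groupSize, q.bits)
    return true
}
```

Each model's loader would then make a single call and pass its own quantize step as the closure, instead of repeating the optional check and tuple extraction.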

**`sanitize()`**: 8 of 9 models filter or rename weight keys. The common operations are stripping prefixes, renaming `gamma`→`weight` and `beta`→`bias`, and skipping `position_ids`. Model-specific rules are added on top.
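The common operations could be captured in one shared helper, with each model supplying only its own rules. The sketch below is an assumption about how that might look: `WeightKeyRules` and `sanitizeKeys` are hypothetical names, and the function is generic over the value type so it does not depend on any tensor library.

```swift
import Foundation

/// Hypothetical shared rules; each model would supply its own prefixes and extras.
struct WeightKeyRules {
    var stripPrefixes: [String] = []
    var skipSubstrings: [String] = ["position_ids"]
    var suffixRenames: [(from: String, to: String)] = [
        (from: ".gamma", to: ".weight"),
        (from: ".beta", to: ".bias"),
    ]
}

/// Returns a copy of `weights` with the common key rewrites applied:
/// skip unwanted keys, strip the first matching prefix, rename the first matching suffix.
func sanitizeKeys<Value>(
    _ weights: [String: Value],
    rules: WeightKeyRules = WeightKeyRules()
) -> [String: Value] {
    var result: [String: Value] = [:]
    for (key, value) in weights {
        // Drop keys such as "...position_ids" entirely.
        if rules.skipSubstrings.contains(where: { key.contains($0) }) { continue }

        var newKey = key
        if let prefix = rules.stripPrefixes.first(where: { newKey.hasPrefix($0) }) {
            newKey.removeFirst(prefix.count)
        }
        if let rename = rules.suffixRenames.first(where: { newKey.hasSuffix($0.from) }) {
            newKey.removeLast(rename.from.count)
            newKey += rename.to
        }
        result[newKey] = value
    }
    return result
}
```

A model with extra requirements could run this helper first and then apply its own rules to the result, which keeps the model-specific part small.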

### Possible Approach

Extract common `fromPretrained` / quantization / sanitization patterns into a shared protocol or base implementation. This would eliminate ~1,000 lines of copy-paste boilerplate while preserving model-specific customization via hooks.

Open to discussion on the best approach — protocol with default extension, base class, or something else entirely.
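To make the protocol-with-default-extension option concrete, here is a minimal sketch under stated assumptions. All names (`PretrainedLoadable`, `Weights`, `ToyModel`) are hypothetical, `Weights` is a plain dictionary standing in for the real tensor type, and weight loading is injected as a closure so the sketch needs no download or file-format code. A real version would likely also be `async` and handle quantization.

```swift
import Foundation

// Hypothetical stand-in for a dictionary of named tensors.
typealias Weights = [String: [Float]]

protocol PretrainedLoadable {
    associatedtype Config: Decodable
    init(config: Config)
    /// Model-specific key rewrites; the default leaves weights untouched.
    static func sanitize(_ weights: Weights) -> Weights
    /// Receives the final weights (quantization and parameter update in a real model).
    mutating func update(with weights: Weights)
    /// Optional post-load hook.
    mutating func didLoad()
}

extension PretrainedLoadable {
    static func sanitize(_ weights: Weights) -> Weights { weights }
    mutating func didLoad() {}

    /// Shared pipeline: decode config, load, sanitize, update, post-load hook.
    /// Repo resolution and .safetensors loading would live behind `loadWeights`.
    static func fromPretrained(
        configData: Data,
        loadWeights: () throws -> Weights
    ) throws -> Self {
        let config = try JSONDecoder().decode(Config.self, from: configData)
        var model = Self(config: config)
        let raw = try loadWeights()
        model.update(with: sanitize(raw))
        model.didLoad()
        return model
    }
}

// Example conformance: only the model-specific parts remain.
struct ToyModel: PretrainedLoadable {
    struct Config: Decodable { let sampleRate: Int }
    let config: Config
    var weights: Weights = [:]
    var isReady = false

    init(config: Config) { self.config = config }

    static func sanitize(_ weights: Weights) -> Weights {
        weights.filter { !$0.key.contains("position_ids") }
    }
    mutating func update(with weights: Weights) { self.weights = weights }
    mutating func didLoad() { isReady = true }
}
```

A base class would give similar reuse, but the protocol form also works for value types and lets each model override only the hooks it needs.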
