Code Duplication Report: TTS Models
Overview
Analysis of all 9 TTS model implementations on the main branch (22,478 lines across 50 files) shows significant infrastructure boilerplate repeated across every model.
Codebase Inventory
| Model | Files | Lines | Largest File (lines) |
|---|---|---|---|
| Chatterbox | 15 | 7,512 | ChatterboxModel (1,374) |
| Qwen3TTS | 6 | 3,987 | SpeechTokenizer (1,455) |
| Soprano | 4 | 1,994 | Soprano (1,045) |
| EchoTTS | 6 | 1,841 | EchoDiT (710) |
| FishSpeech | 4 | 1,721 | FishSpeechModel (1,033) |
| Marvis | 3 | 1,684 | MarvisTTSModel (672) |
| PocketTTS | 8 | 1,534 | PocketTTSModel (375) |
| Llama | 2 | 1,148 | LlamaTTS (982) |
| Qwen3 | 2 | 1,057 | Qwen3 (943) |
| Total | 50 | 22,478 | |
Infrastructure Boilerplate
Every model reimplements the same loading pipeline with minor variations:
| Pattern | Models | Avg Lines per Model | Similarity |
|---|---|---|---|
| fromPretrained() | 9/9 | ~63 | 95% identical |
| Quantization setup | 7/9 | ~15 | 90% identical |
| sanitize() weights | 8/9 | ~50 | 70% similar |
| generate()/generateStream() | 9/9 | ~35 | 85% identical |
| Config structs (Decodable) | 9/9 | ~235 | 60% similar |
The fromPretrained flow is nearly copy-paste across all models:
1. Resolve/download the HF repo
2. Decode config.json
3. Load .safetensors
4. Sanitize keys
5. Apply quantization
6. Update parameters
7. Run post-load hooks
Estimated duplicated lines: ~1,000+ across all models combined.
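The report does not state how the ~1,000+ figure was derived. As a rough cross-check, the sketch below assumes that one copy of each pattern would survive in a shared implementation and weights the remaining copies by the Similarity column. The config-struct row is left out, on the assumption that its 60% similarity means it would stay largely model-specific.

```swift
// Rows copied from the Infrastructure Boilerplate table.
struct Pattern {
    let name: String
    let models: Int      // how many of the 9 models implement the pattern
    let avgLines: Int    // average lines per implementation
    let similarity: Double
}

let patterns = [
    Pattern(name: "fromPretrained()", models: 9, avgLines: 63, similarity: 0.95),
    Pattern(name: "Quantization setup", models: 7, avgLines: 15, similarity: 0.90),
    Pattern(name: "sanitize() weights", models: 8, avgLines: 50, similarity: 0.70),
    Pattern(name: "generate()/generateStream()", models: 9, avgLines: 35, similarity: 0.85),
]

// Assumption: a shared implementation keeps one copy, so the removable
// lines per pattern are (models - 1) * avgLines * similarity.
func removableLines(_ p: Pattern) -> Double {
    Double(p.models - 1) * Double(p.avgLines) * p.similarity
}

let total = patterns.map(removableLines).reduce(0, +)
print("~\(Int(total.rounded())) lines")  // prints "~1043 lines"
```

Under these assumptions the four rows come to roughly 1,040 removable lines, consistent with the estimate above; counting all copies rather than n - 1 would give about 1,180.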
Examples
fromPretrained: compare any of the models (e.g. Soprano.swift:894-961, FishSpeechModel.swift:994-1032, and LlamaTTS.swift:917-966). The structure is identical; only the config type and post-load hooks differ.
Quantization: 7 models check config.quantization, extract (groupSize, bits), and call quantize() with the same 3-line pattern.
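A minimal sketch of folding that check-extract-call pattern into one shared helper. The QuantizationConfig type, the applyQuantizationIfNeeded function, and the JSON key names (group_size, bits) are hypothetical; the apply closure stands in for whatever quantize() call each model makes today, so no particular MLX signature is assumed.

```swift
import Foundation

// Hypothetical shape of the "quantization" block in config.json.
struct QuantizationConfig: Decodable {
    let groupSize: Int
    let bits: Int
    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

// Hypothetical: only the optional quantization field is decoded here;
// a real model config carries many more fields.
struct ConfigWithQuantization: Decodable {
    let quantization: QuantizationConfig?
}

/// Shared replacement for the repeated 3-line pattern.
/// `apply` stands in for the model's existing quantize() call.
/// Returns true if quantization was applied.
@discardableResult
func applyQuantizationIfNeeded(
    _ quantization: QuantizationConfig?,
    apply: (_ groupSize: Int, _ bits: Int) -> Void
) -> Bool {
    guard let q = quantization else { return false }
    apply(q.groupSize, q.bits)
    return true
}

let quantizedJSON = Data(#"{"quantization": {"group_size": 64, "bits": 4}}"#.utf8)
let plainJSON = Data(#"{}"#.utf8)

var calls: [(Int, Int)] = []
let quantizedConfig = try JSONDecoder().decode(ConfigWithQuantization.self, from: quantizedJSON)
let plainConfig = try JSONDecoder().decode(ConfigWithQuantization.self, from: plainJSON)

let applied1 = applyQuantizationIfNeeded(quantizedConfig.quantization) { calls.append(($0, $1)) }
let applied2 = applyQuantizationIfNeeded(plainConfig.quantization) { calls.append(($0, $1)) }
```

Each model would then replace its own three lines with a single call, keeping only the model-specific quantize() call inside the closure.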
sanitize(): the 8 models that implement it all filter and rename weight keys. Common operations: stripping prefixes, renaming gamma→weight and beta→bias, and skipping position_ids. Model-specific rules are layered on top.
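The common sanitize() operations could be expressed as one parameterized function, with model-specific rules passed in as a hook. This is a sketch under stated assumptions: weights are modeled as a plain [String: [Float]] dictionary instead of MLX arrays, gamma/beta are assumed to appear as the last dot-separated key component, and sanitizeCommon and its parameters are hypothetical names.

```swift
// Hypothetical stand-in for a weight tensor map; a real port would use MLX arrays.
typealias WeightMap = [String: [Float]]

/// Applies the shared sanitize rules, then an optional model-specific rule.
/// `extraRule` may return a new key (rename) or nil (drop the weight).
func sanitizeCommon(
    _ weights: WeightMap,
    stripPrefixes: [String] = [],
    renames: [String: String] = ["gamma": "weight", "beta": "bias"],
    skipSubstrings: [String] = ["position_ids"],
    extraRule: (String) -> String? = { $0 }
) -> WeightMap {
    var result: WeightMap = [:]
    for (originalKey, value) in weights {
        // Skip keys that should never be loaded (e.g. position_ids buffers).
        if skipSubstrings.contains(where: { originalKey.contains($0) }) { continue }

        var key = originalKey
        // Strip the first matching prefix, if any.
        if let prefix = stripPrefixes.first(where: { key.hasPrefix($0) }) {
            key = String(key.dropFirst(prefix.count))
        }
        // Rename the last path component, e.g. "norm.gamma" -> "norm.weight".
        var parts = key.split(separator: ".", omittingEmptySubsequences: false).map(String.init)
        if let last = parts.last, let renamed = renames[last] {
            parts[parts.count - 1] = renamed
            key = parts.joined(separator: ".")
        }
        // Model-specific rules run last and may drop the key entirely.
        guard let finalKey = extraRule(key) else { continue }
        result[finalKey] = value
    }
    return result
}

let raw: WeightMap = [
    "model.encoder.norm.gamma": [1],
    "model.encoder.norm.beta": [0],
    "model.embeddings.position_ids": [0, 1, 2],
    "model.decoder.proj.weight": [0.5],
]

// Shared rules only.
let clean = sanitizeCommon(raw, stripPrefixes: ["model."])
// Shared rules plus a hypothetical per-model rule that drops decoder weights.
let custom = sanitizeCommon(raw, stripPrefixes: ["model."]) { key in
    key.hasPrefix("decoder.") ? nil : key
}
```

Keeping the model-specific rule as a closure means each model's sanitize() shrinks to its genuinely unique cases.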
Possible Approach
Extract common fromPretrained / quantization / sanitization patterns into a shared protocol or base implementation. This would eliminate ~1,000 lines of copy-paste boilerplate while preserving model-specific customization via hooks.
Open to discussion on the best approach — protocol with default extension, base class, or something else entirely.
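As one possible shape for the "protocol with default extension" option, the sketch below puts steps 2 and 4-7 of the loading flow in a default fromPretrained and leaves the config type and hooks to each model. All names (PretrainedTTSModel, QuantizationSettings, DemoTTS) are hypothetical, weights are a plain dictionary standing in for MLX arrays, and steps 1 and 3 (downloading the repo, reading .safetensors) are assumed to happen before the call.

```swift
import Foundation

// Hypothetical stand-in for loaded tensors; a real port would use MLX arrays.
typealias Weights = [String: [Float]]

// Hypothetical shared quantization fields decoded from config.json.
struct QuantizationSettings: Decodable {
    let groupSize: Int
    let bits: Int
    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

// Each model supplies its config type and overrides only the hooks it needs.
protocol PretrainedTTSModel {
    associatedtype Config: Decodable
    init(config: Config)
    static func quantization(of config: Config) -> QuantizationSettings?
    static func sanitize(_ weights: Weights) -> Weights
    mutating func quantize(groupSize: Int, bits: Int)
    mutating func update(parameters: Weights)
    mutating func postLoad()
}

extension PretrainedTTSModel {
    // Default hooks: no quantization, no key rewriting, no post-load work.
    static func quantization(of config: Config) -> QuantizationSettings? { nil }
    static func sanitize(_ weights: Weights) -> Weights { weights }
    mutating func quantize(groupSize: Int, bits: Int) {}
    mutating func postLoad() {}

    // Shared pipeline; `configData` and `weights` come from steps 1 and 3.
    static func fromPretrained(configData: Data, weights: Weights) throws -> Self {
        let config = try JSONDecoder().decode(Config.self, from: configData)  // 2. decode config
        var model = Self(config: config)
        let sanitized = Self.sanitize(weights)                                 // 4. sanitize keys
        if let q = Self.quantization(of: config) {                             // 5. quantize
            model.quantize(groupSize: q.groupSize, bits: q.bits)
        }
        model.update(parameters: sanitized)                                    // 6. update parameters
        model.postLoad()                                                       // 7. post-load hook
        return model
    }
}

// Hypothetical model: only the config type and a few hooks are custom.
struct DemoConfig: Decodable {
    let hiddenSize: Int
    let quantization: QuantizationSettings?
    enum CodingKeys: String, CodingKey {
        case hiddenSize = "hidden_size"
        case quantization
    }
}

struct DemoTTS: PretrainedTTSModel {
    let config: DemoConfig
    var parameters: Weights = [:]
    var quantizedBits: Int? = nil
    var ready = false

    init(config: DemoConfig) { self.config = config }

    static func quantization(of config: DemoConfig) -> QuantizationSettings? { config.quantization }
    static func sanitize(_ weights: Weights) -> Weights {
        weights.filter { !$0.key.hasSuffix("position_ids") }
    }
    mutating func quantize(groupSize: Int, bits: Int) { quantizedBits = bits }
    mutating func update(parameters: Weights) { self.parameters = parameters }
    mutating func postLoad() { ready = true }
}

let json = Data(#"{"hidden_size": 8, "quantization": {"group_size": 64, "bits": 4}}"#.utf8)
let model = try DemoTTS.fromPretrained(
    configData: json,
    weights: ["decoder.weight": [1, 2], "embeddings.position_ids": [0, 1]]
)
```

One trade-off to weigh: a base class would share the same code but tie every model to a single superclass, while the protocol lets each model keep its existing type and override only the hooks that actually differ.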