
Reduce TTS model boilerplate: ~1,000 lines of duplicated loading infrastructure #120

@beshkenadze


Code Duplication Report: TTS Models

Overview

Analysis of all 9 TTS model implementations on the main branch (22,478 lines across 50 files) shows the same infrastructure boilerplate repeated in every model.

Codebase Inventory

Model        Files  Lines    Largest file (lines)
Chatterbox   15     7,512    ChatterboxModel (1,374)
Qwen3TTS     6      3,987    SpeechTokenizer (1,455)
Soprano      4      1,994    Soprano (1,045)
EchoTTS      6      1,841    EchoDiT (710)
FishSpeech   4      1,721    FishSpeechModel (1,033)
Marvis       3      1,684    MarvisTTSModel (672)
PocketTTS    8      1,534    PocketTTSModel (375)
Llama        2      1,148    LlamaTTS (982)
Qwen3        2      1,057    Qwen3 (943)
Total        50     22,478

Infrastructure Boilerplate

Every model reimplements the same loading pipeline with minor variations:

Pattern                       Models  Avg. lines  Similarity
fromPretrained()              9/9     ~63         95% identical
Quantization setup            7/9     ~15         90% identical
sanitize() weights            8/9     ~50         70% similar
generate()/generateStream()   9/9     ~35         85% identical
Config structs (Decodable)    9/9     ~235        60% similar
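The 85% overlap in generate()/generateStream() suggests the non-streaming entry point could be a protocol default built on the streaming one. A minimal sketch, assuming a hypothetical `StreamingTTS` protocol, a synchronous chunk callback in place of whatever async stream the models actually return, and plain `[Float]` samples instead of real audio tensors:

```swift
// Hypothetical protocol. Real models may stream through AsyncThrowingStream;
// a synchronous callback keeps this sketch self-contained.
protocol StreamingTTS {
    // Model-specific: emit audio chunks (Float samples) in playback order.
    func generateStream(text: String, onChunk: ([Float]) -> Void) throws
}

extension StreamingTTS {
    // Shared default: run the stream and concatenate every chunk.
    func generate(text: String) throws -> [Float] {
        var samples: [Float] = []
        try generateStream(text: text) { chunk in
            samples.append(contentsOf: chunk)
        }
        return samples
    }
}

// Toy conformer: one 4-sample chunk of silence per word.
struct ToyTTS: StreamingTTS {
    func generateStream(text: String, onChunk: ([Float]) -> Void) throws {
        for _ in text.split(separator: " ") {
            onChunk([Float](repeating: 0, count: 4))
        }
    }
}

let audio = try ToyTTS().generate(text: "hello world")
```

With this shape each model implements only generateStream(), and generate() is inherited.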

The fromPretrained flow is nearly copy-paste across all models:

  1. Resolve/download the HF repo
  2. Decode config.json
  3. Load .safetensors
  4. Sanitize keys
  5. Apply quantization
  6. Update parameters
  7. Run post-load hooks
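A minimal sketch of steps 2 to 7 as a protocol with a default extension, one of the options listed under Possible Approach. Everything here is hypothetical: `ModelSource` stands in for step 1 (repo resolution plus file reads), `Weights` is a plain dictionary in place of real tensors, and `PretrainedModel`, `fromPretrained(source:)` and the hook names are illustrative rather than taken from the existing models.

```swift
import Foundation

// Hypothetical stand-in for real tensors: key -> flat values.
typealias Weights = [String: [Float]]

// Hypothetical source covering step 1; a real one would resolve the HF repo
// and read config.json and the *.safetensors files.
protocol ModelSource {
    func configData() throws -> Data
    func weights() throws -> Weights
}

struct Quantization: Decodable {
    let groupSize: Int
    let bits: Int
    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

protocol PretrainedConfig: Decodable {
    var quantization: Quantization? { get }
}

protocol PretrainedModel {
    associatedtype Config: PretrainedConfig
    init(config: Config)
    // Model-specific hooks.
    static func sanitize(_ weights: Weights) -> Weights
    mutating func quantize(groupSize: Int, bits: Int)
    mutating func update(parameters: Weights) throws
    mutating func postLoad()
}

extension PretrainedModel {
    // Defaults a model overrides only when it needs to.
    static func sanitize(_ weights: Weights) -> Weights { weights }
    mutating func postLoad() {}

    // Shared steps 2-7, always run in the same order.
    static func fromPretrained<S: ModelSource>(source: S) throws -> Self {
        let config = try JSONDecoder().decode(Config.self, from: source.configData()) // 2
        let weights = try sanitize(source.weights())                                  // 3, 4
        var model = Self(config: config)
        if let q = config.quantization {                                              // 5
            model.quantize(groupSize: q.groupSize, bits: q.bits)
        }
        try model.update(parameters: weights)                                         // 6
        model.postLoad()                                                              // 7
        return model
    }
}

// Toy model: supplies only its config type and the hooks that differ.
struct ToyConfig: PretrainedConfig {
    let hiddenSize: Int
    let quantization: Quantization?
    enum CodingKeys: String, CodingKey {
        case hiddenSize = "hidden_size"
        case quantization
    }
}

struct ToyModel: PretrainedModel {
    let config: ToyConfig
    var parameters: Weights = [:]
    var quantizedBits: Int? = nil
    init(config: ToyConfig) { self.config = config }

    static func sanitize(_ weights: Weights) -> Weights {
        weights.filter { !$0.key.hasSuffix("position_ids") }
    }
    mutating func quantize(groupSize: Int, bits: Int) { quantizedBits = bits }
    mutating func update(parameters: Weights) throws { self.parameters = parameters }
}

// In-memory source so the sketch runs without network or files.
struct InMemorySource: ModelSource {
    let json: String
    let tensors: Weights
    func configData() throws -> Data { Data(json.utf8) }
    func weights() throws -> Weights { tensors }
}

let source = InMemorySource(
    json: #"{"hidden_size": 8, "quantization": {"group_size": 64, "bits": 4}}"#,
    tensors: ["embed.weight": [0.1, 0.2], "embed.position_ids": [0, 1]]
)
let model = try ToyModel.fromPretrained(source: source)
```

The step ordering lives in one place; a model only declares what actually differs.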

Estimated duplicated lines: ~1,000+ across all models combined.

Examples

fromPretrained: compare any of these implementations (e.g. Soprano.swift:894-961, FishSpeechModel.swift:994-1032 and LlamaTTS.swift:917-966). The structure is identical; only the config type and the post-load hooks differ.

Quantization: 7 models check config.quantization, extract (groupSize, bits), and call quantize() using the same 3-line pattern.
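A sketch of that pattern pulled into one helper. It assumes a hypothetical shared `QuantizationConfig` decoded from a `quantization` block with `group_size` and `bits` keys; `applyQuantizationIfNeeded` and `ExampleTTSConfig` are illustrative names, and the `quantize` closure is a placeholder for the routine each model actually calls.

```swift
import Foundation

// Hypothetical shared type for the `quantization` block in config.json.
struct QuantizationConfig: Decodable {
    let groupSize: Int
    let bits: Int
    enum CodingKeys: String, CodingKey {
        case groupSize = "group_size"
        case bits
    }
}

// Any model config that may carry that block.
protocol QuantizableConfig {
    var quantization: QuantizationConfig? { get }
}

// The shared check / extract / call pattern. Returns whether quantization ran.
@discardableResult
func applyQuantizationIfNeeded<C: QuantizableConfig>(
    _ config: C,
    quantize: (_ groupSize: Int, _ bits: Int) throws -> Void
) rethrows -> Bool {
    guard let q = config.quantization else { return false }
    try quantize(q.groupSize, q.bits)
    return true
}

// Example config; field names are illustrative, not taken from any model.
struct ExampleTTSConfig: Decodable, QuantizableConfig {
    let vocabSize: Int
    let quantization: QuantizationConfig?
    enum CodingKeys: String, CodingKey {
        case vocabSize = "vocab_size"
        case quantization
    }
}

let json = #"{"vocab_size": 1024, "quantization": {"group_size": 64, "bits": 4}}"#
let config = try JSONDecoder().decode(ExampleTTSConfig.self, from: Data(json.utf8))
var applied: (groupSize: Int, bits: Int)? = nil
applyQuantizationIfNeeded(config) { groupSize, bits in
    applied = (groupSize: groupSize, bits: bits) // a real model would quantize its layers here
}
```

Because the config block is optional, unquantized checkpoints pass through unchanged and the helper returns false.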

sanitize(): every model that defines it (8 of 9) filters and renames weight keys. Common operations: strip prefixes, rename gamma → weight and beta → bias, and skip position_ids. Model-specific rules are added on top.
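A sketch of those shared rules as a reusable type, with a closure hook for model-specific renames. The `KeySanitizer` name, its default rules, the `"model."` prefix, the `"decoder."` rename, and the `Int` values standing in for tensors are all assumptions for illustration.

```swift
// Hypothetical shared sanitizer covering the common rules; `extra` is the hook
// for model-specific renames (return nil from it to drop a key).
struct KeySanitizer {
    var prefixesToStrip: [String] = []
    var skipSuffixes: [String] = ["position_ids"]
    var suffixRenames: [(from: String, to: String)] = [
        (from: ".gamma", to: ".weight"),
        (from: ".beta", to: ".bias"),
    ]

    // Returns the cleaned key, or nil if the tensor should be dropped.
    func clean(_ key: String, extra: (String) -> String?) -> String? {
        var k = key
        // Strip each listed prefix, in order, if the key starts with it.
        for prefix in prefixesToStrip where k.hasPrefix(prefix) {
            k = String(k.dropFirst(prefix.count))
        }
        // Skip buffers such as position_ids.
        if skipSuffixes.contains(where: { k.hasSuffix($0) }) { return nil }
        // Legacy LayerNorm names: gamma -> weight, beta -> bias.
        for rename in suffixRenames where k.hasSuffix(rename.from) {
            k = String(k.dropLast(rename.from.count)) + rename.to
        }
        return extra(k)
    }

    func sanitize<T>(_ weights: [String: T], extra: (String) -> String? = { $0 }) -> [String: T] {
        var result: [String: T] = [:]
        for (key, value) in weights {
            if let newKey = clean(key, extra: extra) {
                result[newKey] = value
            }
        }
        return result
    }
}

let sanitizer = KeySanitizer(prefixesToStrip: ["model."])
let raw: [String: Int] = [
    "model.encoder.norm.gamma": 1,
    "model.encoder.norm.beta": 2,
    "model.embeddings.position_ids": 3,
    "model.decoder.proj.weight": 4,
]
// Hypothetical model-specific rule layered on top: "decoder." -> "dec.".
let cleaned = sanitizer.sanitize(raw) { key in
    guard key.hasPrefix("decoder.") else { return key }
    return "dec." + String(key.dropFirst("decoder.".count))
}
```

Keeping the common rules as data (prefix list, skip list, rename table) lets a model adjust them without re-implementing the loop.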

Possible Approach

Extract the common fromPretrained, quantization and sanitization patterns into a shared protocol or base implementation. This would eliminate an estimated ~1,000 lines of copy-paste boilerplate while preserving model-specific customization via hooks.

Open to discussion on the best approach — protocol with default extension, base class, or something else entirely.
