The objective is to create a high-quality, legally compliant German voice pack for the Kokoro TTS model. While excellent community-driven datasets like Thorsten-Voice exist and have paved the way for German TTS, this project aims to develop a professional narrator-style voice (high dynamic range, audiobook prosody) that is fully redistributable under an open-source license.
Current Problem:
To achieve a "professional studio" sound for specific use cases (like long-form narration) while avoiding the legal risks of proprietary voices (e.g., AWS Polly), we need a dataset built from recordings of a professional-level speaker that are in the Public Domain.
Proposed Solution:
- Source Material: Utilize the HUI Audio Corpus German, focusing on the speaker "Hokuspokus".
- Rationale: "Hokuspokus" provides a consistent, highly-trained narration style ideal for high-end TTS.
- Legal Status: Verified Public Domain (LibriVox/CC0), ensuring the resulting model can be shared freely.
- Hybrid Dataset Strategy:
  - Extract 2–5 hours of the best audio-transcript pairs from HUI (Solo projects).
  - Synthetic Augmentation: Use "Hokuspokus" samples as a seed for Qwen3 TTS 1.7b to generate additional training data. This will fill gaps in modern vocabulary and ensure high phoneme density.
  - Refinement: Apply modern audio restoration (denoising/normalization) to the HUI samples to match the clarity of the synthetic Qwen3 outputs.
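As a rough illustration of the normalization part of the refinement step, the sketch below scales a clip toward a common RMS level while guarding against clipping. It is not taken from any existing recipe: the function names, the -23 dBFS target, and the 0.99 peak ceiling are assumptions chosen for the example, and samples are assumed to be floats in the range -1.0 to 1.0. Denoising is out of scope here.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of float samples (range -1.0..1.0) in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10.0 * math.log10(mean_square)


def normalize_rms(samples, target_dbfs=-23.0, peak_ceiling=0.99):
    """Scale samples toward target_dbfs without any peak exceeding peak_ceiling.

    target_dbfs and peak_ceiling are hypothetical defaults, not project settings.
    """
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return list(samples)  # empty or silent clip: nothing to scale
    gain = 10.0 ** ((target_dbfs - level) / 20.0)
    peak = max(abs(s) for s in samples)
    # Reduce the gain if it would push the loudest sample past the ceiling.
    gain = min(gain, peak_ceiling / peak)
    return [s * gain for s in samples]
```

Applying the same target to both the HUI clips and the synthetic clips would be one simple way to keep the two halves of the hybrid dataset at a comparable loudness; the right target level would still need to be checked by ear.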
Technical Requirements:
- Filter HUI Corpus for "Hokuspokus" solo projects to ensure acoustic consistency.
- Verify alignment between HUI audio and provided transcripts.
- Implement a Qwen3-based cloning pipeline for data augmentation.
- Format the final hybrid dataset for the Kokoro training recipe.
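The first two requirements above could be prototyped along these lines. This is a minimal sketch under stated assumptions: the `Clip` record and its fields are hypothetical and do not reflect the actual HUI metadata layout, and the characters-per-second bounds are illustrative thresholds for flagging clips whose transcript is implausibly short or long for the audio duration. Such a heuristic only catches gross mismatches and does not replace a real forced-alignment check.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical record; the real HUI metadata fields may differ.
    speaker: str
    project: str
    is_solo: bool
    text: str
    duration_s: float


def select_clips(clips, speaker="Hokuspokus", min_cps=5.0, max_cps=25.0):
    """Keep solo-project clips of one speaker with a plausible speaking rate.

    min_cps / max_cps (characters per second) are assumed thresholds.
    """
    kept = []
    for clip in clips:
        if clip.speaker != speaker or not clip.is_solo:
            continue
        if clip.duration_s <= 0:
            continue
        cps = len(clip.text.strip()) / clip.duration_s
        if min_cps <= cps <= max_cps:
            kept.append(clip)
    return kept


def total_hours(clips):
    """Sum clip durations in hours, e.g. to check the 2-5 hour target."""
    return sum(c.duration_s for c in clips) / 3600.0
```

Clips rejected by the rate check are good candidates for manual review, since they often indicate a truncated recording or a transcript that belongs to a different segment.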
Legal & Ethical Compliance:
- Zero Proprietary Data: No voices from commercial providers (Polly, ElevenLabs) used.
- Public Domain Only: All source material is ethically sourced from the Public Domain or synthetically generated.
- Open Source: The final recipe and model weights will be released under an open license (e.g., Apache 2.0).
Additional Context:
This approach honors the groundwork laid by the German TTS community while pushing for a specific "narrator" aesthetic and utilizing the latest synthetic data generation techniques.