Develop Professional-Grade Open-Source German Voice Pack 🗣️ (Hokuspokus-HUI Hybrid) #4

The objective is to create a high-quality, legally compliant German voice pack for the kokoro TTS model. Excellent community-driven datasets such as Thorsten-Voice already exist and have paved the way for German TTS. This project aims to develop a professional narrator-style voice (high dynamic range, audiobook prosody) that is fully redistributable under an open-source license.

Current Problem:

We want a "professional studio" sound for specific use cases, such as long-form narration, without the legal risks of proprietary voices (e.g., AWS Polly). This requires a dataset recorded by a professional-level speaker and released into the public domain. Public domain is a status, not a license.

Proposed Solution:

   1. Source Material: Use the HUI Audio Corpus German, focusing on the speaker "Hokuspokus".
      * Rationale: "Hokuspokus" provides a consistent, highly trained narration style that is ideal for high-end TTS.
      * Legal Status: Verified Public Domain (LibriVox/CC0), so the resulting model can be shared freely.
   2. Hybrid Dataset Strategy:
      * Extract 2–5 hours of the best audio-transcript pairs from HUI (solo recordings only).
      * Synthetic Augmentation: Use "Hokuspokus" recordings as reference audio for voice cloning with Qwen3 TTS 1.7B to generate additional training data. This fills gaps in modern vocabulary that older public-domain texts tend to lack and improves phoneme coverage.
   3. Refinement: Apply modern audio restoration (denoising, loudness normalization) to the HUI samples so that their clarity matches the synthetic Qwen3 outputs.
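The "best 2–5 hours" selection and the loudness-matching part of the refinement step can be sketched in plain Python. This assumes each candidate clip already carries a duration and some quality score. The scoring method is not specified here; it could come from an alignment or SNR check. The greedy strategy fills a duration budget with the highest-scoring clips, then scales each clip to a common RMS level. Several pieces are hypothetical illustration choices, not part of HUI or kokoro:

* the `Clip` type and its `score` field
* the 15-second per-clip limit
* the -20 dBFS target

Denoising is not covered.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    duration_s: float
    score: float  # hypothetical quality score (higher is better)


def select_clips(clips, target_hours, max_clip_s=15.0):
    """Greedily take the highest-scoring clips until the duration budget is used up."""
    budget_s = target_hours * 3600.0
    chosen, total_s = [], 0.0
    for clip in sorted(clips, key=lambda c: c.score, reverse=True):
        if clip.duration_s > max_clip_s:
            continue  # skip clips longer than the assumed per-clip limit
        if total_s + clip.duration_s > budget_s:
            continue  # would overshoot the budget; a shorter clip may still fit
        chosen.append(clip)
        total_s += clip.duration_s
    return chosen, total_s


def rms_normalize(samples, target_dbfs=-20.0):
    """Scale float samples in [-1, 1] so their RMS level reaches target_dbfs."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_dbfs / 20.0) / rms
    # Hard-limit to [-1, 1]; if this clips, the final RMS ends up slightly below target.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

For a 4-hour target, `select_clips(clips, target_hours=4.0)` returns the chosen clips and their total duration in seconds. For the real pipeline, a perceptual loudness measure (ITU-R BS.1770, LUFS) would be a more standard target than plain RMS.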

Technical Requirements:

* Filter HUI Corpus for "Hokuspokus" solo projects to ensure acoustic consistency.
* Verify alignment between the HUI audio and the provided transcripts.
* Implement a Qwen3-based cloning pipeline for data augmentation.
* Format the final hybrid dataset for the kokoro training recipe.
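The filtering, alignment-check, and formatting requirements can be sketched together in plain Python. This sketch rests on three assumptions:

* Input: a hypothetical flat metadata file with one `speaker|book|wav_path|duration_s|text` row per clip. The real HUI directory layout and metadata columns may differ.
* Output: the format expected by the kokoro training recipe is not specified here, so the `wav_path|text` export is an LJSpeech-style placeholder.
* Alignment check: only a characters-per-second heuristic with assumed thresholds. It is a cheap first pass, not a replacement for forced alignment or ASR-based verification.

```python
import csv
import io
from collections import defaultdict


def load_rows(text):
    """Parse hypothetical pipe-separated rows: speaker|book|wav_path|duration_s|text."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        speaker, book, wav_path, duration_s, transcript = fields
        rows.append({"speaker": speaker, "book": book, "wav": wav_path,
                     "dur": float(duration_s), "text": transcript.strip()})
    return rows


def solo_rows(rows, speaker):
    """Keep rows of `speaker` from books that no other speaker contributed to."""
    speakers_per_book = defaultdict(set)
    for row in rows:
        speakers_per_book[row["book"]].add(row["speaker"])
    return [row for row in rows
            if row["speaker"] == speaker and speakers_per_book[row["book"]] == {speaker}]


def plausible_alignment(row, min_cps=8.0, max_cps=25.0):
    """Cheap sanity check: characters per second must fall in an assumed speaking-rate band."""
    if row["dur"] <= 0 or not row["text"]:
        return False
    cps = len(row["text"]) / row["dur"]
    return min_cps <= cps <= max_cps


def to_training_lines(rows):
    """Emit hypothetical 'wav_path|text' lines (LJSpeech-style); adapt to the real recipe."""
    return [f'{row["wav"]}|{row["text"]}' for row in rows]
```

A typical call chain would be `to_training_lines([r for r in solo_rows(load_rows(text), "Hokuspokus") if plausible_alignment(r)])`. Rows that fail the heuristic are better sent to a forced-alignment or ASR comparison step than silently dropped.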

Legal & Ethical Compliance:

* Zero Proprietary Data: No voices from commercial providers (e.g., Polly, ElevenLabs) are used.
* Public Domain Only: All source material is either ethically sourced from the public domain or synthetically generated.
* Open Source: The final recipe and model weights will be released under an open license (e.g., Apache 2.0).

Additional Context:
This approach honors the groundwork laid by the German TTS community while pursuing a specific "narrator" aesthetic and using the latest synthetic data generation techniques.

