Develop Professional-Grade Open-Source German Voice Pack 🗣️ (Hokuspokus-HUI Hybrid) #4

@semidark

The objective is to create a high-quality, legally compliant German voice pack for the kokoro TTS model. While excellent community-driven datasets like Thorsten-Voice exist and have paved the way for German TTS, this project aims to develop a professional narrator-style voice (high dynamic range, audiobook prosody) that is fully redistributable under an open-source license.

Current Problem:

To achieve a "professional studio" sound for specific use cases (like long-form narration) while avoiding the legal risks of proprietary voices (e.g., AWS Polly), we need a dataset built from a professional-level speaker whose recordings are in the Public Domain.

Proposed Solution:

  1. Source Material: Utilize the HUI Audio Corpus German, focusing on the speaker "Hokuspokus".
    • Rationale: "Hokuspokus" provides a consistent, highly-trained narration style ideal for high-end TTS.
    • Legal Status: Verified Public Domain (LibriVox/CC0), ensuring the resulting model can be shared freely.
  2. Hybrid Dataset Strategy:
    • Extract 2–5 hours of the best audio-transcript pairs from HUI (Solo projects).
    • Synthetic Augmentation: Use "Hokuspokus" samples as a seed for Qwen3 TTS 1.7b to generate additional training data. This will fill gaps in modern vocabulary and ensure high phoneme density.
  3. Refinement: Apply modern audio restoration (denoising/normalization) to the HUI samples to match the clarity of the synthetic Qwen3 outputs.
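
As a rough illustration of the extraction and normalization steps above, the following is a minimal sketch, not a finished pipeline. The `Clip` record, its field names, the `quality` score, and the `-20.0` dBFS target are all assumptions made for the example; none of them come from the HUI corpus or the kokoro tooling. It greedily picks the best-scoring clips of one speaker up to an hour budget and scales raw float samples to a target RMS level.

```python
from dataclasses import dataclass
import math


@dataclass
class Clip:
    # Hypothetical record; field names are illustrative, not HUI's schema.
    clip_id: str
    speaker: str
    duration_s: float
    quality: float  # any score where higher is better, e.g. an SNR estimate


def select_clips(clips, speaker, target_hours):
    """Greedily pick the highest-quality clips of one speaker up to a duration budget."""
    budget_s = target_hours * 3600.0
    candidates = sorted(
        (c for c in clips if c.speaker == speaker),
        key=lambda c: c.quality,
        reverse=True,
    )
    selected, total_s = [], 0.0
    for clip in candidates:
        if total_s + clip.duration_s > budget_s:
            continue  # would overshoot the budget; a shorter clip may still fit
        selected.append(clip)
        total_s += clip.duration_s
    return selected, total_s


def rms_normalize(samples, target_dbfs=-20.0):
    """Scale float samples in [-1, 1] so their RMS level matches target_dbfs."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = 10 ** (target_dbfs / 20.0) / rms
    # Clamp so the applied gain cannot push samples outside [-1, 1].
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Calling `select_clips(clips, "Hokuspokus", target_hours=5)` would cap the selection at the upper end of the proposed 2–5 hour range. Denoising is deliberately left out here, since the issue does not name a specific restoration tool.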

Technical Requirements:

  • Filter HUI Corpus for "Hokuspokus" solo projects to ensure acoustic consistency.
  • Verify alignment between HUI audio and provided transcripts.
  • Implement a Qwen3-based cloning pipeline for data augmentation.
  • Format the final hybrid dataset for the kokoro training recipe.
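
The filtering, alignment check, and formatting requirements above could be prototyped along these lines. This is only a sketch under stated assumptions: the row keys (`speaker`, `solo`, `clip_id`, `text`, `duration_s`), the characters-per-second thresholds, and the pipe-delimited `clip_id|text` output are hypothetical choices for illustration. The issue does not specify the actual HUI metadata layout or the input format the kokoro training recipe expects, and a speaking-rate heuristic is a coarse stand-in for real forced alignment.

```python
def is_plausibly_aligned(text, duration_s, min_cps=5.0, max_cps=25.0):
    """Flag pairs whose characters-per-second rate is implausible for read speech."""
    stripped = text.strip()
    if duration_s <= 0 or not stripped:
        return False
    cps = len(stripped) / duration_s
    return min_cps <= cps <= max_cps


def build_manifest(rows, speaker="Hokuspokus"):
    """Return 'clip_id|text' lines for one speaker's solo clips that pass the rate check.

    rows: iterable of dicts with the hypothetical keys
    speaker, solo, clip_id, text, duration_s.
    """
    lines = []
    for row in rows:
        if row["speaker"] != speaker or not row["solo"]:
            continue  # keep only solo projects of the target speaker
        if not is_plausibly_aligned(row["text"], row["duration_s"]):
            continue  # likely a transcript/audio mismatch
        # Collapse whitespace and drop the delimiter character from the text.
        text = " ".join(row["text"].replace("|", " ").split())
        lines.append(f'{row["clip_id"]}|{text}')
    return lines
```

Clips rejected by the rate check are good candidates for manual review or a proper forced-alignment pass rather than silent deletion.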

Legal & Ethical Compliance:

  • Zero Proprietary Data: No voices from commercial providers (Polly, ElevenLabs) are used.
  • Public Domain Only: All source material is ethically sourced from the Public Domain or synthetically generated.
  • Open Source: The final recipe and model weights will be released under an open license (e.g., Apache 2.0).

Additional Context:
This approach honors the groundwork laid by the German TTS community while pushing for a specific "narrator" aesthetic and utilizing the latest synthetic data generation techniques.

Labels: enhancement (New feature or request)
