This repository contains a collection of scripts designed to simplify the creation of custom voice packages for Valetudo, a vacuum cleaner robot middleware. Leveraging the Whisper and Piper TTS libraries, these scripts enable users to generate personalized voice packages by converting text to speech and integrating them seamlessly into Valetudo. You can also personalize the voice by finetuning a new Piper model from any audio source, allowing you to create unique voice prompts tailored to your preferences or based on any speaker.
mkdir -p data/original
scp -O -r root@{{ valetudo_ip }}:{{ robot_audio_files }} data/original/
Note: In my case, the robot audio files are under /audio/EN.
You can use any audio source (e.g., YouTube, personal recordings) to create a custom voice model.
yt-dlp 'https://www.youtube.com/watch?v=XXXXXXXX'
ffmpeg -i input.webm -vn -acodec pcm_s16le -ar 22050 -ac 2 output.wav
Install the requirements and run the script to generate text from your audio files using Whisper.
pip install -r requirements.txt
./get_sounds_list.py
Run the script to split your audio into separate sentences for Piper training.
./split_audio.py
Run the script to generate the zip files needed for Piper model training or finetuning.
./split_files.sh
This will produce files ready to be loaded into the Piper multilingual training notebook.
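Before uploading the data, it can help to confirm that the clips really have the format requested from ffmpeg above (22050 Hz, 16-bit PCM). The following is a minimal sketch using only the Python standard library; the function names `describe_wav` and `find_mismatched` are hypothetical helpers and not part of this repository.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic format information for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }


def find_mismatched(folder, expected_rate=22050):
    """List WAV files under `folder` whose sample rate differs from `expected_rate`."""
    mismatched = []
    for path in sorted(Path(folder).rglob("*.wav")):
        if describe_wav(path)["sample_rate"] != expected_rate:
            mismatched.append(path)
    return mismatched
```

Calling `find_mismatched("data")` (assuming the clips live somewhere under `data/`) returns the paths that would need to be converted again; an empty list means every clip matches.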
To create a custom voice, use the prepared data to finetune a Piper model. Follow the instructions in the Piper multilingual training notebook to train or finetune a model using your own audio and transcriptions. This allows you to generate a Piper model that matches any speaker from your audio source.
Once training is complete, download the resulting .onnx and .onnx.json model files.
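Piper needs both files, and by default it looks for the config next to the model under the same name with `.json` appended. A quick check that the pair is complete and that the config is valid JSON might look like the sketch below; `check_piper_model` is a hypothetical helper, not something provided by Piper or by this repository.

```python
import json
from pathlib import Path


def check_piper_model(model_path):
    """Return the parsed config for a Piper model, or raise if a file is missing."""
    model = Path(model_path)
    config = Path(str(model) + ".json")  # e.g. voice.onnx -> voice.onnx.json
    if not model.is_file():
        raise FileNotFoundError(f"model file not found: {model}")
    if not config.is_file():
        raise FileNotFoundError(f"config file not found: {config}")
    with config.open(encoding="utf-8") as handle:
        return json.load(handle)
```

If the call returns without raising, both files are present and the config parsed; the returned dictionary can then be inspected by hand.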
Download a pre-trained Piper model, or use your personalized one, and generate the new audio files with it.
# Example for downloading a pre-trained model
curl -LO 'https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/high/en_US-ryan-high.onnx?download=true'
curl -LO 'https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/high/en_US-ryan-high.onnx.json?download=true'
pip install piper-tts
./voice-generator.sh -m <your_model>.onnx
tar -czf voice_pack.tar.gz -C data/output/EN/ .
md5sum voice_pack.tar.gz > md5sum.txt
After uploading from Valetudo's interface, the files are located at /data/personalized_voice/XX on your robot.
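On systems without `tar` or `md5sum`, the packaging step can be approximated in Python. This is a rough sketch rather than a drop-in replacement: `build_voice_pack` is a hypothetical helper, and unlike `tar -C <dir> .` it only packs the top-level files of the directory and stores them without a leading `./`.

```python
import hashlib
import tarfile
from pathlib import Path


def build_voice_pack(source_dir, archive_path):
    """Pack the files in `source_dir` into a gzip-compressed tar and return its MD5."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source.iterdir()):
            if path.is_file():
                # Store each top-level file at the archive root.
                archive.add(path, arcname=path.name)
    digest = hashlib.md5()
    with open(archive_path, "rb") as handle:
        # Hash in 1 MiB chunks so large archives are not read into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `build_voice_pack("data/output/EN", "voice_pack.tar.gz")` writes the archive and returns the hexadecimal MD5 digest, which is the same value `md5sum` would print for that file.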