Description
Self Checks
- This template is only for bug reports. For questions, please visit Discussions.
- I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem. (Documentation available in English, Chinese, Japanese, Portuguese (Brazil).)
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [FOR CHINESE USERS] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
- Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
Ubuntu 22.04, Python 3.10.15, torch==2.4.1, gradio==5.16.0
Steps to Reproduce
1. Follow the steps at https://speech.fish.audio/zh/#linux to set up the Linux environment.
2. Follow the steps at https://speech.fish.audio/zh/finetune/ to run SFT. I used edge-tts with the Chinese 'xiaoxiao' voice to generate about 1,500 audio/.lab pairs (24,000 Hz), about 1.5 hours of audio in total, and then normalized the audio with the "fap" tool.
3. Run the WebUI with the fine-tuned LLaMA checkpoint:
python tools/run_webui.py \
  --llama-checkpoint-path checkpoints/fish-speech-1.5-yth-lora \
  --decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth \
  --compile
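For reference, the dataset from step 2 can be sanity-checked before fine-tuning with a small script like the one below. It is a minimal sketch, not part of fish-speech: `summarize_dataset` is a hypothetical helper, and it assumes the layout described in the fine-tuning docs (each clip is a `.wav` file with a `.lab` transcript of the same name next to it) and uncompressed PCM WAV files, since the standard-library `wave` module cannot read MP3. With 1,500 clips totalling about 1.5 hours, the average clip is about 3.6 seconds.

```python
import sys
import wave
from pathlib import Path


def summarize_dataset(root):
    """Hypothetical helper: count .wav/.lab pairs under `root` and sum their duration.

    Assumes each clip is an uncompressed PCM .wav with a same-named .lab
    transcript in the same folder (layout assumed from the fine-tuning docs).
    """
    root = Path(root)
    pairs = 0
    total_seconds = 0.0
    sample_rates = set()
    missing_lab = []
    for wav_path in sorted(root.rglob("*.wav")):
        lab_path = wav_path.with_suffix(".lab")
        if not lab_path.exists():
            # A clip without a transcript would not be usable for SFT.
            missing_lab.append(wav_path)
            continue
        with wave.open(str(wav_path), "rb") as w:
            frames = w.getnframes()
            rate = w.getframerate()
        sample_rates.add(rate)
        total_seconds += frames / rate
        pairs += 1
    return {
        "pairs": pairs,
        "hours": total_seconds / 3600,
        "sample_rates": sample_rates,
        "missing_lab": missing_lab,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    stats = summarize_dataset(sys.argv[1])
    print(
        f"pairs: {stats['pairs']}, total: {stats['hours']:.2f} h, "
        f"sample rates: {sorted(stats['sample_rates'])}, "
        f"missing .lab: {len(stats['missing_lab'])}"
    )
```

Usage (the folder name is an assumption): `python check_dataset.py data/`. A single sample rate and zero missing `.lab` files would rule out the most basic data problems.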
✔️ Expected Behavior
1. Better TTS quality than with the base LLaMA checkpoint.
❌ Actual Behavior
1. Pronunciation is less clear than before SFT.
2. Extraneous speech that does not correspond to the input text is generated between otherwise normal sentences.