Generated speech becomes noisy or irregular sounds; how to resolve it? #944

### Self Checks

- [x] This template is only for bug reports. For questions, please visit [Discussions](https://github.com/fishaudio/fish-speech/discussions).
- [x] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem. [English](https://speech.fish.audio/) [中文](https://speech.fish.audio/zh/) [日本語](https://speech.fish.audio/ja/) [Portuguese (Brazil)](https://speech.fish.audio/pt/)
- [x] I have searched for existing issues, including closed ones. [Search issues](https://github.com/fishaudio/fish-speech/issues)
- [x] I confirm that I am using English to submit this report (我已阅读并同意 [Language Policy](https://github.com/fishaudio/fish-speech/issues/515)).
- [x] [FOR CHINESE USERS] 请务必使用英文提交 Issue，否则会被关闭。谢谢！:）
- [x] Please do not modify this template and fill in all required fields.

### Cloud or Self Hosted

Self Hosted (Source)

### Environment Details

- WSL2 on Windows 11
- Python 3.10
- 64 GB RAM
- RTX 4000 SFF Ada, 20 GB VRAM

### Steps to Reproduce

python -m tools.run_webui \
    --llama-checkpoint-path "checkpoints/fish-speech-1.5" \
    --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
    --decoder-config-name firefly_gan_vq --compile

Use the following text as input:

Fish Agent V0.1 3B is a groundbreaking Voice-to-Voice model capable of capturing and generating environmental audio information with unprecedented accuracy. What sets it apart is its semantic-token-free architecture, eliminating the need for traditional semantic encoders/decoders like Whisper and CosyVoice.

Additionally, it stands as a state-of-the-art text-to-speech (TTS) model, trained on an extensive dataset of 700,000 hours of multilingual audio content.

This model is a continue-pretrained version of Qwen-2.5-3B-Instruct for 200B voice & text tokens.

Supported Languages
The model supports the following languages with their respective training data sizes:

English (en): ~300,000 hours
Chinese (zh): ~300,000 hours
German (de): ~20,000 hours
Japanese (ja): ~20,000 hours
French (fr): ~20,000 hours
Spanish (es): ~20,000 hours
Korean (ko): ~20,000 hours
Arabic (ar): ~20,000 hours
For detailed information and implementation guidelines, please visit our [Fish Speech GitHub repository](https://github.com/fishaudio/fish-speech).


### ✔️ Expected Behavior

Generate smooth speech

### ❌ Actual Behavior

When the model reads this part of the input:
English (en): ~300,000 hours
Chinese (zh): ~300,000 hours
German (de): ~20,000 hours
Japanese (ja): ~20,000 hours
French (fr): ~20,000 hours
Spanish (es): ~20,000 hours
Korean (ko): ~20,000 hours
Arabic (ar): ~20,000 hours

the output turns into noisy or irregular sounds. How can this be resolved?
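
One possible workaround, offered as an untested assumption rather than a confirmed fix, is to rewrite the list lines into plain sentences before synthesis, in case the `~`, `(en)`, and `:` symbols are what trips the model. The sketch below uses only the Python standard library; `normalize_for_tts` is a hypothetical helper and not part of fish-speech.

```python
import re


def normalize_for_tts(text: str) -> str:
    """Hypothetical helper (not a fish-speech API): turn list-style lines
    such as "English (en): ~300,000 hours" into plain sentences."""
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop two-letter language codes in parentheses, e.g. " (en)".
        line = re.sub(r"\s*\([a-z]{2}\)", "", line)
        # Spell out the approximation sign: "~300,000" -> "about 300,000".
        line = re.sub(r"~\s*(?=\d)", "about ", line)
        # Replace a label colon with a comma; URLs ("https://") are untouched
        # because their colon is not followed by whitespace.
        line = re.sub(r":\s+", ", ", line)
        # End each item with a period so it reads as a full sentence.
        if line[-1] not in ".!?":
            line += "."
        sentences.append(line)
    return " ".join(sentences)


if __name__ == "__main__":
    sample = "English (en): ~300,000 hours\nChinese (zh): ~300,000 hours"
    print(normalize_for_tts(sample))
```

The normalized string can then be pasted into the WebUI in place of the original list. Whether this removes the noise still needs to be checked against the reproduction steps above.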
