Issues with voice cloning and emotion tags. #178

### Checks

- [x] This template is only for usage issues encountered.
- [x] I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
- [x] I have searched for existing issues, including closed ones, and couldn't find a solution.
- [x] I am using English to submit this issue to facilitate community communication.

### Environment Details

Python 3.10.10/3.12
Ubuntu 22.04
omnivoice - 0.1.5


### Steps to Reproduce

When generating the text below repeatedly, the laughter tag was skipped in 2 of 10 generations even with the default voices. The Hindi sentence means roughly "As soon as I open a book, I fall asleep."
```
किताब खोलते ही नींद आ जाती है [laughter]
```
With voice cloning, the same text skips the laughter tag in 6 of 10 generations, so the laughter is rendered at most 4 times out of 10. The reference audio was an 8-second clip with the corresponding text below.

Reference_audio
[monica_lal_8sec.wav](https://github.com/user-attachments/files/28546425/monica_lal_8sec.wav)

Reference_text
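"कूड़े-कबाड़ की भी एक कहानी है। पर यह कहानी सुनाने से पहले एक कहावत दोहरानी पड़ेगी। अंग्रेज़ी में कहते हैं..."
(English: "Even junk has a story. But before telling this story, a proverb has to be repeated. In English they say...")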

I have tried different voices and different reference texts. I have also tried reference audio that itself contains laughter, with `[laughter]` included in the reference text, and the performance is similar.

The performance on other tags is slightly worse. I have tried many different text combinations, but the results are generally the same.
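For anyone trying to reproduce the counts, a small harness along these lines could tally how often a tag is skipped. This is only a sketch: `synthesize` and `tag_rendered` are hypothetical placeholders, not omnivoice APIs. `synthesize` stands for whatever call produces audio from text, and `tag_rendered` stands for the check (listening by hand or a detector) that decides whether the laughter is audible. The stub functions at the bottom exist only so the sketch runs on its own.

```python
import itertools
from typing import Any, Callable


def count_tag_skips(
    synthesize: Callable[[str], Any],
    tag_rendered: Callable[[Any], bool],
    text: str,
    runs: int = 10,
) -> int:
    """Generate `text` `runs` times and count generations where the tag is missing.

    Both callables are hypothetical placeholders supplied by the caller.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    skipped = 0
    for _ in range(runs):
        audio = synthesize(text)
        if not tag_rendered(audio):
            skipped += 1
    return skipped


# Stand-in stubs (assumptions for illustration only): every third
# "generation" pretends the laughter was dropped.
_pattern = itertools.cycle([True, True, False])


def stub_synthesize(text: str) -> dict:
    return {"text": text, "has_laughter": next(_pattern)}


def stub_tag_rendered(audio: dict) -> bool:
    return audio["has_laughter"]


skips = count_tag_skips(stub_synthesize, stub_tag_rendered, "sample text [laughter]", runs=9)
print(f"skipped {skips} of 9 generations")
```

Running the same harness once with a default voice and once with the cloned voice, using the same text and number of runs, would make the two skip counts directly comparable.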

To fix this, do I need to fine-tune the main model further? If so, what kind of data would I need to address the emotion tag issue, and would it need to include the original training data as well?

Any guidance on this would be appreciated! Thanks in advance.

### ✔️ Expected Behavior

The laughter tag should be rendered consistently across all generations.

### ❌ Actual Behavior

The laughter is missing in 2 of 10 generations with the default voices and in 6 of 10 generations with voice cloning.
