Voice cloning outputs male voice despite female reference audio and female speaker instruction #176

### Checks

- [x] This template is only for research questions, not usage problems, feature requests or bug reports.
- [x] I have thoroughly reviewed the project documentation and read the related paper(s).
- [x] I have searched for existing issues, including closed ones, and found no similar questions.
- [x] I am using English to submit this issue to facilitate community communication.

### Question details

## Question

I'm trying to use voice cloning with a female reference speaker, but the generated voice sounds noticeably male.

I'm not sure whether this is expected behavior, a limitation of the model, or an issue with my usage.

## Request Parameters

```python
{
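    # "Yes, yes, yes. But I think there's a bit of a problem with this."
    "text": "是的，是的，是的。但我觉得这有点小问题。",
    "ref_audio": "/app/voice/audio/zh/female_teenager_low_pitch.wav",
    # "Hi there! Is there anything I can help you with?"
    "ref_text": "你好呀，有什么我可以帮你的吗？",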
    "num_step": 32,
    "speed": 1.2,
    "instruct": "female, teenager, low pitch",
    "language": "zh"
}
```
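To narrow down whether a single parameter drives the shift, the same text and reference could be re-generated with a small grid of variants: the original `instruct`, `instruct` without "low pitch", `instruct` omitted entirely, each at `speed` 1.2 and 1.0. The sketch below only builds the request dictionaries (field names are copied from the request above). How they are sent to the server is not shown here, and the variant where `instruct` is omitted assumes the server treats that field as optional.

```python
import copy
import itertools

# Request from this issue; field names and values copied as-is.
BASE_REQUEST = {
    "text": "是的，是的，是的。但我觉得这有点小问题。",
    "ref_audio": "/app/voice/audio/zh/female_teenager_low_pitch.wav",
    "ref_text": "你好呀，有什么我可以帮你的吗？",
    "num_step": 32,
    "speed": 1.2,
    "instruct": "female, teenager, low pitch",
    "language": "zh",
}

# Hypothetical variants to compare. None means "omit the instruct field",
# which assumes the server accepts requests without it.
INSTRUCT_VARIANTS = [BASE_REQUEST["instruct"], "female, teenager", None]
SPEED_VARIANTS = [BASE_REQUEST["speed"], 1.0]


def build_ablation_requests(base):
    """Return (label, request) pairs covering every instruct/speed combination."""
    requests = []
    for instruct, speed in itertools.product(INSTRUCT_VARIANTS, SPEED_VARIANTS):
        req = copy.deepcopy(base)  # keep text, ref_audio, ref_text, etc. fixed
        req["speed"] = speed
        if instruct is None:
            req.pop("instruct", None)
        else:
            req["instruct"] = instruct
        requests.append((f"instruct={instruct!r}, speed={speed}", req))
    return requests


if __name__ == "__main__":
    for label, req in build_ablation_requests(BASE_REQUEST):
        print(label)
```

Changing one field at a time while keeping the reference audio and text fixed makes it easier to tell whether the male-sounding output follows the "low pitch" instruction, the speed change, or the cloning itself.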

## Expected Behavior

The generated voice should sound similar to the female reference speaker.

## Actual Behavior

The generated voice sounds more like a male speaker, even though:

- The reference audio is female.
- The speaker instruction specifies female.
- The generated content is in the same language as the reference.

## Additional Information

### I can provide:

The reference audio used for cloning.

[female_teenager_low_pitch.wav](https://github.com/user-attachments/files/28499383/female_teenager_low_pitch.wav)

The generated audio result.

[3ccbed22d8b17392572439d011a5be76.mp3](https://github.com/user-attachments/files/28499404/3ccbed22d8b17392572439d011a5be76.mp3)
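To put a number on "sounds male" rather than relying on listening alone, one could compare the fundamental frequency (F0) of a voiced frame from the reference WAV with one from the generated audio. The sketch below is a hypothetical helper (`estimate_f0` is not part of the project) based on plain autocorrelation. It assumes the audio has already been decoded to mono float samples, so the generated MP3 would need converting first. As a rough guide, adult male speaking F0 is commonly cited at about 85 to 180 Hz and adult female at about 165 to 255 Hz; the ranges overlap, so a "low pitch" female voice can sit near the boundary.

```python
import math
from typing import Sequence


def estimate_f0(samples: Sequence[float], sample_rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Rough F0 (Hz) of one voiced frame, taken from the autocorrelation peak.

    Only meant for a quick reference-vs-output comparison; dedicated pitch
    trackers (e.g. YIN) are far more robust on real speech.
    Returns 0.0 if no positive autocorrelation peak is found (e.g. silence).
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset
    min_lag = max(1, int(sample_rate / fmax))      # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)  # longest period considered
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation; its slight bias toward shorter lags
        # helps avoid picking a multiple of the true period.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    sr = 16000
    # Synthetic 200 Hz tone standing in for one voiced frame (2048 samples).
    frame = [math.sin(2 * math.pi * 200 * t / sr) for t in range(2048)]
    print(round(estimate_f0(frame, sr)))
```

Running it on a few voiced frames from each file and comparing the typical values would show whether the output's pitch actually dropped relative to the reference, or whether the male impression comes from other qualities such as timbre.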

### Could you please help me understand:

- Is the gender information in `instruct` expected to strongly affect the output?
- Are there recommended settings or prompt formats that improve gender consistency during voice cloning?

Thank you for your help.
