Question details
Question
I'm trying to use voice cloning with a female reference speaker, but the generated voice sounds noticeably male.
I'm not sure whether this is expected behavior, a limitation of the model, or an issue with my usage.
Request Parameters
{
  "text": "是的,是的,是的。但我觉得这有点小问题。",
  "ref_audio": "/app/voice/audio/zh/female_teenager_low_pitch.wav",
  "ref_text": "你好呀,有什么我可以帮你的吗?",
  "num_step": 32,
  "speed": 1.2,
  "instruct": "female, teenager, low pitch",
  "language": "zh"
}
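To help rule out a usage problem on my side, here is a minimal sketch of request variants that each change a single parameter from the request above, so the outputs can be compared one at a time. The variant names and the `make_variants` helper are hypothetical, and I am assuming (not confirmed) that the service accepts a request with `instruct` omitted. The sketch only builds and prints the payloads; it does not call any API.

```python
import json

# Baseline request, copied from the parameters above.
BASE = {
    "text": "是的,是的,是的。但我觉得这有点小问题。",
    "ref_audio": "/app/voice/audio/zh/female_teenager_low_pitch.wav",
    "ref_text": "你好呀,有什么我可以帮你的吗?",
    "num_step": 32,
    "speed": 1.2,
    "instruct": "female, teenager, low pitch",
    "language": "zh",
}


def make_variants(base):
    """Return named copies of the request, each differing from base in one field."""
    # Assumption: the service tolerates a missing "instruct" field.
    no_instruct = {k: v for k, v in base.items() if k != "instruct"}
    return {
        "baseline": dict(base),
        "no_instruct": no_instruct,                          # reference audio only
        "gender_only": {**base, "instruct": "female"},       # drop age and pitch hints
        "normal_speed": {**base, "speed": 1.0},              # rule out the speed change
    }


if __name__ == "__main__":
    for name, payload in make_variants(BASE).items():
        # ensure_ascii=False keeps the Chinese text readable in the output.
        print(name, json.dumps(payload, ensure_ascii=False))
```

If the voice still sounds male in the `no_instruct` variant, the instruction text is probably not the cause; if only `baseline` sounds male, the "low pitch" hint or the speed setting would be the first things I'd suspect.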
Expected Behavior
The generated voice should sound similar to the female reference speaker.
Actual Behavior
The generated voice sounds more like a male speaker, even though:
The reference audio is female.
The speaker instruction specifies female.
The generated content is in the same language as the reference.
Additional Information
I can provide:
The reference audio used for cloning: female_teenager_low_pitch.wav
The generated audio result: 3ccbed22d8b17392572439d011a5be76.mp3
Could you please help me understand:
Is gender information from instruct expected to strongly affect the output?
Are there recommended settings or prompt formats to improve gender consistency during voice cloning?
Thank you for your help.