-

Mainly because English words are broken into characters (letters) rather than going through a G2P (grapheme-to-phoneme) conversion first.
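The difference between the two input pipelines can be sketched in a few lines. The phoneme dictionary below is a tiny hand-made stand-in for a real G2P system (such as `g2p_en`); the entries are illustrative ARPAbet, not pulled from an actual lexicon.

```python
# Two tokenization strategies for TTS text input.
# PHONEME_DICT is a toy stand-in for a real G2P lexicon (illustrative only).

PHONEME_DICT = {
    "tortoise": ["T", "AO1", "R", "T", "AH0", "S"],
    "hare": ["HH", "EH1", "R"],
}

def char_tokenize(word: str) -> list[str]:
    """Character-level input: the model must learn pronunciation on its own."""
    return list(word.lower())

def g2p_tokenize(word: str) -> list[str]:
    """Phoneme-level input: pronunciation is resolved before the model sees it.
    Falls back to characters for out-of-vocabulary words."""
    return PHONEME_DICT.get(word.lower(), char_tokenize(word))

print(char_tokenize("Tortoise"))  # ['t', 'o', 'r', 't', 'o', 'i', 's', 'e']
print(g2p_tokenize("Tortoise"))   # ['T', 'AO1', 'R', 'T', 'AH0', 'S']
```

With character input, nothing tells the model that "oise" in "tortoise" is pronounced unlike "oise" in "noise"; with phoneme input, that ambiguity is resolved before training.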
-

@SWivid You mention this is likely a G2P problem. So Chinese, which already uses pinyin, should be OK? I don't quite understand why more data wouldn't solve this: since the model is deep enough, shouldn't it be able to construct a G2P-like representation internally? I worry that by using an external G2P system, the problem is just moved from the model to the G2P system (for example, if the G2P system cannot distinguish between sake, the alcoholic drink, and sake as in "for goodness' sake").

One more question =) I'm also thinking of Japanese and Korean. If I just use their 'alphabets' (for example, Hiragana and Hangul), I will likely run into the same problems that English faces, right? So is it better to go through G2P for everything? Or does the long-term fix involve something else?
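The homograph worry raised above can be made concrete: a plain word-to-phoneme dictionary has one entry per spelling, so both senses of "sake" collapse to a single pronunciation. The sketch below uses illustrative ARPAbet strings and a hypothetical context heuristic, not any real G2P library's behavior.

```python
# A naive lexicon maps each spelling to exactly one pronunciation.
LEXICON = {"sake": ["S", "EY1", "K"]}  # the "for goodness' sake" reading

def naive_g2p(word):
    """Context-free lookup: every occurrence of a spelling sounds the same."""
    return LEXICON[word.lower()]

def contextual_g2p(word, context):
    """Hypothetical disambiguation: peek at neighboring words before lookup.
    Real systems use POS tagging or learned sense disambiguation instead."""
    if word.lower() == "sake" and any(w in context for w in ("cup", "drink", "rice")):
        return ["S", "AA1", "K", "IY0"]  # the beverage, roughly /sah-kee/
    return naive_g2p(word)

# The drink reading is lost without context...
print(naive_g2p("sake"))                           # ['S', 'EY1', 'K']
# ...but recoverable once the lookup sees surrounding words:
print(contextual_g2p("sake", ["a", "cup", "of"]))  # ['S', 'AA1', 'K', 'IY0']
```

This is the trade-off in the comment: an external G2P removes the burden from the acoustic model but inherits the disambiguation problem, which then has to be solved in the text front-end.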
-
I'm using something along the lines of the suggested workflow in inference-cli.py, as shown here. I've trained it on my own voice clip and then generated speech from the text of "The Hare and the Tortoise" fable. "Tortoise" is consistently pronounced wrong, as you can hear in this audio clip. How would I go about figuring out the cause and improving the model? It would be surprising if the training data didn't include that word. Obviously, I'm not going to fix individual pronunciations one by one; I'm trying to understand the underlying causes and how to approach them.
Maybe this is just a matter of learning to use the fine-tuning CLI? If so, I would appreciate a succinct step-by-step example. Is there an easy way to use datasets other than Emilia, maybe one trained on LibriVox? It's not clear to me how you would go about importing a different dataset and generating from it.