A truly multilingual project with even better cloning capability #981
Skylighter18 started this conversation in Ideas
Replies: 0 comments
Hi,
I am planning to train a large multilingual version of F5-TTS covering 50+ languages (100+ hours per language, with data from at least 20 speakers in each).
The IndicF5 team has done something similar with 11 languages, and, probably because of the many languages and speakers in the mix, it works great in terms of cloning and doesn't carry over accent even in cross-lingual cloning.
They have not shared details about the model's training time or any challenges they faced in getting the model to converge.
What are the likely challenges? Is 20 speakers per language enough, or should I look for even more variety (any benchmarks you can share)? And on training time and compute: will this need significantly more, or, since the total data is small compared to what the base model was trained on, should it be relatively cheap?
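For what it's worth, one convergence challenge I expect with 50+ languages is imbalance: the largest corpora can dominate training. A common mitigation is temperature-scaled language sampling, roughly like the sketch below. Nothing here is from the F5-TTS codebase; the manifest fields and the `TEMPERATURE` value are my own placeholders.

```python
import random
from collections import defaultdict

TEMPERATURE = 0.5  # 1.0 = proportional to corpus size, 0.0 = uniform over languages

def language_weights(manifest, temperature=TEMPERATURE):
    # p(lang) is proportional to n_lang ** temperature, so low-resource
    # languages are up-weighted relative to their raw share of the data
    counts = defaultdict(int)
    for entry in manifest:  # entry: {"audio": ..., "text": ..., "lang": ..., "speaker": ...}
        counts[entry["lang"]] += 1
    scaled = {lang: n ** temperature for lang, n in counts.items()}
    total = sum(scaled.values())
    return {lang: w / total for lang, w in scaled.items()}

def sample_batch(manifest, batch_size, weights):
    # pick a language first, then a random utterance within that language
    by_lang = defaultdict(list)
    for entry in manifest:
        by_lang[entry["lang"]].append(entry)
    langs = list(weights)
    probs = [weights[lang] for lang in langs]
    return [
        random.choice(by_lang[random.choices(langs, weights=probs, k=1)[0]])
        for _ in range(batch_size)
    ]
```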
Another major enhancement I am planning is to allow longer reference audio (and to train in a similar fashion), since it should improve cloning quality and might also help capture emotion and correct pauses. Will this be possible, or am I aspiring too high?
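On the inference side, this would at minimum mean relaxing the reference-clipping step. A minimal sketch of what I mean, assuming 24 kHz audio (matching the released F5-TTS vocoders) and a made-up `MAX_REF_SEC` budget:

```python
import torch
import torchaudio

TARGET_SR = 24_000  # sample rate of the released F5-TTS vocoders
MAX_REF_SEC = 30    # hypothetical extended reference budget, not a documented limit

def prepare_reference(path: str) -> torch.Tensor:
    wav, sr = torchaudio.load(path)      # (channels, samples)
    wav = wav.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != TARGET_SR:
        wav = torchaudio.functional.resample(wav, sr, TARGET_SR)
    max_len = TARGET_SR * MAX_REF_SEC
    if wav.shape[-1] > max_len:          # clip rather than fail on long clips
        wav = wav[..., :max_len]
    return wav
```

The training side is where I expect the real cost: the model attends over the whole concatenated sequence, so doubling the reference length roughly doubles the sequence and quadruples the attention cost.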
@SWivid please share your thoughts on this, and I urge you to look closely at their work and identify the variations they might have made. I experimented with changing the keys of the model file, which did resolve the loading error, but the generated output cannot be understood by humans (I feel it is masked in some way, since the file size, duration, etc. look right when tested with multiple text files). Let's connect to discuss this in more detail.
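For reference, the sketch below is roughly the key-renaming experiment I tried; the `PREFIX_MAP` entries are purely illustrative (the real mismatched prefixes would come from diffing the two checkpoints' key lists). One thing I noticed: with `strict=False`, PyTorch silently skips any tensor whose name still doesn't match, which could explain output that has the right size and duration but is unintelligible.

```python
import torch

# purely illustrative renames; build the real map by comparing
# sorted(state_dict.keys()) of both checkpoints
PREFIX_MAP = {
    "ema_model.": "",          # e.g. strip an EMA wrapper prefix
    "transformer.": "model.",  # e.g. rename a submodule
}

def remap_keys(state_dict):
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in PREFIX_MAP.items():
            if new_key.startswith(old):
                new_key = new + new_key[len(old):]
        remapped[new_key] = value
    return remapped

def load_remapped(model: torch.nn.Module, ckpt_path: str):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model_state_dict", ckpt)  # handle raw or wrapped layouts
    missing, unexpected = model.load_state_dict(remap_keys(state), strict=False)
    # non-empty lists mean weights were silently dropped or left randomly
    # initialized; a likely cause of "right duration, garbled audio"
    print("missing:", missing)
    print("unexpected:", unexpected)
```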