-
We were also frustrated when we realized that Emilia is restricted with CC-BY-NC. It would be very welcome to share datasets that we might not have noticed before.
-
I can imagine your frustration. I don't think the Share-Alike clause is a problem: in the end, small businesses/studios like us usually use the model and the system, and the closed part is a separate part of the code. I'm not sure about all the possible implications, but I don't think it would be problematic.

As for GPL-v3: since the communication usually goes through disk or through sockets, I don't think it's problematic either, mainly because your code is MIT and the GPL affects the model, which will be used to generate an audio file. That file ends up written to disk, and the GPL ends there. So unless we do some direct memory communication, the GPL should not affect any commercial development in this case.

I will add what everyone adds in these situations: I'm not a lawyer, so I cannot be 100% sure. However, I've been working with Blender for a long time and have investigated the GPL license quite a lot, and in this case I think it won't be problematic for anyone and will ensure the model and its derivatives are kept open :)

One of the main things would be to keep the project commercially usable. That would enable small projects to compete in a market where Eleven Labs is the biggest player and has a TON of resources, while we don't :)
-
As the main source of this question regarding the licence and the Emilia dataset: I think it is worth pointing out that the team behind it, who released the dataset itself (https://huggingface.co/datasets/amphion/Emilia), don't seem to feel that training on the dataset requires the models themselves to also be non-commercial.

Even if in the end that is judged to be different from the publicly listed licence, given they are the ones who compiled the dataset and released it with that licence in the first place, they have the ability to OK a different usage.

And in terms of training on "in the wild" content as a whole: Whisper is incredibly widely used, MIT licensed, and trained on in-the-wild audio from across the web. Llama and all open-source LLMs are similarly trained on data scraped from public web pages.
-
Regarding open data sources: they all seem to be much smaller, with one exception released this month. The fully open ones I could find are:

Public Domain - use it however you want:
It's unlabelled audio data, but the US Library of Congress website has a collection of public domain audio recordings, filterable to "vocal": https://www.loc.gov/collections/national-jukebox/?dates=1800/1922&fa=subject:vocal

cc-by-4.0 / MIT / BSD - say you used it:

cc-by-sa-4.0 - copyleft licence, not as freely available.

FBK-MT/mosel
The big standout here is FBK-MT/mosel, which was released within this month to solve exactly this problem. It is a collection of either public-domain or Creative Commons BY 3.0/4.0 audio datasets. From the paper, it seems they filled in a lot of the extra hours by running ASR on the Creative Commons audio datasets they found. Notably, both Emilia and MOSEL publish their dataset statistics in hours.

Their intro/abstract makes it clear this sort of problem was exactly what they set out to solve, as they complain several times in the opening section that the licences of other datasets are not really open.
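For anyone cataloguing candidate training corpora, a minimal sketch of a license filter in Python. The dataset IDs and license strings below are taken from this thread and are illustrative assumptions only; always verify the actual license on each Hugging Face dataset card before training:

```python
# Candidate corpora mentioned in this thread, keyed by Hugging Face dataset ID.
# License strings are illustrative; confirm them on each dataset card.
CANDIDATES = {
    "facebook/multilingual_librispeech": "cc-by-4.0",
    "ylacombe/cml-tts": "cc-by-4.0",
    "FBK-MT/mosel": "cc-by-4.0",
    "amphion/Emilia": "cc-by-nc-4.0",
}

# Licenses that generally permit commercial model training and redistribution.
PERMISSIVE = {"public-domain", "cc-by-3.0", "cc-by-4.0", "mit", "bsd"}

def commercially_usable(candidates: dict[str, str]) -> list[str]:
    """Return the dataset IDs whose recorded license is in the permissive set."""
    return sorted(name for name, lic in candidates.items()
                  if lic.lower() in PERMISSIVE)
```

With the assumptions above, `commercially_usable(CANDIDATES)` would keep the MLS, CML-TTS, and MOSEL entries and drop Emilia because of its NC clause.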
-
Which training datasets are currently eligible for retraining? Will these then support European languages?
-
I am new to AI TTS overall and to F5. I have looked through the license and I am a little confused about where the lines are drawn. I do some content creation (just a side thing) and wanted to use AI to make a voice other than my own for some shorts, announcements, and things like that. I used one of the StyleTTS2 (Pinokio install) voices to generate about 25 seconds of voice. It's OK, but I really like what I see F5 doing for multi-speech, emotional response, and more natural-sounding voices.

If I use F5 to generate that 25 seconds in a monetized video, does that break the NC part of the rule? When I read the license, it sounded like using F5 or the models/dataset directly as part of what I am doing breaks the rule. I was not sure whether what is produced as output from those models breaks the rule as well.

In addition, I have been trying to find "training prompts" for the multi-speech to provide some of that more "emotional content". The best I have found is at https://www.microsoft.com/en-us/research/project/e2-tts/#:~:text=Changing%20the%20speech%20rate in the RAVDESS section. It has multiple male and female prompts that the site says are for demo purposes. If I download those and use them as my "training prompts" to produce the output, does that break any rules as well? If this is a problem, are there places to get some of those "emotional training prompts" for the multi-speech with different voices (male, female, younger, older, with accents, etc.) that are open source?

I really like what I see; I just want to make sure I am not breaking any of the rules and can do it for free (for now, anyway).
-
@SWivid You mentioned here that you are also planning CC-BY models. Is this coming sometime in the near future? It would be really helpful for us.
-
|
Thanks for reaching out and asking about this.

I did not end up using F5 at this point. License issues aside, I was having problems getting the results I was hoping for. I am new to the AI world, so I suspect I was not understanding things and need to learn a little more about how everything works. I will probably revisit this in the future; just not sure when at this point.
Scott Harris
-
This license change is a pity. The CC-BY is not a problem, but the NC puts this in the same boat as Fish Speech; basically we are back to the starting point, which is a pity.

We don't have the resources to train a new model ourselves; that's why we are looking for open-source projects. Since we are very small, we have to rely on this.

Is there a possibility that you retrain a model with a less restrictive license, using a non-restrictive dataset?

There is a multilingual dataset from Facebook under the CC-BY 4.0 license that wouldn't be so restrictive, and it supports several languages:
https://huggingface.co/datasets/facebook/multilingual_librispeech

There is also this one:
https://huggingface.co/datasets/ylacombe/cml-tts

In case it's useful for a future training run to avoid the licensing problem.

In any case, thanks for your work!