Training WaveNet to rap?

So I heard Tacotron 2 needs very little data (100-300 sentences) for good-sounding speech. However, its tempo is bad. I've seen that WaveNet can be trained to generate music, and I wondered whether the model can be conditioned to do TTS with rhythm.

Even if that is possible (hopefully), I have heard it requires large amounts of data, in the tens of GBs. Can WaveNet be trained on only 1-2 GB, maybe no more than 4 GB, and still get good results?

And if it can, how does one prepare (condition) the dataset? Do I chop the audio, splitting it into each line the rapper spoke, or give it the full a cappella? Do I use one WAV file or multiple? And what audio format, number of channels, and sample rate should I use?

Sorry, I am extremely new. Any help would be appreciated. Thanks.
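A minimal sketch of the "split it into lines" option, under assumptions not stated in this thread: the source a cappella is a 16-bit PCM WAV, you already know the start and end time of each rapped line (for example, marked by hand), and your training repo accepts a simplified LJSpeech-style layout, i.e. one mono WAV clip per line in a `wavs/` folder plus a pipe-separated `metadata.csv` (LJSpeech's own `metadata.csv` also carries a normalized-transcript column). LJSpeech itself is 22,050 Hz mono 16-bit, which is why that rate is the default in the size helper. The names `hours_of_audio`, `to_mono_int16`, and `split_lines` are hypothetical; check what format your particular WaveNet or Tacotron 2 implementation actually expects.

```python
import wave
from array import array
from pathlib import Path


def hours_of_audio(num_bytes, sample_rate=22050, channels=1, sample_width=2):
    """Hours of uncompressed PCM audio in num_bytes (WAV headers ignored)."""
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes / bytes_per_second / 3600


def to_mono_int16(frames, channels):
    """Average interleaved 16-bit samples down to a single channel.

    Assumes a little-endian host: WAV stores samples little-endian and
    array("h") uses the machine's native byte order.
    """
    samples = array("h", frames)
    if channels == 1:
        return samples
    return array("h", (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ))


def split_lines(src_wav, segments, out_dir):
    """Cut src_wav into one mono clip per (start_s, end_s, text) segment.

    Writes out_dir/wavs/line_NNNN.wav and an "id|text" metadata.csv.
    The sample rate is kept as-is; resampling is not handled here.
    """
    out_dir = Path(out_dir)
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)
    rows = []
    with wave.open(str(src_wav), "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM input")
        rate, channels = src.getframerate(), src.getnchannels()
        for idx, (start, end, text) in enumerate(segments):
            # Seek to the line's first frame and read just that line.
            src.setpos(int(start * rate))
            frames = src.readframes(int((end - start) * rate))
            clip_id = f"line_{idx:04d}"
            clip_path = out_dir / "wavs" / f"{clip_id}.wav"
            with wave.open(str(clip_path), "wb") as dst:
                dst.setnchannels(1)
                dst.setsampwidth(2)
                dst.setframerate(rate)
                dst.writeframes(to_mono_int16(frames, channels).tobytes())
            rows.append((clip_id, text))
    # One "clip_id|transcript" row per clip.
    with open(out_dir / "metadata.csv", "w", encoding="utf-8") as f:
        for clip_id, text in rows:
            f.write(f"{clip_id}|{text}\n")
    return rows
```

With those defaults, `hours_of_audio(10**9)` is about 6.3, so 1-4 GB of uncompressed 22,050 Hz mono 16-bit audio is roughly 6 to 25 hours. Splitting per line keeps each clip short and paired with its own transcript, which is the usual shape of data for text-conditioned models; resampling to whatever rate your repo expects would need an extra tool such as sox or librosa.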

Flavius Valerius Constantinus, The Last Roman Emperor

Issue #410