Training wavenet to rap? #410

Open
@constantinethegr8

So I heard Tacotron 2 needs very little data (100-300 sentences) for good-sounding speech. However, it handles tempo badly. I've seen that WaveNet can be used for music, and I wondered whether the model can be conditioned to do TTS with rhythm. Even if it is possible (hopefully), I have heard it requires large amounts of data, in the tens of GB. Can WaveNet be trained with only 1-2 GB, maybe no more than 4 GB, and still get good results? And if it can, how does one prepare a dataset (that is, how do I condition it)? Do I chop the audio, splitting it into each line the rapper spoke, or give the full acapella? Do I use one WAV file or multiple? (Also, what audio format, number of channels, and sample rate?) Sorry, I am extremely new. Any help would be appreciated. Thanks.

Flavius Valerius Constantinus, The Last Roman Emperor
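
For a sense of scale on the 1-2 GB figure and the "chop it per line" idea in the question, here is a rough sketch using only Python's standard `wave` module. Everything specific in it is an assumption rather than something this thread or any WaveNet repo prescribes: uncompressed 16-bit mono PCM, the example sample rates, and the hypothetical helpers `hours_of_audio` and `split_wav` (which expects line timestamps in seconds that you would have to supply yourself).

```python
import wave
from pathlib import Path


def hours_of_audio(size_bytes, sample_rate, channels=1, bytes_per_sample=2):
    """Hours of uncompressed PCM audio that fit in size_bytes (header ignored)."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return size_bytes / bytes_per_second / 3600


def split_wav(src, segments, out_dir):
    """Cut one WAV file into one file per (start_s, end_s) segment.

    Hypothetical helper: `segments` is a list of timestamps in seconds,
    e.g. one pair per rapped line. Returns the paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        for i, (start_s, end_s) in enumerate(segments):
            start = int(start_s * rate)
            count = int(end_s * rate) - start
            reader.setpos(start)                 # seek to the first frame of the line
            frames = reader.readframes(count)
            out_path = out_dir / f"line_{i:04d}.wav"
            with wave.open(str(out_path), "wb") as writer:
                writer.setparams(params)         # same rate, channels, sample width
                writer.writeframes(frames)       # frame count in header is fixed on close
            written.append(out_path)
    return written


if __name__ == "__main__":
    GB = 10**9
    for rate in (16_000, 22_050, 44_100):        # assumed example sample rates
        print(f"{rate} Hz mono 16-bit: 1 GB is about {hours_of_audio(GB, rate):.1f} h")
```

Under these assumptions, 1 GB of 16 kHz mono 16-bit audio is several hours of material, so file size alone says little; the sample rate and channel count chosen for the dataset change the number of hours considerably.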
