About Different Size between Predicted-mel and Preprocess-mel #205

@ymzlygw

Description

Hi, I am trying to combine "deepvoice3-pytorch" with "wavenet_vocoder", both of which are your work, and I am really thankful for that.

I extract the mel output from deepvoice3-pytorch in its synthesis.py, around line 108:

with torch.no_grad():
    # Run the model; outputs are batched, so index [0] takes the first utterance.
    mel_outputs, linear_outputs, alignments, done = model(
        sequence, text_positions=text_positions, speaker_ids=speaker_ids)
    linear_output = linear_outputs[0].cpu().data.numpy()
    spectrogram = audio._denormalize(linear_output)
    alignment = alignments[0].cpu().data.numpy()
    # Extract and denormalize the predicted mel spectrogram.
    mel = mel_outputs[0].cpu().data.numpy()
    mel = audio._denormalize(mel)

I save the mel output to a .npy file and try to use it in wavenet_vocoder, but I hit a size mismatch: the preprocessed mel has shape (X, 80) and synthesizes to a wave fine, while the predicted mel from deepvoice3 has shape (80, X) and raises a size-mismatch error.
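
For reference, this is roughly how I save and reload it (a minimal sketch; the file name "predicted_mel.npy" is just a placeholder I use here):

import numpy as np

# Save the denormalized predicted mel from the snippet above.
np.save("predicted_mel.npy", mel)

# On the wavenet_vocoder side, load it back as the local conditioning features.
c = np.load("predicted_mel.npy")
print(c.shape)  # (80, X) for the predicted mel, vs. (X, 80) for the preprocessed one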

At first I thought it was a transpose problem, so I changed the predicted mel from (80, X) to (X, 80) with .T, but that did not work either; see the sketch below.
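
Concretely, the transpose I tried looks like this (variable and file names are again just placeholders):

# Attempted fix: swap the axes so that time frames come first,
# matching the (X, 80) layout of the preprocessed mels.
mel_t = mel.T  # (80, X) -> (X, 80)
np.save("predicted_mel_T.npy", mel_t)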

Could you please tell me why this happens, and how to reshape the predicted mel from deepvoice3 so that it matches the input expected by wavenet_vocoder?

I would really like to understand this. Thanks.
