About Different Size between Predicted-mel and Preprocess-mel #205

@ymzlygw

Description

Hi, I am trying to combine "deepvoice3-pytorch" with "wavenet_vocoder", both of which are your work, and I am really thankful for that.

I extract the mel output from deepvoice3-pytorch in its synthesis.py, around line 108:

with torch.no_grad():
    # Run the model; outputs are batched, so index [0] takes the first utterance.
    mel_outputs, linear_outputs, alignments, done = model(
        sequence, text_positions=text_positions, speaker_ids=speaker_ids)
    linear_output = linear_outputs[0].cpu().data.numpy()
    spectrogram = audio._denormalize(linear_output)
    alignment = alignments[0].cpu().data.numpy()
    # Extract and denormalize the predicted mel spectrogram.
    mel = mel_outputs[0].cpu().data.numpy()
    mel = audio._denormalize(mel)

I save the mel output to a .npy file and try to use it in wavenet_vocoder, but I hit a size mismatch: the preprocessed mel has shape (X, 80) and synthesizes to a wave fine, while the predicted mel from deepvoice3 has shape (80, X) and raises a size-mismatch error.
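
For reference, this is roughly how I save and reload it (a minimal sketch; the file name "predicted_mel.npy" is just a placeholder I use here):

import numpy as np

# Save the denormalized predicted mel from the snippet above.
np.save("predicted_mel.npy", mel)

# On the wavenet_vocoder side, load it back as the local conditioning features.
c = np.load("predicted_mel.npy")
print(c.shape)  # (80, X) for the predicted mel, vs. (X, 80) for the preprocessed one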

At first I thought it was a transpose problem, so I changed the predicted mel from (80, X) to (X, 80) with .T, but that did not work either; see the sketch below.
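
Concretely, the transpose I tried looks like this (variable and file names are again just placeholders):

# Attempted fix: swap the axes so that time frames come first,
# matching the (X, 80) layout of the preprocessed mels.
mel_t = mel.T  # (80, X) -> (X, 80)
np.save("predicted_mel_T.npy", mel_t)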

Could you please tell me why this happens, and how to reshape the predicted mel from deepvoice3 so that it matches the input expected by wavenet_vocoder?

I would really like to understand this. Thanks.
