Hi, thanks for your implementation, which already helps me a lot, but I still have a few questions:
- As for the JCU discriminator, the author mentions using a convolution module to condition on the input mel spectrogram when computing the conditional output (Fig. 2). In your code, however, the mel spectrogram and the transformed waveform are simply concatenated along the temporal dimension even though their lengths differ (by a factor of 32, actually). Wouldn't this concatenated result be improper for the later computation? How do you think this conditional convolution should be performed?
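To make the length mismatch concrete, here is a minimal, framework-agnostic sketch (NumPy instead of PyTorch; all shapes and the factor of 32 are purely illustrative) of what I mean by first aligning the mel frames to the waveform features and then concatenating along the channel axis rather than the temporal one:

```python
import numpy as np

# Hypothetical shapes: an 80-bin mel with T frames, and waveform
# features that are 32x longer in time (the ratio from my question).
T = 10
ratio = 32
mel = np.random.randn(80, T)               # (mel_channels, T)
wav_feat = np.random.randn(16, T * ratio)  # (channels, T * ratio)

# Align lengths by nearest-neighbour upsampling (frame repetition)
# of the mel, then concatenate along the CHANNEL axis.
mel_up = np.repeat(mel, ratio, axis=1)                   # (80, T * ratio)
cond_input = np.concatenate([wav_feat, mel_up], axis=0)  # (96, T * ratio)
print(cond_input.shape)  # (96, 320)
```

This is just how I imagine the conditioning could be made shape-consistent; I'd like to know how you think it should actually be done.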
- In your code, the MelGAN discriminator outputs consist of n_layers=3 feature maps plus out_score, but the discriminator actually has 4 layers if I understood correctly. Why did you change this layer setting for the feature-matching output?
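To make the layer-count question concrete, here is a toy sketch (NumPy, with made-up names; the real discriminator uses strided convolutions, not slicing) of what I would expect with n_layers=4, i.e. one feature map collected per layer plus the final score:

```python
import numpy as np

def toy_discriminator(x, n_layers=4):
    """Toy stand-in for a MelGAN-style discriminator (hypothetical):
    each 'layer' halves the temporal length, and every intermediate
    activation is kept as a feature map for the FM loss."""
    feature_maps = []
    for _ in range(n_layers):
        x = x[:, ::2]           # stand-in for a strided conv: downsample by 2
        feature_maps.append(x)  # collect this layer's activation
    out_score = x.mean()        # stand-in for the final score head
    return feature_maps, out_score

x = np.random.randn(1, 64)
fmaps, score = toy_discriminator(x, n_layers=4)
print(len(fmaps))  # 4 feature maps, one per layer
```

With 4 layers I would expect 4 feature maps to be returned, which is why the n_layers=3 setting confused me.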
- VocGAN also takes the feature matching loss into account and optimizes the summed loss (Eq. 9 in the paper). In your implementation, however, each JCU discriminator seems to return only the conditional and unconditional outputs, without the corresponding groups of feature maps. How can this part be obtained to compute L(FM)?
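For reference, this is roughly how I understand L(FM) would be computed once those feature maps are available (a minimal NumPy sketch with made-up shapes; the real loss would use the per-layer activations of the JCU discriminator for real and generated audio):

```python
import numpy as np

def feature_matching_loss(real_fmaps, fake_fmaps):
    """L1 distance between real and generated feature maps,
    averaged over layers (the usual form of a feature matching
    loss; variable names here are hypothetical)."""
    losses = [np.mean(np.abs(r - f)) for r, f in zip(real_fmaps, fake_fmaps)]
    return sum(losses) / len(losses)

# Two toy layers of feature maps with matching shapes.
real = [np.ones((1, 8)), np.ones((1, 4))]
fake = [np.zeros((1, 8)), np.zeros((1, 4))]
print(feature_matching_loss(real, fake))  # 1.0
```

Since the current discriminator only returns the two scores, I don't see where these per-layer activations could come from.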
Looking forward to your reply, and many thanks!