Hi, thanks for your implementation, which already helps me a lot, but I still have a few questions:
- As for the JCU discriminator, the author mentions using a convolution module to condition on the input mel spectrogram when computing the conditional output (Fig. 2). In your code, however, the mel spectrogram and the transformed waveform are simply concatenated along the temporal dimension even though their lengths differ (by a factor of 32, actually). Wouldn't this concatenated result be improper for the later computation? How do you think this conditional convolution should be performed?
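To make the length mismatch concrete, here is a minimal, framework-agnostic sketch (NumPy instead of PyTorch; all shapes and the factor of 32 are purely illustrative) of what I mean by first aligning the mel frames to the waveform features and then concatenating along the channel axis rather than the temporal one:

```python
import numpy as np

# Hypothetical shapes: an 80-bin mel with T frames, and waveform
# features that are 32x longer in time (the ratio from my question).
T = 10
ratio = 32
mel = np.random.randn(80, T)               # (mel_channels, T)
wav_feat = np.random.randn(16, T * ratio)  # (channels, T * ratio)

# Align lengths by nearest-neighbour upsampling (frame repetition)
# of the mel, then concatenate along the CHANNEL axis.
mel_up = np.repeat(mel, ratio, axis=1)                   # (80, T * ratio)
cond_input = np.concatenate([wav_feat, mel_up], axis=0)  # (96, T * ratio)
print(cond_input.shape)  # (96, 320)
```

This is just how I imagine the conditioning could be made shape-consistent; I'd like to know how you think it should actually be done.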
- In your code, the MelGAN discriminator outputs consist of n_layers=3 feature maps plus out_score, but the discriminator actually has 4 layers if I understood correctly. Why did you change this layer setting for the feature-matching output?
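To make the layer-count question concrete, here is a toy sketch (NumPy, with made-up names; the real discriminator uses strided convolutions, not slicing) of what I would expect with n_layers=4, i.e. one feature map collected per layer plus the final score:

```python
import numpy as np

def toy_discriminator(x, n_layers=4):
    """Toy stand-in for a MelGAN-style discriminator (hypothetical):
    each 'layer' halves the temporal length, and every intermediate
    activation is kept as a feature map for the FM loss."""
    feature_maps = []
    for _ in range(n_layers):
        x = x[:, ::2]           # stand-in for a strided conv: downsample by 2
        feature_maps.append(x)  # collect this layer's activation
    out_score = x.mean()        # stand-in for the final score head
    return feature_maps, out_score

x = np.random.randn(1, 64)
fmaps, score = toy_discriminator(x, n_layers=4)
print(len(fmaps))  # 4 feature maps, one per layer
```

With 4 layers I would expect 4 feature maps to be returned, which is why the n_layers=3 setting confused me.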
- VocGAN also takes the feature matching loss into account and optimizes the summed loss (Eq. 9 in the paper). In your implementation, however, each JCU discriminator seems to return only the conditional and unconditional outputs, without the corresponding groups of feature maps. How can this part be obtained to compute L(FM)?
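For reference, this is roughly how I understand L(FM) would be computed once those feature maps are available (a minimal NumPy sketch with made-up shapes; the real loss would use the per-layer activations of the JCU discriminator for real and generated audio):

```python
import numpy as np

def feature_matching_loss(real_fmaps, fake_fmaps):
    """L1 distance between real and generated feature maps,
    averaged over layers (the usual form of a feature matching
    loss; variable names here are hypothetical)."""
    losses = [np.mean(np.abs(r - f)) for r, f in zip(real_fmaps, fake_fmaps)]
    return sum(losses) / len(losses)

# Two toy layers of feature maps with matching shapes.
real = [np.ones((1, 8)), np.ones((1, 4))]
fake = [np.zeros((1, 8)), np.zeros((1, 4))]
print(feature_matching_loss(real, fake))  # 1.0
```

Since the current discriminator only returns the two scores, I don't see where these per-layer activations could come from.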
Looking forward to your reply, and many thanks!