JCU Discriminator implementation details #26

@twinklecc

Hi, thanks for your implementation, which has already helped me a lot. I still have several questions:

  1. For the JCU discriminator, the authors mention that they use a convolution module to condition on the input mel spectrogram when computing the conditional output (Fig. 2). In your code, the mel and the transformed waveform are simply concatenated along the temporal dimension even though their lengths differ (by a factor of 32). Wouldn't this concatenated result be improper for the later computation? How do you think this conditional convolution should be performed?

  2. In your code, the MelGAN discriminator outputs consist of n_layers=3 feature maps plus out_score, but if I understand correctly the discriminator has 4 layers. Why did you change this layer setting for the FM output?

  3. VocGAN also states that a feature matching loss is included in the summed optimization objective (Eq. 9 in the paper). But your implementation seems to return only the conditional and unconditional outputs of each JCU discriminator, without the corresponding groups of feature maps. How can this part be obtained to compute L(FM)?
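
To make question 3 concrete, here is a minimal, framework-free sketch of the feature matching term as I understand it from MelGAN-style losses: a per-layer mean absolute difference between the discriminator feature maps of real and generated audio, summed over layers and over discriminators. This is an assumption on my side, not code from this repo and not necessarily the exact form of Eq. 9. The names `feature_matching_loss` and `total_fm_loss` are hypothetical, and each feature map is flattened to a plain list of floats for illustration.

```python
from typing import Sequence

# One flattened feature map, and the list of feature maps of one discriminator.
FeatureMap = Sequence[float]
FeatureMaps = Sequence[FeatureMap]


def feature_matching_loss(real_feats: FeatureMaps, fake_feats: FeatureMaps) -> float:
    """Sum over layers of the mean absolute difference between feature maps.

    Assumed form (MelGAN-style): sum_i (1 / N_i) * ||D_i(x) - D_i(G(s))||_1,
    where N_i is the number of elements in the i-th feature map.
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("real and fake must have the same number of layers")
    total = 0.0
    for real, fake in zip(real_feats, fake_feats):
        if len(real) != len(fake):
            raise ValueError("feature maps of a layer must have the same size")
        total += sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
    return total


def total_fm_loss(real_groups: Sequence[FeatureMaps],
                  fake_groups: Sequence[FeatureMaps]) -> float:
    """Sum the per-discriminator term over all discriminators (e.g. each JCU scale)."""
    if len(real_groups) != len(fake_groups):
        raise ValueError("real and fake must cover the same discriminators")
    return sum(feature_matching_loss(r, f) for r, f in zip(real_groups, fake_groups))
```

Under this reading, each JCU discriminator would need to return its intermediate feature maps alongside the conditional and unconditional scores so that they can be passed in as one group per discriminator.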

Looking forward to your reply, and many thanks!