Reproducibility of X-codec1 metrics in Table 1 #32

@francissfy

Hello, I am trying to reproduce the objective metrics listed in Table 1 of the Llasa paper. For models such as X-Codec2 and MimiCodec, I obtained results close to the reported ones for WER, SPK-SIM, PESQ-WB, and PESQ-NB (though the UT-MOS scores differ slightly). However, for X-Codec1 (nq1 and nq2), my results differ significantly from the reported values.

For X-Codec1, I used the official implementation from GitHub, specifically the xcodec_hubert_librispeech model, and reconstructed the audio using the provided inference.py script. Could you clarify whether any additional settings or adjustments were used in your experiments?

Image
