Reproducibility of X-Codec1 metrics in Table 1 #32

Hello, I am trying to reproduce the objective metrics listed in Table 1 of the Llasa paper. For models like X-Codec2 and MimiCodec, I get similar results for WER, SPK-SIM, PESQ-WB, and PESQ-NB (though the UT-MOS scores differ slightly). However, for X-Codec1 (nq1 and nq2), my results differ significantly from the values reported in the paper.

For X-Codec1, I used the official implementation from [GitHub](https://github.com/zhenye234/xcodec?tab=readme-ov-file), specifically the `xcodec_hubert_librispeech` model, and reconstructed the audio using the provided `inference.py` script. Could you clarify whether any additional settings or adjustments were used for X-Codec1 in your experiments?

<img width="909" height="441" alt="Image" src="https://github.com/user-attachments/assets/c65dd534-da08-49cc-ace2-09ef5d488d06" />

