Hello, I am trying to reproduce the objective metrics listed in Table 1 of the Llasa paper. For X-Codec2 and MimiCodec, my results are close to the reported values for WER, SPK-SIM, PESQ-WB, and PESQ-NB (the UT-MOS scores differ slightly). For X-Codec1 (nq1 and nq2), however, my results differ significantly from the reported ones.
For X-Codec1, I used the official implementation from GitHub, specifically the xcodec_hubert_librispeech model, and reconstructed the audio with the provided inference.py script. Could you clarify whether any additional settings or adjustments were used in your experiments?
