I tried using CLAPScore to evaluate my own model after generation.
However, I found two confusing things:
- The CLAPScore differs between runs even when the generated audio is unchanged. Where does the randomness come from, and how can I get a reproducible result?
- My CLAPScore is higher (better) than that of the ground truth. I did use more data to train this model, but is this possible?