Confusing results in CLAPScore evaluation

I have tried to use CLAPScore to evaluate my own model after generation.
However, I found two confusing things.
1. The CLAPScore differs between runs, even when the generated audio is unchanged. Where does the randomness come from? How can I get a fixed, reproducible result?
2. My CLAPScore is higher (better) than that of the GROUND TRUTH. I did use more data to train this model, but is that possible?
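
To make question 1 concrete, here is a minimal sketch of how I would try to pin the random state before scoring. It assumes the variation comes from unseeded random number generators somewhere in the evaluation pipeline (for example, a random crop of the audio before embedding), which I have not confirmed. `seed_everything` is a hypothetical helper of my own, not part of any CLAPScore tooling.

```python
import random


def seed_everything(seed: int = 0) -> None:
    """Seed the RNGs an evaluation script commonly relies on (hypothetical helper)."""
    # Python's built-in RNG, used by e.g. random.randint for crop offsets.
    random.seed(seed)

    # NumPy and PyTorch are seeded only if they are installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Call once, right before computing the score, so every run starts from the same state.
seed_everything(1234)
first_draw = random.random()

seed_everything(1234)
second_draw = random.random()

# With the same seed, the same sequence of random numbers is produced.
print(first_draw == second_draw)
```

If the score still changes after seeding like this, the randomness probably comes from somewhere else (for example non-deterministic GPU kernels or data loading order).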