I'm wondering how to get the scores on these evaluation benchmarks. Could you open-source the evaluation code? Thank you very much!