Why do models perform worse on the 60% identity split of the LBA dataset than on the 30% split?

Hi,
Thank you for the amazing work!
I am curious about the results in Table 8. Why do most models (other than GNN) perform dramatically worse on the 60% identity split than on the 30% identity split? Intuitively, the 60% split should be the easier task and yield better performance, since it allows greater sequence similarity between the training and test proteins.