Thank you very much for open-sourcing this excellent work!
We encountered some discrepancies while attempting to reproduce your results and would appreciate your guidance. Using the default configuration and code from the repository, we ran the full training (10 epochs) on an 8-GPU machine but obtained a PLCC of only around 0.8 on the KonIQ test set.
To verify our evaluation pipeline, we first tested with your released weights and achieved a PLCC of approximately 0.93, which matches the performance reported in the paper, so our evaluation code appears to be correct.
When analyzing the training process, we noticed a potential difference: in our reproduction, the format reward eventually converged to around 0.5, whereas the trainer_state.json accompanying your released weights shows the format reward quickly converging to 1.0 by step 29. We are uncertain whether this discrepancy directly explains the final performance gap.
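For reference, this is roughly how we extracted the format-reward curve from trainer_state.json. It assumes the standard Hugging Face Trainer `log_history` layout; the metric key `rewards/format_reward` is a guess on our part and may differ from the key your repo actually logs:

```python
import json

def format_reward_curve(path, key="rewards/format_reward"):
    """Extract (step, reward) pairs from a Hugging Face Trainer state file.

    `key` is a hypothetical metric name; adjust it to the repo's actual logging key.
    """
    with open(path) as f:
        state = json.load(f)
    # log_history is the per-step metrics list written by transformers.Trainer;
    # not every entry contains every metric, so filter on the key.
    return [(e["step"], e[key]) for e in state.get("log_history", []) if key in e]

# Example: curve = format_reward_curve("trainer_state.json")
```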
We've included our training curves for reference. If there are any training details or configuration adjustments we might have overlooked, we would be very grateful for your guidance. Thank you!
