Thank you very much for open-sourcing this excellent work!
We encountered some discrepancies while attempting to reproduce your results and would appreciate your guidance. Using the default configuration and code from the repository, we ran the full training (10 epochs) on an 8-GPU machine but obtained a PLCC of only around 0.8 on the KonIQ test set.
To verify our evaluation pipeline, we first tested with your released weights and achieved a PLCC of approximately 0.93, which matches the performance reported in the paper, so our evaluation code appears to be correct.
When analyzing the training process, we noticed a potential difference: in our reproduction, the format reward eventually converged to around 0.5, whereas the trainer_state.json accompanying your released weights shows the format reward quickly converging to 1.0 by step 29. We are uncertain whether this discrepancy directly explains the final performance gap.
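For reference, this is roughly how we extracted the format-reward curve from trainer_state.json. It assumes the standard Hugging Face Trainer `log_history` layout; the metric key `rewards/format_reward` is a guess on our part and may differ from the key your repo actually logs:

```python
import json

def format_reward_curve(path, key="rewards/format_reward"):
    """Extract (step, reward) pairs from a Hugging Face Trainer state file.

    `key` is a hypothetical metric name; adjust it to the repo's actual logging key.
    """
    with open(path) as f:
        state = json.load(f)
    # log_history is the per-step metrics list written by transformers.Trainer;
    # not every entry contains every metric, so filter on the key.
    return [(e["step"], e[key]) for e in state.get("log_history", []) if key in e]

# Example: curve = format_reward_curve("trainer_state.json")
```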
We've included our training curves for reference. If there are any training details or configuration adjustments we might have overlooked, we would be very grateful for your guidance. Thank you!
