The score in the paper is confusing #3

@saikouMonika

Description

Thank you very much for your work. However, I have a few minor questions.

Firstly, the fine-tuned base model in your repository is Show-o-512x512, which is supposed to score 0.68 on GenEval, so why is the baseline score you listed 0.53?

Additionally, there appears to be an inconsistency in your multimodal understanding scores. You directly used Show-o's reported scores, but the POPE metric comes from the VQGAN-based version of Show-o, while the other scores come from the CLIP-based version of Show-o. This discrepancy is quite confusing.

Could you please open-source the evaluation code so these results can be verified?
