The measured metrics for Qwen-Image, evaluated using gpt-4o, show significant discrepancies.

Hello, I retested the Qwen-Image metrics following your method and got lower scores in every category. What could be the reason?

| Model | Cultural | Time | Space | Biology | Physics | Chemistry | Overall |
|---|---|---|---|---|---|---|---|
| Qwen (your reported results) | 0.62 | 0.63 | 0.77 | 0.57 | 0.75 | 0.4 | 0.62 |
| Qwen (my retest) | 0.56 | 0.57 | 0.65 | 0.48 | 0.55 | 0.32 | 0.54 |

The first row shows the metrics you reported; the second row shows the metrics I actually measured. I'm using the internal API interface for testing.
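To make the gap easier to read per category, here is a minimal sketch that subtracts the retested scores from the reported ones. The numbers are copied from the two rows in this issue; the names `reported`, `retested`, and `deltas` are only illustrative and are not part of the benchmark code.

```python
# Scores copied from this issue: reported by the maintainers vs. my retest.
categories = ["Cultural", "Time", "Space", "Biology", "Physics", "Chemistry", "Overall"]
reported = [0.62, 0.63, 0.77, 0.57, 0.75, 0.40, 0.62]
retested = [0.56, 0.57, 0.65, 0.48, 0.55, 0.32, 0.54]

# Per-category drop (reported minus retested), rounded to two decimals
# to avoid floating-point noise in the printed values.
deltas = {
    name: round(rep - ret, 2)
    for name, rep, ret in zip(categories, reported, retested)
}

# Category with the largest drop.
largest_gap = max(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"{name:<10} -{delta:.2f}")
print("Largest gap:", largest_gap)
```

On these numbers every category is lower in the retest, and the drop is largest for Physics (0.20) and Space (0.12), so those two may be the most useful places to compare judge outputs first.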

The measured metrics for Qwen-Image, evaluated using gpt-4o, show significant discrepancies. #12