The measured metrics for Qwen-Image, evaluated using gpt-4o, show significant discrepancies. #12

@zzzandyx

Hello, I retested the Qwen-Image metrics following your method and got lower scores in every category. What could be the reason?

Model             Cultural  Time  Space  Biology  Physics  Chemistry  Overall
Qwen (reported)   0.62      0.63  0.77   0.57     0.75     0.40       0.62
Qwen (my retest)  0.56      0.57  0.65   0.48     0.55     0.32       0.54

The first row is the result you reported; the second row is what I actually measured. I ran the test through the internal API interface.
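To make the size of the gap concrete, the per-category differences can be computed directly from the two sets of numbers quoted above. This is only a minimal sketch: the variable names (`reported`, `retest`, `gaps`) are illustrative and are not part of the benchmark's evaluation code.

```python
# Scores copied from the issue text; names are illustrative assumptions.
categories = ["Cultural", "Time", "Space", "Biology", "Physics", "Chemistry", "Overall"]
reported = [0.62, 0.63, 0.77, 0.57, 0.75, 0.40, 0.62]  # numbers published by the maintainers
retest = [0.56, 0.57, 0.65, 0.48, 0.55, 0.32, 0.54]    # numbers measured in this retest

# Gap per category (reported minus retest), rounded to two decimals
# to hide floating-point noise.
gaps = {c: round(r - t, 2) for c, r, t in zip(categories, reported, retest)}

# Category with the largest drop, excluding the aggregate "Overall" column.
largest = max((c for c in categories if c != "Overall"), key=lambda c: gaps[c])

for c in categories:
    print(f"{c:<10} {gaps[c]:.2f}")
print("Largest gap:", largest)
```

The retest is lower in every column, with the biggest drop in Physics, so the discrepancy looks systematic rather than confined to one category.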
