GenEval score discrepancy on VGT (Qwen2.5VL): 0.80 vs reported 0.83 #11

@DingShizhe

I evaluated VGT (Qwen2.5VL) on GenEval with the standard generation settings (cfg_scale = 4.5, num_steps = 30, acc_ratio = 1, seed = 42). On an AMD MI325X with ROCm 6.2, the best score I can get is around 0.80, and I have not been able to reproduce the reported 0.83.

Can the 0.83 result be reproduced on NVIDIA GPUs (CUDA) with the same settings, or does it require some special setup, such as a particular seed, multi-seed averaging, or other generation/inference tweaks? I'm also curious whether there are any known differences between ROCm and CUDA that could affect this result.
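
To make the multi-seed question concrete, here is a minimal sketch of how per-seed GenEval scores could be summarized to check whether the reported number falls inside the seed-to-seed spread. The helper name `summarize_seed_scores` and its input format are hypothetical and not part of the VGT or GenEval code; it assumes the overall score for each seed has already been computed separately.

```python
import statistics


def summarize_seed_scores(scores_by_seed, reported):
    """Summarize overall GenEval scores collected from several seeds.

    scores_by_seed: dict mapping seed -> overall score (hypothetical format).
    reported: the score stated by the authors, for comparison.
    """
    scores = list(scores_by_seed.values())
    mean = statistics.mean(scores)
    # Sample standard deviation needs at least two runs.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {
        "mean": mean,
        "std": std,
        "best": max(scores),
        "gap_to_reported": reported - mean,
        # True if the reported score lies between the worst and best seed.
        "reported_within_range": min(scores) <= reported <= max(scores),
    }
```

If the gap to the reported score is comparable to the standard deviation across seeds, the discrepancy could plausibly be seed variance rather than a ROCm/CUDA difference.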

Thanks!
