GenEval score discrepancy on VGT (Qwen2.5VL): 0.80 vs reported 0.83 #11

I tested **VGT (Qwen2.5VL)** on **GenEval** with the standard generation settings (cfg_scale = 4.5, num_steps = 30, acc_ratio = 1, seed = 42). On an **AMD MI325X with ROCm 6.2**, the best score I can get is around **0.80**; I have not been able to reproduce the reported **0.83**.

Can the 0.83 result be reproduced on **NVIDIA GPUs (CUDA)** with the same settings, or does it require a special setup such as a particular seed, multi-seed averaging, or other generation/inference tweaks? I’m also curious whether there are any known differences between ROCm and CUDA that could affect this result.
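To make the multi-seed question concrete, here is a minimal sketch of how per-seed GenEval scores could be summarized and compared with the reported number. The function name `summarize_seed_scores` and its return keys are hypothetical and not part of VGT or GenEval; it assumes the overall score for each seed has already been computed separately.

```python
from statistics import mean, stdev


def summarize_seed_scores(scores_by_seed, reported):
    """Summarize overall scores from several seeds against a reported value.

    scores_by_seed: dict mapping seed -> overall score (hypothetical input,
                    produced by running the evaluation once per seed).
    reported:       the score being compared against, e.g. 0.83.
    """
    if len(scores_by_seed) < 2:
        raise ValueError("need at least two seeds to estimate spread")

    values = list(scores_by_seed.values())
    avg = mean(values)
    spread = stdev(values)  # sample standard deviation across seeds
    gap = reported - avg

    return {
        "mean": avg,
        "std": spread,
        "gap": gap,
        # Gap expressed in units of seed-to-seed spread; None if all seeds agree.
        "gap_in_std": gap / spread if spread > 0 else None,
    }
```

If the gap comes out at only about one or two standard deviations, seed variance alone might explain the difference; a much larger gap would point more toward a settings or backend difference.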

Thanks!
