I evaluated VGT (Qwen2.5-VL) on GenEval using the standard generation settings (cfg_scale = 4.5, num_steps = 30, acc_ratio = 1, seed = 42). On an AMD Instinct MI325X with ROCm 6.2, the best overall score I can get is around 0.80, so I'm unable to reproduce the reported 0.83.
I'm wondering whether the 0.83 result can be reproduced on NVIDIA GPUs (CUDA) with the same settings, or whether it requires a special setup such as a particular seed, multi-seed averaging, or other generation/inference tweaks. I'm also curious whether there are any known differences between ROCm and CUDA that could affect this result.
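If multi-seed averaging turns out to be the answer, this is roughly how I'd plan to compare my runs against the reported number. It's a minimal sketch, assuming each seed produces one GenEval overall score from a separate run with the settings above. The helpers `summarize_seeds` and `within_spread` are my own hypothetical names, not part of VGT or GenEval, and the 2-sigma threshold is an arbitrary choice.

```python
import statistics
from typing import Sequence


def summarize_seeds(scores: Sequence[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of per-seed GenEval overall scores.

    Assumption: each entry is the overall score of one full GenEval run
    with a different seed (cfg_scale=4.5, num_steps=30, acc_ratio=1).
    """
    if len(scores) < 2:
        # A single seed gives no estimate of run-to-run spread.
        raise ValueError("need at least two seeds to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


def within_spread(target: float, scores: Sequence[float], k: float = 2.0) -> bool:
    """True if `target` lies within k sample standard deviations of the multi-seed mean."""
    mean, std = summarize_seeds(scores)
    return abs(target - mean) <= k * std
```

If 0.83 falls outside the spread of several seeds on ROCm, that would point to a systematic difference (backend numerics or setup) rather than seed variance.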
Thanks!