I am using Torch 2.8 and CUDA 12.9. Basically every library I use is exactly the same version on my Windows and Linux machines.
However, Qwen Image training with Adafactor is almost 2x faster on Linux.
Other optimizers don't seem to be affected like this.
For example, the best speed I get on Windows is around 9.5 s/it, while Linux runs at 5.9 s/it (about 1.6x faster).
The settings are exactly the same.
I even installed Ubuntu 22 under WSL and got the same result. Here is a screenshot.
For some reason, it simply can't make use of the GPU's full power draw (wattage).
I tested Linux on RunPod, and their machine is slower than mine.
