Great work, but I've run into an issue. I tried using the `torchrun` command to train DIT across multiple machines, but the rank does not seem to be detected correctly. Could you please provide a multi-machine launch command that is compatible with the code?