Hello,
Suppose I am training a model with global_batch_size = 16, micro_batch_size = 1, and 8 GPUs (dp = 8), and I save a checkpoint at iteration 50 and stop.
Then I resume training from this checkpoint with 16 GPUs (dp = 16).
Is this resumed training exactly equivalent to having continued the original run past iteration 50 (considering the RNG state or anything else that might differ)?
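
For context, here is the batch arithmetic I have in mind (a minimal sketch, assuming the usual relation global_batch_size = micro_batch_size × dp × grad_accum_steps):

```python
# Illustration only: how the per-rank schedule changes between the two runs.
global_batch_size = 16
micro_batch_size = 1

for dp in (8, 16):
    # Gradient-accumulation steps each data-parallel rank performs per iteration,
    # assuming global_batch_size = micro_batch_size * dp * grad_accum_steps.
    grad_accum_steps = global_batch_size // (micro_batch_size * dp)
    print(f"dp={dp}: {grad_accum_steps} gradient-accumulation step(s) per iteration")

# dp=8  -> 2 accumulation steps per rank per iteration
# dp=16 -> 1 accumulation step per rank per iteration
```

So the global batch per iteration is the same in both runs, but the number of ranks and the per-rank accumulation schedule differ.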