Is this training speed normal on 8x A100 GPUs? Each step takes about 3 seconds:
{"step": 22, "epoch": 1, "loss": 0.08392333984375, "grad_norm": 0.3900793790817261, "lr": 0.0001, "step_time": 2.9826059341430664}
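Whether roughly 3 s/step is "normal" depends on model size, sequence length, and global batch size, none of which appear in the log. One way to make the number comparable with other reports is to convert it to steps per hour and samples per second. The sketch below does that for the logged record. It assumes `step_time` is wall-clock seconds per optimizer step, and `global_batch_size` is a hypothetical input you would take from your own training config (per-GPU batch x 8 GPUs x gradient accumulation steps). The value 64 is only a placeholder.

```python
import json

# The log record from the question, copied verbatim.
LOG_LINE = '{"step": 22, "epoch": 1, "loss": 0.08392333984375, "grad_norm": 0.3900793790817261, "lr": 0.0001, "step_time": 2.9826059341430664}'


def throughput(log_line: str, global_batch_size: int) -> dict:
    """Derive rough throughput figures from one JSON log record.

    Assumptions: step_time is seconds per optimizer step, and
    global_batch_size (hypothetical, not in the log) comes from the config.
    """
    record = json.loads(log_line)
    step_time = record["step_time"]  # assumed unit: seconds per step
    return {
        "steps_per_hour": 3600.0 / step_time,
        "samples_per_sec": global_batch_size / step_time,
    }


if __name__ == "__main__":
    # 64 is a placeholder; substitute the real global batch size.
    stats = throughput(LOG_LINE, global_batch_size=64)
    print(f"{stats['steps_per_hour']:.0f} steps/hour, "
          f"{stats['samples_per_sec']:.1f} samples/s")
```

Samples per second (or tokens per second, if you multiply by sequence length) is the figure worth comparing against published numbers for the same model on A100s, since steps per hour alone hides the batch size.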