Description
I used the script https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/local/pytorchnn/run_nnlm.sh, but I could not figure out how to distribute the training of the Transformer-based LM across multiple GPUs in order to speed up the PyTorch training. Please suggest if there is any way to do so.
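For context, what I have in mind is something along the lines of the sketch below: wrapping the LM in PyTorch's DistributedDataParallel and sharding batches with a DistributedSampler, launched via torchrun. The `build_transformer_lm()` and `build_lm_dataset()` helpers are placeholders rather than functions from the recipe, so this is only an illustration of the kind of multi-GPU setup I mean, not a patch against the recipe's train.py.

```python
# Minimal multi-GPU sketch (placeholders, not the actual pytorchnn/train.py).
# Launch with e.g.: torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE in the environment.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = build_transformer_lm().cuda(local_rank)  # placeholder model builder
    model = DDP(model, device_ids=[local_rank])      # gradients synced across GPUs

    dataset = build_lm_dataset()                     # placeholder dataset
    sampler = DistributedSampler(dataset)            # each rank sees its own shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(10):
        sampler.set_epoch(epoch)                     # reshuffle shards each epoch
        for tokens, targets in loader:
            tokens = tokens.cuda(local_rank)
            targets = targets.cuda(local_rank)
            logits = model(tokens)                   # (batch, seq_len, vocab)
            loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
            optimizer.zero_grad()
            loss.backward()                          # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```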
Thanks!