Hi, thanks for sharing this work.
When I tried to train the knowledge distillation model on multiple GPUs with:
python3 -m torch.distributed.run --nproc_per_node $N_GPU distillation.py ...
I got the error:
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
distillation.py FAILED
Failures:
<NO_OTHER_FAILURES>
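
In case it helps pin down the root cause, here is a minimal sketch of how I understand the worker entry point can be set up when launched this way, so that the per-rank traceback is printed instead of only the launcher-side summary above. This is only a sketch under my assumptions (a main() entry point in distillation.py, the NCCL backend, and the standard torchrun environment variables); the real script's arguments and training loop differ:

import os

import torch
import torch.distributed as dist
from torch.distributed.elastic.multiprocessing.errors import record

# @record makes a crashing worker log its full traceback instead of only the
# generic ChildFailedError summary reported by torch.distributed.run.
@record
def main():
    # torch.distributed.run sets LOCAL_RANK, RANK and WORLD_SIZE in the
    # environment of every worker process it spawns.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # ... build the teacher/student models, wrap the student in
    # torch.nn.parallel.DistributedDataParallel, and run the KD training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

If I understand correctly, with @record in place each failing rank should report its own traceback, which would show what actually crashed rather than just <NO_OTHER_FAILURES>. Is the distillation.py entry point already set up along these lines, or is something else needed for multi-GPU training?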