DDP freezes because of torchsummary.summary #20599
Unanswered · mehran66 asked this question in DDP / multi-GPU / multi-node
Replies: 0
Sharing some learning with Distributed Data Parallel (DDP): my training code was freezing at some point during the training loop, and it took me a long time to find the issue. Before the training loop, I was calling torchsummary.summary(model, (3, input_size, input_size)) to print the model summary for my log. This function does not support DDP and creates a deadlock.
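A minimal sketch of one way to avoid this (the model, input size, and rank-0 guard here are illustrative assumptions, not from the original post): summarize the plain, unwrapped model before wrapping it in DDP. torchsummary.summary runs a dummy forward pass, and once the model is wrapped in DistributedDataParallel that forward pass can involve collective communication, so a call made on only one rank can leave the other ranks hanging.

```python
import torch
import torch.nn as nn

# Illustrative model; the original post does not show one.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten())

# In real DDP code this would be torch.distributed.get_rank();
# hard-coded here so the sketch runs single-process.
rank = 0

# Summarize the *unwrapped* model, before DDP wrapping, so the dummy
# forward pass inside summary() never touches DDP's communication.
if rank == 0:
    try:
        from torchsummary import summary
        summary(model, (3, 32, 32), device="cpu")
    except ImportError:
        pass  # torchsummary not installed; skip the log line

# Only after logging, wrap the model (commented out in this sketch):
# model = nn.parallel.DistributedDataParallel(
#     model.cuda(rank), device_ids=[rank]
# )
```

Guarding the call with `rank == 0` also keeps the log from being printed once per process.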