DDP freezes because of torchsummary.summary #20599
Unanswered · mehran66 asked this question in DDP / multi-GPU / multi-node
Replies: 0
Sharing some learning with Distributed Data Parallel (DDP): my training code was freezing at some point during the training loop, and it took me a long time to find the issue. Before the training loop, I was calling torchsummary.summary(model, (3, input_size, input_size)) to print the model summary for my log. This function does not support DDP and creates a deadlock.
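A minimal sketch of one way to avoid this (the model, input size, and rank-0 guard here are illustrative assumptions, not from the original post): summarize the plain, unwrapped model before wrapping it in DDP. torchsummary.summary runs a dummy forward pass, and once the model is wrapped in DistributedDataParallel that forward pass can involve collective communication, so a call made on only one rank can leave the other ranks hanging.

```python
import torch
import torch.nn as nn

# Illustrative model; the original post does not show one.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten())

# In real DDP code this would be torch.distributed.get_rank();
# hard-coded here so the sketch runs single-process.
rank = 0

# Summarize the *unwrapped* model, before DDP wrapping, so the dummy
# forward pass inside summary() never touches DDP's communication.
if rank == 0:
    try:
        from torchsummary import summary
        summary(model, (3, 32, 32), device="cpu")
    except ImportError:
        pass  # torchsummary not installed; skip the log line

# Only after logging, wrap the model (commented out in this sketch):
# model = nn.parallel.DistributedDataParallel(
#     model.cuda(rank), device_ids=[rank]
# )
```

Guarding the call with `rank == 0` also keeps the log from being printed once per process.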