
torch.nn.DataParallel with OperatorModule wrapping 'astra_cuda'-RayTransform malfunctioning #1545

Open
@jleuschn

Description


There seem to be problems when a RayTransform operator using the 'astra_cuda' backend is wrapped in odl.contrib.torch.operator.OperatorModule and the resulting model is then distributed over multiple GPUs with torch.nn.DataParallel.
I don't have a specific error message at the moment, but it has led to kernel panics on several different servers.
My guess is that the issue is related to the copying performed by DataParallel, which I imagine could cause problems such as conflicting shared memory usage or double freeing.
Does someone have more insight into why this happens, or how to make it work?
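
For reference, a rough sketch of the kind of setup I mean (the geometry, image size, and batch size here are just illustrative, not the exact configuration that crashed):

```python
import torch
import odl
from odl.contrib.torch import OperatorModule

# Parallel-beam ray transform on the ASTRA CUDA backend.
space = odl.uniform_discr([-64, -64], [64, 64], [128, 128], dtype='float32')
geometry = odl.tomo.parallel_beam_geometry(space, num_angles=180)
ray_trafo = odl.tomo.RayTransform(space, geometry, impl='astra_cuda')

# Wrap the operator as a torch module and embed it in a model.
model = torch.nn.Sequential(
    OperatorModule(ray_trafo),
)

# Replicating the module onto multiple GPUs is where the failures occur.
model = torch.nn.DataParallel(model).cuda()

x = torch.rand(4, 1, 128, 128, device='cuda')
y = model(x)  # reportedly crashes / panics when run on several devices
```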
