Description
There seem to be problems when using the 'astra_cuda' backend wrapped by a RayTransform operator, which is in turn wrapped by odl.contrib.torch.operator.OperatorModule, when trying to distribute the model across multiple GPUs using torch.nn.DataParallel.
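
Roughly, the setup looks like the sketch below; the space, geometry, and sizes are just placeholders, not my exact configuration:

```python
import torch
import odl
from odl.contrib.torch.operator import OperatorModule

# Placeholder reconstruction space and geometry (not the actual configuration).
space = odl.uniform_discr([-20, -20], [20, 20], [128, 128], dtype='float32')
geometry = odl.tomo.parallel_beam_geometry(space, num_angles=180)
ray_trafo = odl.tomo.RayTransform(space, geometry, impl='astra_cuda')


class ForwardModel(torch.nn.Module):
    """Toy model: the ray transform wrapped as a torch module plus a learned scale."""

    def __init__(self, operator):
        super().__init__()
        self.projector = OperatorModule(operator)
        self.scale = torch.nn.Parameter(torch.ones(1))

    def forward(self, x):
        # Extra leading dimensions of x are treated as batch dimensions
        # by OperatorModule.
        return self.scale * self.projector(x)


model = ForwardModel(ray_trafo).cuda()
# Replicating the module (and with it the ASTRA-backed operator) onto
# several GPUs is the step that seems to trigger the crashes.
model = torch.nn.DataParallel(model)

x = torch.rand(4, 128, 128, device='cuda')  # batch of placeholder phantoms
sinograms = model(x)
```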
I don't have a specific error message at the moment, but it has led to kernel panics on several servers.
My guess is that it is related to the copying performed by DataParallel, which I imagine could result in problems such as conflicting shared memory usage or double freeing.
Does anyone know more about why this is happening, or how to make it work?