I'm trying to train the model from scratch on a custom subset of ImageNet. Training works fine on a single GPU, but when running on multiple GPUs I get the following error:
Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 3 does not equal 0 (while checking arguments for cudnn_batch_norm)
My configuration file looks like this:
name: train_colorformer
model_type: LABGANRGBModel
scale: 1
num_gpu: 4
manual_seed: 0
queue_size: 64
I'm using CUDA_VISIBLE_DEVICES to specify which GPUs to use.
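One sanity check for this setup is that CUDA_VISIBLE_DEVICES exposes at least as many devices as num_gpu asks for. Below is a minimal sketch of that check; the helper names `visible_gpu_ids` and `check_gpu_count` are hypothetical and not part of the training code, and the sketch only counts the listed IDs without verifying that they exist on the machine.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Unset: no restriction is applied, so every GPU on the machine is visible.
        return None
    # An empty string yields an empty list, i.e. no GPUs visible.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def check_gpu_count(num_gpu, env=None):
    """True if the environment exposes at least num_gpu devices (or is unrestricted)."""
    ids = visible_gpu_ids(env)
    return ids is None or len(ids) >= num_gpu


if __name__ == "__main__":
    # num_gpu: 4 is the value from the configuration above.
    print(visible_gpu_ids(), check_gpu_count(4))
```

Run in the same shell that launches training, this prints the IDs the process will see and whether they cover num_gpu: 4.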
I looked for any inputs that are not moved to CUDA, but could not find any.
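Since the error names the 'weight' argument of cudnn_batch_norm, it may be worth checking where the model's own tensors live, not only the inputs. The sketch below groups named tensors by device; `group_by_device` is a hypothetical helper, written against anything that has a `.device` attribute so it does not depend on the training code.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Map each device (as a string) to the names of the tensors stored on it.

    named_tensors: iterable of (name, tensor) pairs, where each tensor
    exposes a .device attribute (as PyTorch tensors do).
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return dict(groups)
```

With a PyTorch module (here called `net`, an assumed name), this could be used as `group_by_device(itertools.chain(net.named_parameters(), net.named_buffers()))`. Buffers are included because batch norm keeps its running statistics as buffers rather than parameters. If more than one device shows up for a single replica, the listed names point to the layers that were not moved along with the rest.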