RuntimeError when training on multiple GPUs #1

@Walid-Ked

Description

I'm trying to train the model from scratch on a custom subset of ImageNet. Training works fine on a single GPU, but when running on multiple GPUs I get the following error:

Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 3 does not equal 0 (while checking arguments for cudnn_batch_norm)
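This error means that, inside one of the DataParallel replicas, a BatchNorm layer's weights live on cuda:3 while the activation fed into it lives on cuda:0. A quick way to narrow this down (not from the original report; `report_devices` is a hypothetical helper) is to dump the device of every parameter and registered buffer, which usually exposes any state that wasn't moved or replicated with the rest of the model:

```python
import torch
import torch.nn as nn

def report_devices(model):
    """Group every parameter and buffer by device, so a stray
    cuda:0 tensor inside a cuda:3 replica stands out."""
    devices = {}
    for name, p in model.named_parameters():
        devices.setdefault(str(p.device), []).append(name)
    for name, b in model.named_buffers():
        devices.setdefault(str(b.device), []).append(name)
    for dev, names in devices.items():
        print(dev, "->", len(names), "tensors, e.g.", names[0])
    return devices

# Toy usage; with a real model, call this just before the failing forward pass.
m = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
devs = report_devices(m)
```

Note this only covers tensors registered as parameters or buffers; a plain tensor attribute (e.g. `self.mask = torch.zeros(...)`) is invisible to it and is a common culprit, since `.cuda()`/`DataParallel` won't move it.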

My configuration file looks like this:

name: train_colorformer
model_type: LABGANRGBModel
scale: 1
num_gpu: 4
manual_seed: 0
queue_size: 64

and I'm using CUDA_VISIBLE_DEVICES to select which GPUs are used.
I tried looking for any inputs that are not moved to CUDA, but without success.
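The model itself isn't shown here, so this is only a sketch of the most common cause of this exact message under nn.DataParallel: a tensor created inside forward() with a hardcoded device (e.g. `.cuda()`, which defaults to cuda:0), so the replica running on cuda:3 mixes a cuda:0 activation with its own cuda:3 BatchNorm weights. The `Head` module below is hypothetical:

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    """Hypothetical module illustrating the usual cause of the error."""
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        # BUGGY pattern: noise = torch.randn(x.shape).cuda()
        #   -> always lands on cuda:0; the replica on cuda:3 then feeds a
        #      cuda:0 tensor to its cuda:3 BatchNorm ("device 3 does not equal 0").
        noise = torch.randn_like(x)  # FIX: inherits x's device and dtype
        return self.bn(x + noise)

model = Head()
if torch.cuda.is_available():
    model = model.cuda()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # replicates the module per device

x = torch.randn(4, 8, 16, 16)
if torch.cuda.is_available():
    x = x.cuda()  # DataParallel scatters the batch across GPUs itself
out = model(x)
print(out.shape)  # torch.Size([4, 8, 16, 16])
```

If the layers themselves are fine, the same symptom can come from tensors kept as plain attributes (set in `__init__` but not registered via `register_buffer`), since DataParallel only replicates parameters and buffers across devices.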
