Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1 #735

@anurag-198

Thanks for the benchmarks,

I am trying to run evaluation on multiple GPUs with batch size 1, using the command below. (Note that my sequences are long, so the model needs to be split across several GPUs.)

CUDA_VISIBLE_DEVICES=0,1 accelerate launch --num_processes=1 -m lmms_eval --model internvl2 --model_args pretrained='OpenGVLab/InternVL2-8B,device_map=auto,trust_remote_code=True' --tasks ours --batch_size 1 --log_samples --log_samples_suffix reproduce --output_path ./logs/

I can confirm that the model is sharded across the two GPUs. Checking lm.model.hf_device_map gives:

{'language_model.model.layers.0': 0, 'language_model.model.layers.1': 0, 'language_model.model.layers.2': 0, 'language_model.model.layers.3': 0, 'language_model.model.layers.4': 0, 'language_model.model.layers.5': 0, 'language_model.model.layers.6': 0, 'language_model.model.layers.7': 0, 'language_model.model.layers.8': 0, 'language_model.model.layers.9': 0, 'language_model.model.layers.10': 0, 'language_model.model.layers.11': 1, 'language_model.model.layers.12': 1, 'language_model.model.layers.13': 1, 'language_model.model.layers.14': 1, 'language_model.model.layers.15': 1, 'language_model.model.layers.16': 1, 'language_model.model.layers.17': 1, 'language_model.model.layers.18': 1, 'language_model.model.layers.19': 1, 'language_model.model.layers.20': 1, 'language_model.model.layers.21': 1, 'language_model.model.layers.22': 1, 'language_model.model.layers.23': 1, 'language_model.model.layers.24': 1, 'language_model.model.layers.25': 1, 'language_model.model.layers.26': 1, 'language_model.model.layers.27': 1, 'language_model.model.layers.28': 1, 'language_model.model.layers.29': 1, 'language_model.model.layers.30': 1, 'language_model.model.layers.31': 0, 'language_model.model.layers.32': 1, 'vision_model': 0, 'mlp1': 0, 'language_model.model.tok_embeddings': 0, 'language_model.model.embed_tokens': 0, 'language_model.output': 0, 'language_model.model.norm': 0, 'language_model.lm_head': 0}

However, evaluation fails with this error: Error during evaluation: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!. Please set --verbosity=DEBUG to get more information

This happens when the outputs of layer 10 are on cuda:0 while the weights of layer 11 are on cuda:1.
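To make the layer 10 / layer 11 observation easier to check, here is a minimal sketch that lists every point in a device map where two consecutive decoder layers sit on different devices. The helper name `layer_device_boundaries` is hypothetical (it is not part of lmms_eval or Accelerate), and `reported` is just a compact rebuild of the layer entries from the map printed above.

```python
import re


def layer_device_boundaries(device_map):
    """Return (prev_layer, next_layer, prev_device, next_device) for every
    pair of consecutive layers that are placed on different devices.

    Hypothetical helper for inspection only; it does not touch the model.
    """
    pattern = re.compile(r"\.layers\.(\d+)$")
    layers = {}
    for name, device in device_map.items():
        match = pattern.search(name)
        if match:
            layers[int(match.group(1))] = device

    ordered = sorted(layers)
    boundaries = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if layers[prev] != layers[nxt]:
            boundaries.append((prev, nxt, layers[prev], layers[nxt]))
    return boundaries


# Layer placement as reported above: layers 0-10 and 31 on device 0,
# layers 11-30 and 32 on device 1.
reported = {
    f"language_model.model.layers.{i}": (0 if i <= 10 or i == 31 else 1)
    for i in range(33)
}

for prev, nxt, prev_dev, next_dev in layer_device_boundaries(reported):
    print(f"layer {prev} (device {prev_dev}) -> layer {nxt} (device {next_dev})")
```

For the reported map this shows three device changes, not just the one between layers 10 and 11: hidden states also have to move between layers 30 and 31 and between layers 31 and 32.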

I also tried dispatch_model() from Hugging Face Accelerate to solve this, but that does not work either.
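For anyone trying to reproduce this with a simpler placement, below is a rough sketch of building a device map by hand, with contiguous blocks of layers per GPU and all non-layer modules pinned to one device. The function name `build_device_map` is hypothetical, the module names are copied from the map printed above, and the layer count of 32 is an assumption that should be checked against the model config. I have not verified that this avoids the error, or that lmms_eval accepts a dict for device_map.

```python
# Non-layer module names, copied from the hf_device_map output in this issue.
PINNED_MODULES = (
    "vision_model",
    "mlp1",
    "language_model.model.tok_embeddings",
    "language_model.model.embed_tokens",
    "language_model.output",
    "language_model.model.norm",
    "language_model.lm_head",
)


def build_device_map(num_layers, num_gpus, pinned_device=0):
    """Assign contiguous blocks of decoder layers to GPUs and keep every
    non-layer module on `pinned_device`.

    Hypothetical helper: it only builds a dict and does not load a model.
    """
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    device_map = {}
    for i in range(num_layers):
        # Clamp so the last block never exceeds the highest GPU index.
        device_map[f"language_model.model.layers.{i}"] = min(i // per_gpu, num_gpus - 1)
    for name in PINNED_MODULES:
        device_map[name] = pinned_device
    return device_map


# Assumed values: 32 decoder layers, 2 visible GPUs.
manual_map = build_device_map(num_layers=32, num_gpus=2)
print(manual_map["language_model.model.layers.15"], manual_map["language_model.model.layers.16"])
```

With this layout there is a single device change (between layers 15 and 16), which should make it easier to tell whether the error depends on where the split falls.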

Any help would be highly appreciated.

Labels: bug
