Thanks for the benchmarks,
I am trying to run evaluation on multiple GPUs with batch size 1 using the following command. My sequences are long, so the model has to be split across several GPUs:
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --num_processes=1 -m lmms_eval --model internvl2 --model_args pretrained='OpenGVLab/InternVL2-8B,device_map=auto,trust_remote_code=True' --tasks ours --batch_size 1 --log_samples --log_samples_suffix reproduce --output_path ./logs/
I have verified with lm.model.hf_device_map that the model is sharded across the two GPUs. The output is:
{'language_model.model.layers.0': 0, 'language_model.model.layers.1': 0, 'language_model.model.layers.2': 0, 'language_model.model.layers.3': 0, 'language_model.model.layers.4': 0, 'language_model.model.layers.5': 0, 'language_model.model.layers.6': 0, 'language_model.model.layers.7': 0, 'language_model.model.layers.8': 0, 'language_model.model.layers.9': 0, 'language_model.model.layers.10': 0, 'language_model.model.layers.11': 1, 'language_model.model.layers.12': 1, 'language_model.model.layers.13': 1, 'language_model.model.layers.14': 1, 'language_model.model.layers.15': 1, 'language_model.model.layers.16': 1, 'language_model.model.layers.17': 1, 'language_model.model.layers.18': 1, 'language_model.model.layers.19': 1, 'language_model.model.layers.20': 1, 'language_model.model.layers.21': 1, 'language_model.model.layers.22': 1, 'language_model.model.layers.23': 1, 'language_model.model.layers.24': 1, 'language_model.model.layers.25': 1, 'language_model.model.layers.26': 1, 'language_model.model.layers.27': 1, 'language_model.model.layers.28': 1, 'language_model.model.layers.29': 1, 'language_model.model.layers.30': 1, 'language_model.model.layers.31': 0, 'language_model.model.layers.32': 1, 'vision_model': 0, 'mlp1': 0, 'language_model.model.tok_embeddings': 0, 'language_model.model.embed_tokens': 0, 'language_model.output': 0, 'language_model.model.norm': 0, 'language_model.lm_head': 0}
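To see the split at a glance, the decoder-layer entries of hf_device_map can be collapsed into contiguous runs. This is a minimal sketch that only assumes the map is a plain dict of module name to device index, as printed above; `layer_runs` is a hypothetical helper, not part of lmms_eval or transformers.

```python
from itertools import groupby

PREFIX = "language_model.model.layers."

def layer_runs(device_map, prefix=PREFIX):
    """Group decoder layers into contiguous (first, last, device) runs."""
    # Keep only the decoder layers and sort them by numeric layer index.
    layers = sorted(
        (int(name[len(prefix):]), dev)
        for name, dev in device_map.items()
        if name.startswith(prefix)
    )
    runs = []
    # groupby merges consecutive layers that share the same device.
    for dev, group in groupby(layers, key=lambda item: item[1]):
        indices = [idx for idx, _ in group]
        runs.append((indices[0], indices[-1], dev))
    return runs

# Rebuild the layer part of the posted hf_device_map: layers 0-10 and 31
# on GPU 0, everything else (11-30 and 32) on GPU 1.
posted = {f"{PREFIX}{i}": 0 if i <= 10 or i == 31 else 1 for i in range(33)}
posted.update({"vision_model": 0, "mlp1": 0, "language_model.lm_head": 0})

print(layer_runs(posted))
# [(0, 10, 0), (11, 30, 1), (31, 31, 0), (32, 32, 1)]
```

Laid out this way, layer 31 stands out: it sits on GPU 0 between layers 30 and 32 on GPU 1, so activations cross devices more than once, not only at the layer 10 to layer 11 boundary.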
However, evaluation fails with the following error: Error during evaluation: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!. Please set --verbosity=DEBUG to get more information
It happens when the output of layer 10 is on cuda:0 while the weights of layer 11 are on cuda:1.
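Every point where one module's output lives on a different GPU from the next module's weights is a place where this error can surface; with a device_map, Accelerate's dispatch hooks typically move inputs to each module's device. The sketch below lists those hops for the posted map. The forward order it assumes (embed_tokens, then layers 0..32, then norm, then lm_head) is an assumption about InternVL2's language model, and `device_hops` is a hypothetical helper.

```python
PREFIX = "language_model.model.layers."

def device_hops(device_map, num_layers, prefix=PREFIX):
    """Return (src, dst, src_device, dst_device) for every device change in forward order."""
    # ASSUMPTION: forward order is embed_tokens -> layers 0..N-1 -> norm -> lm_head.
    order = (
        ["language_model.model.embed_tokens"]
        + [f"{prefix}{i}" for i in range(num_layers)]
        + ["language_model.model.norm", "language_model.lm_head"]
    )
    # Skip names missing from the map instead of failing.
    placed = [(name, device_map[name]) for name in order if name in device_map]
    # A hop is any pair of consecutive modules placed on different devices.
    return [
        (src, dst, d_src, d_dst)
        for (src, d_src), (dst, d_dst) in zip(placed, placed[1:])
        if d_src != d_dst
    ]

# Same placement as the posted map (layers 0-10 and 31 on GPU 0, the rest on GPU 1).
posted = {f"{PREFIX}{i}": 0 if i <= 10 or i == 31 else 1 for i in range(33)}
posted.update({
    "language_model.model.embed_tokens": 0,
    "language_model.model.norm": 0,
    "language_model.lm_head": 0,
})

for src, dst, a, b in device_hops(posted, num_layers=33):
    print(f"{src} (cuda:{a}) -> {dst} (cuda:{b})")
```

With the posted map this reports four hops: layer 10 to layer 11 (the one described above), plus layer 30 to 31, layer 31 to 32, and layer 32 to the final norm.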
I tried using dispatch_model() from Hugging Face Accelerate to fix this, but it doesn't work either.
Any help would be highly appreciated.