System Info
In `transformers/generation/utils.py`, line 2297, `unfinished_sequences` is created on `input_ids.device`.
But at line 2351, if the model is split across GPUs (for example, `input_ids` is on GPU 0 and the model runs pipeline-parallel on GPUs 0 and 1), the model outputs end up on GPU 1. This leads to a device mismatch at line 2404.
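A minimal sketch of the mismatch (the tensor names and line references follow the sampling loop in `generation/utils.py`; the two-GPU placement and `pad_token_id` value are assumptions for illustration):

```python
import torch

pad_token_id = 0  # stand-in value for illustration

# Assumed pipeline split: input_ids live on GPU 0, the last model stage on GPU 1.
input_ids = torch.tensor([[1, 2, 3]], device="cuda:0")

# ~line 2297: unfinished_sequences is created on input_ids.device (cuda:0)
unfinished_sequences = torch.ones(
    input_ids.shape[0], dtype=torch.long, device=input_ids.device
)

# ~line 2351: with the model split across GPUs, the outputs (and therefore the
# sampled next_tokens) come back on the last stage's device (cuda:1)
next_tokens = torch.tensor([4], device="cuda:1")  # stand-in for the model output

# ~line 2404: combining the cuda:1 and cuda:0 tensors raises
# "Expected all tensors to be on the same device"
next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
```

A possible workaround would be to move the tensors onto a common device before this line, e.g. `next_tokens = next_tokens.to(unfinished_sequences.device)`.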
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The inference example of InternVL2-40B:
https://github.com/OpenGVLab/InternVL
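For a reproduction without the InternVL code, a generic sketch that should exercise the same code path (the checkpoint, prompt, and two-GPU split are assumptions; at least two GPUs are needed so `device_map="auto"` actually splits the model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic stand-in for the InternVL2-40B example above: any checkpoint large
# enough that device_map="auto" splits it across two GPUs goes through the same
# sampling loop in generation/utils.py. The model id below is an assumption.
model_id = "facebook/opt-6.7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # pipeline-style split across the visible GPUs
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
# generate() creates unfinished_sequences on cuda:0, while the split model can
# return its outputs on cuda:1, hitting the mismatch described above.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```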
Expected behavior
No device-mismatch error; `generate` should work when the model is split across multiple GPUs.