A bug that may cause device inconsistency #31930
Comments
Thanks for reporting! Seems like this comes from major changes we did recently. @gante, this might also cause errors in LogitsProcessors, because now we have all special tokens as tensors on the same device as input_ids. IMO we need to init special tokens/other tensors on the correct device once at the beginning, instead of moving them around when there's a device mismatch. WDYT of init tensors with …
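For illustration, a minimal sketch of the "init once on the right device" idea; this is not the actual transformers code, and the helper name and signature are hypothetical:

```python
import torch

def init_special_tokens(eos_token_id, pad_token_id, device):
    # Hypothetical helper: build the special-token tensors once, on the
    # same device as input_ids, so later comparisons in the generation
    # loop never trigger cross-device moves.
    eos = None if eos_token_id is None else torch.tensor(eos_token_id, device=device)
    pad = None if pad_token_id is None else torch.tensor(pad_token_id, device=device)
    return eos, pad

# Called once at the start of generate(), e.g.:
# eos, pad = init_special_tokens(config.eos_token_id, config.pad_token_id, input_ids.device)
```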
Sounds good! You may need extra logic in special models, like encoder-decoder models, but in general it sounds like the most sensible thing to do. Assigning this issue to you, and looking forward to the PR 🤗
Okay, I did a little bit of digging and found that it happens only in some VLMs, when the model has a custom generate function and internally calls the language model's generate. Since accelerate attaches a hook on the model to align devices, we usually don't see such errors and the model outputs end up on the same device as the inputs. But in VLMs the hook is attached to the model as a whole, and not to its vision model and language model separately, which causes the output of the language model to stay on the device where it was last executed.

This surely can be fixed with a different trick than the one suggested above, but I think we should rather start using the model's … @gante let me know if you don't agree; I can open a PR to manually add a hook to the generation model inside …
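As a rough sketch of the manual-hook idea using accelerate's public hook API (the `model.language_model` attribute in the usage comment is an assumption; real VLMs name their submodules differently):

```python
from accelerate.hooks import AlignDevicesHook, add_hook_to_module

def attach_io_aligning_hook(submodule, execution_device):
    # Attach an AlignDevicesHook directly to the inner language model so its
    # outputs are moved back to the device its inputs arrived on
    # (io_same_device=True), mirroring what accelerate already does when the
    # hook is attached to the model as a whole.
    hook = AlignDevicesHook(execution_device=execution_device, io_same_device=True)
    add_hook_to_module(submodule, hook)

# Hypothetical usage after dispatching a VLM across devices:
# attach_io_aligning_hook(model.language_model, model.language_model.device)
```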
If the model's …
Not sure if I got it right -- you mean adding a hook on the model's …?
Yes, it is the same wrapper as in the BLIP models in our repo, which calls the vision tower and prepares inputs embeds by merging. I think that should be removed, similarly to what we're doing to get rid of all custom logic in VLMs.
Handling it in the generalist generate would be a bad hack. The second option, modifying the hooks when loading, is better, so that we also align io_device for each …
@zucchini-nlp sounds good!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This is not complete, right @zucchini-nlp? (If it is, plz re-close it :D)
Right, we still can't get rid of the custom generate on BLIP models; we need to wait until the end of the deprecation cycle. One thing is that we can't fix models shared on the Hub, but we can share best practices with them in internal colab channels :)
System Info
In transformers/generation/utils.py, at line 2297, unfinished_sequences is created on the same device as input_ids (input_ids.device).
But at line 2351, if the model is split across different GPUs (for example, input_ids is on GPU 0 and the model executes pipeline-parallel across GPUs 0 and 1), the outputs will be on GPU 1, which leads to a device inconsistency at line 2404.
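A minimal standalone repro of the same mismatch, assuming a machine with at least two CUDA devices (the tensor values are placeholders):

```python
import torch

# unfinished_sequences is created on the same device as input_ids (GPU 0) ...
input_ids = torch.ones(1, 4, dtype=torch.long, device="cuda:0")
unfinished_sequences = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)

# ... but with pipeline parallelism the logits come back on the last stage (GPU 1),
# so the tokens derived from them live on cuda:1:
next_tokens = torch.tensor([42], device="cuda:1")  # stand-in for logits.argmax(dim=-1)

# The masking step in the generation loop then mixes devices and raises
# "Expected all tensors to be on the same device":
pad_token_id = 0
next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
```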
Who can help?
@zucchini-nlp @gante
Information

Tasks

Reproduction
The inference example of InternVL2-40B:
https://github.com/OpenGVLab/InternVL
Expected behavior
No error.