
A bug that may cause device inconsistency #31930

Open · 2 of 4 tasks
liuao743 opened this issue Jul 12, 2024 · 9 comments
Labels: WIP

@liuao743

System Info

In transformers/generation/utils.py, line 2297, unfinished_sequences is created on the same device as input_ids (input_ids.device).
But at line 2351, if the model is split across GPUs, for example input_ids is on GPU 0 while the model runs pipeline-parallel on GPUs 0 and 1, the outputs end up on GPU 1, which leads to a device mismatch at line 2404.
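
For reference, a minimal sketch of how the mismatch shows up (illustrative shapes and devices, not the exact code in utils.py; needs two CUDA devices to actually reproduce):

```python
import torch

input_ids = torch.tensor([[1, 2, 3]], device="cuda:0")

# Created on the same device as input_ids (cuda:0), as in the decoding loop.
unfinished_sequences = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)

# With the model split across GPUs, the forward pass returns logits on the last
# pipeline device (e.g. cuda:1), so next_tokens lives there too.
next_token_logits = torch.randn(1, 32000, device="cuda:1")  # stand-in for outputs.logits[:, -1, :]
next_tokens = torch.argmax(next_token_logits, dim=-1)
pad_token_id = torch.tensor(0, device="cuda:1")  # special-token tensors follow the model outputs

# RuntimeError: Expected all tensors to be on the same device (cuda:1 vs cuda:0)
next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
```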

Who can help?

@zucchini-nlp @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The inference example of InternVL2-40B:
https://github.com/OpenGVLab/InternVL
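
A rough sketch of the setup (multi-GPU load of the model; the InternVL README recommends a hand-crafted device map, device_map="auto" is used here only for brevity). Running the repo's inference example on top of this then hits the device mismatch described above:

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2-40B"
# Splitting the checkpoint across several GPUs is what triggers the issue.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # pulls the custom generate() from the model repo
    device_map="auto",
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```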

Expected behavior

No error.

@zucchini-nlp (Member)

Thanks for reporting!

Seems like the major changes we made recently for compile + generate caused this error.

@gante this might also cause errors in LogitsProcessors, because now we have all special tokens as tensors on the same device as input_ids. IMO we need to init special tokens/other tensors on the correct device once at the beginning, instead of moving them between devices when there's a mismatch. WDYT of initializing these tensors with device=self.lm_head.device?
I'll see what the options are and open a PR next week :)
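
For illustration, a rough sketch of one way to read that suggestion (a hypothetical helper, not an actual patch; it uses self.lm_head.weight.device because nn.Module itself has no .device attribute):

```python
import torch

def _init_generation_tensors(self, batch_size, eos_token_id, pad_token_id):
    """Hypothetical helper: create the decoding-loop tensors once, on the device
    where the final logits are produced, so they already match the model outputs."""
    device = self.lm_head.weight.device

    unfinished_sequences = torch.ones(batch_size, dtype=torch.long, device=device)
    eos_token_tensor = torch.tensor(eos_token_id, device=device)
    pad_token_tensor = torch.tensor(pad_token_id, device=device)
    return unfinished_sequences, eos_token_tensor, pad_token_tensor
```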

@gante (Member) commented Jul 14, 2024

@zucchini-nlp

> WDYT of init tensors with device=self.lm_head.device?

Sounds good! You may need extra logic for special models, like encoder-decoder models, but in general it sounds like the most sensible thing to do.

Assigning this issue to you, and looking forward to the PR 🤗

@zucchini-nlp (Member)

Okay, I did a bit of digging and found that this happens only in some VLMs, when the model has a custom generate function that internally calls self.language_model.generate(**inputs, **kwargs).

Since accelerate attaches a hook to the model to align devices, we usually don't see such errors and the model outputs end up on the same device as the inputs. But in these VLMs the hook is attached to the model as a whole, not to its vision model and language model separately, which causes the output of the language model to stay on the device where it was last executed.
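
For concreteness, a structural sketch of that pattern (hypothetical class and simplified embedding merging, not the actual InternVL/BLIP code):

```python
import torch

class CompositeVLM(torch.nn.Module):
    """Structural sketch only; the class and its merging logic are illustrative."""

    def __init__(self, vision_model, language_model):
        super().__init__()
        self.vision_model = vision_model
        self.language_model = language_model

    @torch.no_grad()
    def generate(self, pixel_values, input_ids, **generate_kwargs):
        # With device_map="auto", accelerate's io-alignment hook wraps the forward
        # of the composite model, not this method, so nothing realigns the tensors
        # produced inside the call to language_model.generate below.
        image_embeds = self.vision_model(pixel_values)
        text_embeds = self.language_model.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)  # simplified merging
        # Under pipeline parallelism the language model's outputs can therefore land
        # on a different GPU than input_ids, producing the mismatch reported above.
        return self.language_model.generate(inputs_embeds=inputs_embeds, **generate_kwargs)
```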

This can surely be fixed with a different trick than the one suggested above, but I think we'd be better off using the model's own generate() instead of a custom function that calls language_model.generate() under the hood.

@gante let me know if you don't agree. I can open a PR to manually add a hook to the generation model inside generate(), or we probably need a general way to load composite models with device_map="auto".
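
A hedged sketch of the "manually add a hook" option, assuming accelerate's AlignDevicesHook with io_same_device=True is the piece that moves outputs back to the input device (illustrative, not a tested patch):

```python
from accelerate.hooks import AlignDevicesHook, add_hook_to_module

def align_language_model_io(model):
    # `model.language_model` stands for the text backbone of a composite VLM loaded
    # with device_map="auto". The composite model already carries an io-alignment
    # hook, but the backbone does not, so tensors produced by its forward can stay
    # on the last pipeline device. Appending keeps any existing hooks intact.
    hook = AlignDevicesHook(io_same_device=True)
    add_hook_to_module(model.language_model, hook, append=True)
```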

@gante (Member) commented Jul 15, 2024

@zucchini-nlp

> but I think we'd be better off using the model's own generate() instead of a custom function that calls language_model.generate() under the hood

If the model's generate() is a thin wrapper for language_model.generate(), yes, we should get rid of it. On the other hand, if it holds model-specific logic, then it should stay as is (to avoid making the generalist generate() bloated) 🤗

> I can open a PR to manually add a hook to the generation model inside generate()

Not sure if I got it right -- you mean adding a hook on the model's generate() (and not to the generalist generate())?

@zucchini-nlp (Member)

> If the model's generate() is a thin wrapper for language_model.generate(), yes, we should get rid of it. On the other hand, if it holds model-specific logic, then it should stay as is (to avoid making the generalist generate() bloated) 🤗

Yes, it is the same wrapper as in BLIP models in our repo, which calls the vision tower and prepares inputs_embeds by merging. I think it should be removed, similarly to what we're doing to get rid of all custom logic in VLMs.
Btw, there are audio models with a custom generate, but they won't hit the same issue because in the end they call super().generate().

> Not sure if I got it right -- you mean adding a hook on the model's generate() (and not to the generalist generate())?

The generalist generate, but that would be a bad hack. The second option, modifying hooks when loading, is better, so that we also align the io device for each PreTrainedModel that is part of the model being loaded. But, as I said, it shouldn't be necessary if we start using VLMs without a custom call to the LM backbone.

@gante (Member) commented Jul 16, 2024

@zucchini-nlp sounds good!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gante (Member) commented Aug 23, 2024

This is not complete, right @zucchini-nlp?

(If it is, plz re-close it :D)

gante reopened this on Aug 23, 2024
gante added the WIP label on Aug 23, 2024
@zucchini-nlp (Member)

Right, we still can't get rid of the custom generate on BLIP models; we need to wait until the end of the deprecation cycle. One thing is that we can't fix models shared on the Hub, but we can share best practices with them in internal colab channels :)
