A bug that may cause device inconsistency #31930
Comments
Thanks for reporting! Seems like this comes from major changes we did recently. @gante, this might also cause errors in LogitsProcessors, because now we have all special tokens as tensors on the same device as input_ids. IMO we need to init special tokens/other tensors on the correct device once at the beginning, instead of moving them around when there's a device mismatch. WDYT of init tensors with …
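For illustration, a minimal sketch of the "init once on the right device" idea; this is not the actual transformers code, and the helper name and signature are hypothetical:

```python
import torch

def init_special_tokens(eos_token_id, pad_token_id, device):
    # Hypothetical helper: build the special-token tensors once, on the
    # same device as input_ids, so later comparisons in the generation
    # loop never trigger cross-device moves.
    eos = None if eos_token_id is None else torch.tensor(eos_token_id, device=device)
    pad = None if pad_token_id is None else torch.tensor(pad_token_id, device=device)
    return eos, pad

# Called once at the start of generate(), e.g.:
# eos, pad = init_special_tokens(config.eos_token_id, config.pad_token_id, input_ids.device)
```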
Sounds good! You may need extra logic in special models, like encoder-decoder models, but in general it sounds like the most sensible thing to do. Assigning this issue to you, and looking forward to the PR 🤗
Okay, I did a little bit of digging and found that it happens only in some VLMs, when the model has a custom generate function and internally calls the language model's generate. Since accelerate attaches a hook on the model to align devices, we usually don't see such errors and the model outputs end up on the same device as the inputs. But in VLMs the hook is attached to the model as a whole, and not to its vision model and language model separately, which causes the output of the language model to stay on the device where it was last executed.

This surely can be fixed with a different trick than the one suggested above, but I think we should rather start using the model's … @gante let me know if you don't agree; I can open a PR to manually add a hook to the generation model inside …
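As a rough sketch of the manual-hook idea using accelerate's public hook API (the `model.language_model` attribute in the usage comment is an assumption; real VLMs name their submodules differently):

```python
from accelerate.hooks import AlignDevicesHook, add_hook_to_module

def attach_io_aligning_hook(submodule, execution_device):
    # Attach an AlignDevicesHook directly to the inner language model so its
    # outputs are moved back to the device its inputs arrived on
    # (io_same_device=True), mirroring what accelerate already does when the
    # hook is attached to the model as a whole.
    hook = AlignDevicesHook(execution_device=execution_device, io_same_device=True)
    add_hook_to_module(submodule, hook)

# Hypothetical usage after dispatching a VLM across devices:
# attach_io_aligning_hook(model.language_model, model.language_model.device)
```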
If the model's …
Not sure if I got it right -- you mean adding a hook on the model's …?
Yes, it is the same wrapper as in the BLIP models in our repo, which calls the vision tower and prepares inputs embeds by merging. I think that should be removed, similarly to what we're doing to get rid of all custom logic in VLMs.
Handling it in the generalist generate would be a bad hack. The second option, modifying the hooks when loading, is better, so that we also align io_device for each …
@zucchini-nlp sounds good!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This is not complete, right @zucchini-nlp? (If it is, plz re-close it :D)
Right, we still can't get rid of the custom generate on BLIP models; we need to wait until the end of the deprecation cycle. One thing is that we can't fix models shared on the Hub, but we can share best practices with them in internal colab channels :)
System Info
In transformers/generation/utils.py, at line 2297, unfinished_sequences is created on the same device as input_ids (input_ids.device).
But at line 2351, if the model is split across different GPUs (for example, input_ids is on GPU 0 and the model executes pipeline-parallel across GPUs 0 and 1), the outputs will be on GPU 1, which leads to a device inconsistency at line 2404.
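A minimal standalone repro of the same mismatch, assuming a machine with at least two CUDA devices (the tensor values are placeholders):

```python
import torch

# unfinished_sequences is created on the same device as input_ids (GPU 0) ...
input_ids = torch.ones(1, 4, dtype=torch.long, device="cuda:0")
unfinished_sequences = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)

# ... but with pipeline parallelism the logits come back on the last stage (GPU 1),
# so the tokens derived from them live on cuda:1:
next_tokens = torch.tensor([42], device="cuda:1")  # stand-in for logits.argmax(dim=-1)

# The masking step in the generation loop then mixes devices and raises
# "Expected all tensors to be on the same device":
pad_token_id = 0
next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
```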
Who can help?
@zucchini-nlp @gante
Information

Tasks

Reproduction
The inference example of InternVL2-40B:
https://github.com/OpenGVLab/InternVL
Expected behavior
No error.