Customized position_ids not working #33938

huchanwei123 opened this issue Oct 4, 2024 · 4 comments
@huchanwei123

System Info

Hello,

I am trying to feed customized position IDs to a Llama model.
If I feed a customized position_ids vector, for example [[0, 0, 1, 2, 2, 2]] (batch size = 1, where the first two tokens share position 0 and the last three tokens share position 2), I get an error.

The error seems to originate in the function prepare_inputs_for_generation in src/transformers/models/llama/modeling_llama.py: position_ids is not updated as cache_position advances, so a shape inconsistency occurs.

Is there any way to successfully feed a customized position ids to the model?
Thanks!

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Pass a customized position_ids as one of the inputs to model.generate() (a minimal sketch follows below).
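A minimal reproduction sketch (the checkpoint name and prompt are illustrative assumptions; any Llama checkpoint should behave the same way):

```python
# Reproduction sketch: passing custom position_ids to generate() works for
# the prefill step, then fails with a size mismatch on the second token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello there, how are you", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]

# Customized positions, shape (batch_size, seq_len): make the first two
# tokens share position 0, in the spirit of [[0, 0, 1, 2, 2, 2]] above.
position_ids = torch.arange(seq_len).unsqueeze(0)
position_ids[0, 1] = 0

# Raises a size-mismatch error once the second token is generated.
output = model.generate(**inputs, position_ids=position_ids, max_new_tokens=5)
```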

Expected behavior

Generation should succeed with the custom position_ids; instead, a size mismatch error is raised.

@ArthurZucker
Collaborator

Cc @gante

@codeslayed

You can directly modify how position_ids are computed within your code before passing them to the model. Ensure that your custom position_ids are aligned with the expected shape and values.
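For illustration (with hypothetical token ids), "aligned" here means the custom position_ids tensor matches input_ids in shape, one position per token:

```python
import torch

input_ids = torch.tensor([[3, 15, 15, 7, 7, 7]])   # hypothetical token ids
position_ids = torch.tensor([[0, 0, 1, 2, 2, 2]])  # one position per token
assert position_ids.shape == input_ids.shape       # (batch_size, sequence_length)
```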

@huchanwei123
Author

huchanwei123 commented Oct 4, 2024

> You can directly modify how position_ids are computed within your code before passing them to the model. Ensure that your custom position_ids are aligned with the expected shape and values.

Thanks for the reply.
Yes, I am aware that I can pass a customized position_ids to the model, and I believe the shape and values are correct.

Since the model generates token by token, feeding a customized position_ids causes a size mismatch error after the first token is generated.
After a bit of digging, I found that the function prepare_inputs_for_generation has no handling for the case where position_ids is not None. As a result, during the generation of the second token, position_ids still has its original shape, while it is expected to match the shape of attention_mask.

I am not sure if I missed or misunderstood anything.
Thanks!
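For what it's worth, one possible workaround is a subclass that keeps position ids for the whole generated sequence and slices them with cache_position at each step. This is a sketch only, not an official API: it assumes a transformers version whose prepare_inputs_for_generation returns "cache_position" among the model inputs, and full_position_ids is a hypothetical attribute introduced here for illustration.

```python
# Workaround sketch (illustrative, unofficial): keep position ids for the
# whole generated sequence and slice them per step with cache_position.
import torch
from transformers import LlamaForCausalLM


class CustomPositionLlama(LlamaForCausalLM):
    # Hypothetical attribute set by the caller; it must have shape
    # (batch_size, prompt_len + max_new_tokens).
    full_position_ids = None

    def prepare_inputs_for_generation(self, *args, **kwargs):
        model_inputs = super().prepare_inputs_for_generation(*args, **kwargs)
        if self.full_position_ids is not None:
            cache_position = model_inputs["cache_position"]
            # Select only the positions of the tokens fed at this step,
            # mirroring how input_ids are sliced during decoding.
            model_inputs["position_ids"] = self.full_position_ids[:, cache_position]
        return model_inputs
```

The caller would then set model.full_position_ids before calling model.generate(...), extending the custom prefix positions to cover the newly generated tokens as well.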

@zucchini-nlp
Member

This has a tracker here (#29149), and there was a PR a few months ago. Unfortunately, the PR was too big and needed to be split into parts, after which it dropped in priority :(
