Describe the issue
In these lines of `LlavaMetaForCausalLM.prepare_inputs_labels_for_multimodal()`, when we have padded the input we always need to return the padded attention mask. This is as simple as changing this:
```python
# Bug, fails to return padded attention mask
if _attention_mask is None:
    attention_mask = None
else:
    attention_mask = attention_mask.to(dtype=_attention_mask.dtype)
```
To this:
```python
# Update: always return attention mask if we padded (if any values are False)
if attention_mask.all():  # Not padded
    if _attention_mask is None:
        attention_mask = None
    else:
        attention_mask = attention_mask.to(dtype=_attention_mask.dtype)
```
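To make the `attention_mask.all()` check concrete, here is a minimal, self-contained sketch (the tensor values are hypothetical, not taken from LLaVA's code): batching prompts of different lengths forces padding, which leaves `False` entries in the mask, so `.all()` distinguishes padded from unpadded batches.

```python
import torch

# Hypothetical batch: two prompts of different lengths, so the shorter
# one is left-padded and its mask contains False values.
attention_mask = torch.tensor([
    [True, True, True, True],    # full-length prompt, no padding
    [False, False, True, True],  # shorter prompt, left-padded
])

# .all() is False here, so the padded mask must be returned to the
# caller instead of being dropped.
assert not attention_mask.all()
```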
That being said, I don't see why we can't just always return `attention_mask` as-is, essentially commenting out all of these lines. The recomputed `attention_mask` should have the correct dtype, device, and values even when `_attention_mask` is None (no input attention mask). But maybe I'm missing something.
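For illustration, here is a hedged sketch of why the recomputed mask should already be valid on its own (this is not LLaVA's actual implementation; the sequence lengths and device are placeholders): a mask rebuilt from the padded sequence lengths is freshly allocated with a boolean dtype on the right device, independent of whether an input mask was supplied.

```python
import torch

# Hedged sketch, not LLaVA's actual code: rebuild a mask from the
# per-sample sequence lengths after merging image tokens and padding.
seq_lens = [4, 2]              # placeholder per-sample lengths
max_len = max(seq_lens)
device = torch.device("cpu")   # in practice, the embeddings' device

recomputed_mask = torch.zeros(
    len(seq_lens), max_len, dtype=torch.bool, device=device
)
for i, cur_len in enumerate(seq_lens):
    recomputed_mask[i, :cur_len] = True  # mark real (non-padded) positions

print(recomputed_mask)
# tensor([[ True,  True,  True,  True],
#         [ True,  True, False, False]])
```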
This fixes batch inference in v1.6, e.g. #1149, #1305, and probably others. Note that you also have to apply the changes from PR #1502 to get batch inference to work in `run_llava.py`.