
[Bug]: Need to return attention_mask when padding #1720

Open
@collinmccarthy

Description

Describe the issue

In these lines of LlavaMetaForCausalLM.prepare_inputs_labels_for_multimodal(), when we have padded the input we always need to return the padded attention mask.

The fix is as simple as changing this:

# Bug: drops the padded attention mask whenever _attention_mask is None
if _attention_mask is None:
    attention_mask = None
else:
    attention_mask = attention_mask.to(dtype=_attention_mask.dtype)

To this:

# Update: always return attention mask if we padded (if any values are False)
if attention_mask.all():  # Not padded
    if _attention_mask is None:
        attention_mask = None
    else:
        attention_mask = attention_mask.to(dtype=_attention_mask.dtype)

That being said, I don't see why we can't just always return attention_mask as-is, essentially commenting out all of these lines. The re-computed attention_mask should already have the correct dtype, device, and values even when _attention_mask is None (i.e. no input attention mask was provided). But maybe I'm missing something.
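
For reference, here is a minimal standalone sketch of the check the proposed fix relies on. maybe_drop_mask is a hypothetical helper, not part of the LLaVA codebase; it just isolates the logic of dropping the mask only when nothing in the batch was padded:

import torch

# Hypothetical illustration of the proposed check (not the actual LLaVA method):
# drop the mask only when no position in the batch is padded.
def maybe_drop_mask(attention_mask, _attention_mask):
    if attention_mask.all():  # no padding anywhere -> mask carries no information
        if _attention_mask is None:
            return None
        return attention_mask.to(dtype=_attention_mask.dtype)
    return attention_mask  # padded -> keep the mask so padded positions are ignored

# Batch of two sequences; the second one was padded by two positions when the
# multimodal embeddings were merged.
padded = torch.tensor([[True, True, True, True],
                       [True, True, False, False]])
print(maybe_drop_mask(padded, None))                              # mask is kept
print(maybe_drop_mask(torch.ones(2, 4, dtype=torch.bool), None))  # None, as before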

This fixes batch inference in v1.6, e.g. #1149, #1305, and probably others. Note that you also have to apply the changes from PR #1502 to get batch inference to work in run_llava.py.
