Describe the issue
In these lines of `LlavaMetaForCausalLM.prepare_inputs_labels_for_multimodal()`, when we have padded the input we always need to return the padded attention mask. This is as simple as changing this:
```python
# Bug, fails to return padded attention mask
if _attention_mask is None:
    attention_mask = None
else:
    attention_mask = attention_mask.to(dtype=_attention_mask.dtype)
```
To this:
```python
# Update: always return attention mask if we padded (if any values are False)
if attention_mask.all():  # Not padded
    if _attention_mask is None:
        attention_mask = None
    else:
        attention_mask = attention_mask.to(dtype=_attention_mask.dtype)
```
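To make the `attention_mask.all()` check concrete, here is a minimal, self-contained sketch (the tensor values are hypothetical, not taken from LLaVA's code): batching prompts of different lengths forces padding, which leaves `False` entries in the mask, so `.all()` distinguishes padded from unpadded batches.

```python
import torch

# Hypothetical batch: two prompts of different lengths, so the shorter
# one is left-padded and its mask contains False values.
attention_mask = torch.tensor([
    [True, True, True, True],    # full-length prompt, no padding
    [False, False, True, True],  # shorter prompt, left-padded
])

# .all() is False here, so the padded mask must be returned to the
# caller instead of being dropped.
assert not attention_mask.all()
```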
That being said, I don't see why we can't just always return `attention_mask` as-is, essentially commenting out all of these lines. The recomputed `attention_mask` should have the correct dtype, device, and values even when `_attention_mask` is None (no input attention mask). But maybe I'm missing something.
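For illustration, here is a hedged sketch of why the recomputed mask should already be valid on its own (this is not LLaVA's actual implementation; the sequence lengths and device are placeholders): a mask rebuilt from the padded sequence lengths is freshly allocated with a boolean dtype on the right device, independent of whether an input mask was supplied.

```python
import torch

# Hedged sketch, not LLaVA's actual code: rebuild a mask from the
# per-sample sequence lengths after merging image tokens and padding.
seq_lens = [4, 2]              # placeholder per-sample lengths
max_len = max(seq_lens)
device = torch.device("cpu")   # in practice, the embeddings' device

recomputed_mask = torch.zeros(
    len(seq_lens), max_len, dtype=torch.bool, device=device
)
for i, cur_len in enumerate(seq_lens):
    recomputed_mask[i, :cur_len] = True  # mark real (non-padded) positions

print(recomputed_mask)
# tensor([[ True,  True,  True,  True],
#         [ True,  True, False, False]])
```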
This fixes batch inference in v1.6, e.g. #1149, #1305, and probably others. Note that you also have to apply the changes from PR #1502 to get batch inference to work in `run_llava.py`.