
[BUG] HybridEngine llama2 70B generate result is wrong and "The size of tensor a (12) must match the size of tensor b (48) at non-singleton dimension 0" when inference_tp_size > 1 #4345

Open
@xiaopqr

Description


Describe the bug
HybridEngine llama2 70B generation has two bugs:

  1. When inference_tp_size == 1, the generated result is incorrect.
  2. When inference_tp_size > 1, generation fails with the following error:

    File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 593, in _prepare_decoder_attention_mask
      expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
    RuntimeError: The size of tensor a (12) must match the size of tensor b (48) at non-singleton dimension 0
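The reported sizes are consistent with the batch dimension being sharded across tensor-parallel ranks while the attention mask is not: 48 / 12 = 4, so with a hypothetical tensor-parallel degree of 4, each rank would hold 12 sequences of hidden states but still receive the full 48-sequence mask. A minimal sketch of that arithmetic and the resulting broadcast failure (all names here are illustrative, not DeepSpeed or transformers code):

```python
# Hypothetical reconstruction of the shape mismatch in the report above.
# Assumption: inference_tp_size = 4 and a global batch of 48; neither
# number is stated in the issue, only their 12-vs-48 ratio.
inference_tp_size = 4
global_batch = 48
per_rank_batch = global_batch // inference_tp_size  # 12 sequences per rank


def add_attention_masks(a_batch: int, b_batch: int) -> int:
    """Mimic PyTorch's broadcast check at dimension 0 when two masks
    are added, as in _prepare_decoder_attention_mask."""
    if a_batch != b_batch and 1 not in (a_batch, b_batch):
        raise ValueError(
            f"The size of tensor a ({a_batch}) must match the size of "
            f"tensor b ({b_batch}) at non-singleton dimension 0"
        )
    return max(a_batch, b_batch)


if __name__ == "__main__":
    # The per-rank expanded mask (12) collides with the unsharded
    # combined mask (48), reproducing the error message verbatim.
    try:
        add_attention_masks(per_rank_batch, global_batch)
    except ValueError as exc:
        print(exc)
```

If this reading is right, the fix would be to shard (or re-gather) the attention mask with the same tensor-parallel layout as the hidden states before the two are combined.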

System info (please complete the following information):

  • transformers: 4.31.0
  • deepspeed: 0.10.3
