Describe the bug
HybridEngine llama2 70B generation has two bugs:
- When inference_tp_size == 1, the generated results are incorrect.
- When inference_tp_size > 1, generation fails with the following error:
File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 593, in _prepare_decoder_attention_mask
expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
The size of tensor a (12) must match the size of tensor b (48) at non-singleton dimension 0
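For context, here is a minimal sketch of the kind of setup that hits this. It is hypothetical, not a verified repro: the checkpoint path, batch size, and config values are placeholders, with the `hybrid_engine` keys taken from the DeepSpeed config docs and `engine.generate` following the hybrid engine's generate wrapper.

```python
# Hypothetical minimal repro sketch -- paths and config values are placeholders.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "meta-llama/Llama-2-70b-hf"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16)

ds_config = {
    "train_batch_size": 8,  # placeholder; must equal world size * micro batch * grad accum
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "hybrid_engine": {
        "enabled": True,
        # inference_tp_size == 1 -> wrong generations; > 1 -> the shape error above
        "inference_tp_size": 1,
        "max_out_tokens": 256,
    },
}

engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.eval()  # switch the hybrid engine into inference mode

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(engine.device)
with torch.no_grad():
    # With inference_tp_size > 1 this raises the tensor-size mismatch in
    # _prepare_decoder_attention_mask shown above.
    output = engine.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```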
System info (please complete the following information):
- transformers: 4.31.0
- deepspeed: 0.10.3