This repository was archived by the owner on Oct 16, 2023. It is now read-only.

[BUG] Unreasonable memory consumption #53

@ZhiYuanZeng

Description

🐛 Describe the bug

Creating a TransformerEncoder causes an out-of-memory error, but the same configuration runs fine with the Hugging Face transformers module.

# config.py
from colossalai.amp import AMP_TYPE

# Mixed precision via torch.cuda.amp
fp16 = dict(mode=AMP_TYPE.TORCH)
NUM_MICRO_BATCHES = 8
# 2D tensor parallelism across 4 GPUs (a 2x2 device grid)
parallel = dict(
    tensor=dict(size=4, mode='2d')
)
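
For scale, a back-of-envelope parameter count (a rough sketch assuming the standard transformer layer layout of 4·h² attention weights plus 8·h² feed-forward weights, ignoring biases and LayerNorm) suggests the full encoder below should easily fit on one 2080 Ti even without tensor parallelism:

# Hypothetical sanity check, not part of the original report:
# 12 layers, hidden size 768, 4x feed-forward width.
hidden, layers, ffn_mult = 768, 12, 4
attn = 4 * hidden * hidden                # q, k, v and output projections
ffn = 2 * hidden * (ffn_mult * hidden)    # two feed-forward projections
params = layers * (attn + ffn)            # ~85M parameters
print(f"{params / 1e6:.0f}M params, ~{params * 4 / 2**30:.2f} GiB in fp32")
# => ~85M params, ~0.32 GiB in fp32, far below the 11 GiB of a 2080 Ti,
# and per-rank weight memory should be even smaller under 2D tensor parallelism.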

# launch command: python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 xxx.py
import colossalai
from titans.layer.block import TransformerEncoderLayer, TransformerEncoder

colossalai.launch_from_torch(config='/home/zyzeng/fastnlp/examples/config.py')

# Memory overflow on an Nvidia 2080 Ti
backbone = TransformerEncoder(
    TransformerEncoderLayer(hidden_size=768, nhead=12, dim_feedforward=768*4),
    num_layers=12
)

# No memory overflow on an Nvidia 2080 Ti with the equivalent Hugging Face model
from transformers import BertModel, AutoConfig

config = AutoConfig.from_pretrained('bert-base-uncased')
model = BertModel(config)
model.cuda()
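
A quick way to quantify the gap (a minimal diagnostic sketch, not part of the original report) is to print PyTorch's allocator statistics on each rank right after constructing each model:

# Hypothetical diagnostic: report per-rank allocator stats after building a model.
import torch

def report_gpu_memory(tag):
    torch.cuda.synchronize()
    alloc = torch.cuda.memory_allocated() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"[{tag}] allocated={alloc:.2f} GiB, peak={peak:.2f} GiB")

# e.g. call report_gpu_memory('titans encoder') after building `backbone`,
# and report_gpu_memory('hf bert') after `model.cuda()`, to compare the two.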

Environment

No response

Labels: bug (Something isn't working)