[BUG]: Training the reward model (rm) with the ddp strategy fails on a single machine with multiple GPUs #3421

### 🐛 Describe the bug

Code:
------------------------------------------------------------
    torchrun --standalone --nproc_per_node=1 train_reward_model.py \
        --dataset Dahoas/rm-static \
        --subset ../../../datasets/Dahoas_rm-static \
        --max_len 512 \
        --model gpt2 \
        --pretrain ../../../gpt2/gpt2-small \
        --lora_rank 0 \
        --max_epochs 1 \
        --batch_size 1 \
        --loss_fn log_sig \
        --test True \
        --need_optim_ckpt True \
        --strategy ddp \
        --save_path rm_ckpt.pt


Error: 
------------------------------------------------------------
![image](https://user-images.githubusercontent.com/37849338/229699763-6b16a770-d841-4416-884d-ebc43068e315.png)
![image](https://user-images.githubusercontent.com/37849338/229699888-be8915cd-ae7d-4e99-aac0-3faf1251bfb9.png)




### Environment

_No response_
