Description
Problem:
I have a previously-trained model state dict file, e.g., a reward model saved as PATH/pytorch_model.bin. When I try to reload it for further training with the ZeRO-3 optimizer, an error occurs at L72 in DeepSpeed-Chat/training/utils/model/model_utils.py.
The exception looks like:
```
size mismatch for rwtranrsformer.h.0.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
```
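For context, here is a minimal sketch of how the failure can be reproduced. The model name, config values, and paths are illustrative placeholders (not the exact DeepSpeed-Chat code), and it is assumed to run under a deepspeed/torchrun launcher:

```python
# Illustrative reproduction sketch (placeholder model name and config).
import torch
from transformers import AutoModel
from transformers.deepspeed import HfDeepSpeedConfig  # import path may differ by transformers version

# Minimal ZeRO stage-3 config (placeholder values).
ds_config = {
    "train_batch_size": 8,
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

# Keeping this object alive tells from_pretrained() to build the model
# under deepspeed.zero.Init, i.e. with parameters partitioned across GPUs.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModel.from_pretrained("facebook/opt-350m")

# Each rank now holds empty parameter shells, so this raises the
# "size mismatch ... torch.Size([0])" error quoted above.
state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
```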
Possible Reason:
When using the ZeRO-3 optimizer, an HfDeepSpeedConfig is created at L30 in DeepSpeed-Chat/training/utils/model/model_utils.py. Any model constructed afterwards is then initialized and partitioned across GPUs automatically by Hugging Face, so each rank only holds an empty placeholder of shape torch.Size([0]) for the partitioned parameters, and the checkpoint therefore cannot be loaded directly via PyTorch's load_state_dict. A possible workaround is sketched below.
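One way to work around this (a sketch only, not the repository's official fix) is to temporarily gather the ZeRO-3 partitions with deepspeed.zero.GatheredParameters, so the parameters have their full shapes while the checkpoint is copied in. This continues from the reproduction sketch above, where `model` was created under HfDeepSpeedConfig and torch.distributed was initialized by the launcher:

```python
# Sketch: load a full state dict into a ZeRO-3 partitioned model.
import torch
import deepspeed

state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")

# Materialize the full parameters on every rank; with modifier_rank=0,
# only rank 0's in-place updates are broadcast back to the partitions
# when the context exits.
params = list(model.parameters())
with deepspeed.zero.GatheredParameters(params, modifier_rank=0):
    if torch.distributed.get_rank() == 0:
        model.load_state_dict(state_dict)
```

For very large models, gathering every parameter at once may not fit in memory; Hugging Face's own ZeRO-3 checkpoint loading gathers weights module by module, so a per-module loop may be needed in practice.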