Cannot load the previous model weights when using ZeRO 3 optimizer in DeepSpeed Chat #417

Open
@caoyu-noob

Description

Problem:

I have a previously trained model state dict file, e.g., a reward model saved as PATH/pytorch_model.bin. When I try to reload it for further training with the ZeRO 3 optimizer, an error occurs at L72 in DeepSpeed-Chat/training/utils/model/model_utils.py.

The exception message is:

size mismatch for rwtranrsformer.h.0.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
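For context, here is a minimal sketch of the failing path (the model name and config values are illustrative assumptions, not the exact DeepSpeed-Chat code):

```python
import torch
from transformers import AutoModel
from transformers.integrations import HfDeepSpeedConfig  # transformers.deepspeed in older versions

# Minimal ZeRO stage-3 config; real training configs carry many more fields.
ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
}
# Keeping this object alive makes from_pretrained() construct the model under
# deepspeed.zero.Init, i.e. with parameters already partitioned across ranks.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModel.from_pretrained("facebook/opt-1.3b")  # illustrative model name

# Each rank now only holds a shard; the local tensors report torch.Size([0]),
# so this raises the size-mismatch error quoted above.
state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
```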

Possible Reason:

When the ZeRO 3 optimizer is used, an HfDeepSpeedConfig is created at L30 in DeepSpeed-Chat/training/utils/model/model_utils.py. Any model constructed afterwards is then initialized with its parameters automatically partitioned across the GPUs by Hugging Face, so each rank only holds a shard (the local tensors report torch.Size([0])) and the checkpoint cannot be loaded directly via PyTorch's load_state_dict.
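If this diagnosis is right, one possible workaround (a sketch, not an official fix) is to temporarily gather the partitioned parameters with DeepSpeed's GatheredParameters context, load the checkpoint on rank 0 only, and let the context re-partition and broadcast the modified weights on exit. Here `model` is assumed to be the ZeRO-3-initialized model from the sketch above:

```python
import torch
import deepspeed

state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")

# modifier_rank=0: all ranks gather the full parameters, rank 0 mutates them,
# and the updated values are re-partitioned/broadcast when the context exits.
with deepspeed.zero.GatheredParameters(list(model.parameters()), modifier_rank=0):
    if torch.distributed.get_rank() == 0:
        model.load_state_dict(state_dict)
```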

Labels

deespeed chat (DeepSpeed Chat), new-config (A modified config from the given example)
