Description
Problem:
I have a previously-trained model state dict file, e.g., a reward model saved as PATH/pytorch_model.bin. When I try to reload it for further training with the ZeRO-3 optimizer, an error occurs at L72 in DeepSpeed-Chat/training/utils/model/model_utils.py.
The exception looks like:
```
size mismatch for rwtranrsformer.h.0.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([0]).
```
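For context, here is a minimal sketch of how the failure can be reproduced. The model name, config values, and paths are illustrative placeholders (not the exact DeepSpeed-Chat code), and it is assumed to run under a deepspeed/torchrun launcher:

```python
# Illustrative reproduction sketch (placeholder model name and config).
import torch
from transformers import AutoModel
from transformers.deepspeed import HfDeepSpeedConfig  # import path may differ by transformers version

# Minimal ZeRO stage-3 config (placeholder values).
ds_config = {
    "train_batch_size": 8,
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

# Keeping this object alive tells from_pretrained() to build the model
# under deepspeed.zero.Init, i.e. with parameters partitioned across GPUs.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModel.from_pretrained("facebook/opt-350m")

# Each rank now holds empty parameter shells, so this raises the
# "size mismatch ... torch.Size([0])" error quoted above.
state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
```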
Possible Reason:
When using the ZeRO-3 optimizer, an HfDeepSpeedConfig is created at L30 in DeepSpeed-Chat/training/utils/model/model_utils.py. Any model constructed afterwards is then initialized and partitioned across GPUs automatically by Hugging Face, so each rank only holds an empty placeholder of shape torch.Size([0]) for the partitioned parameters, and the checkpoint therefore cannot be loaded directly via PyTorch's load_state_dict. A possible workaround is sketched below.
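One way to work around this (a sketch only, not the repository's official fix) is to temporarily gather the ZeRO-3 partitions with deepspeed.zero.GatheredParameters, so the parameters have their full shapes while the checkpoint is copied in. This continues from the reproduction sketch above, where `model` was created under HfDeepSpeedConfig and torch.distributed was initialized by the launcher:

```python
# Sketch: load a full state dict into a ZeRO-3 partitioned model.
import torch
import deepspeed

state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")

# Materialize the full parameters on every rank; with modifier_rank=0,
# only rank 0's in-place updates are broadcast back to the partitions
# when the context exits.
params = list(model.parameters())
with deepspeed.zero.GatheredParameters(params, modifier_rank=0):
    if torch.distributed.get_rank() == 0:
        model.load_state_dict(state_dict)
```

For very large models, gathering every parameter at once may not fit in memory; Hugging Face's own ZeRO-3 checkpoint loading gathers weights module by module, so a per-module loop may be needed in practice.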