Description
When setting zero_stage=3 and loading my own checkpoint in .pt format:
```python
from transformers import AutoConfig, AutoModel

model_config = AutoConfig.from_pretrained(model_name_or_path)
model = AutoModel.from_config(model_config)
```
every parameter in the model apparently has shape torch.Size([0]), judging from the error message:
```
size mismatch for model.decoder.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([0]).
```
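The checkpoint is loaded with a plain state-dict copy, roughly along these lines (a minimal sketch: the path is a placeholder, and `load_state_dict` is the call that produces the error above):

```python
import torch

# Load the .pt checkpoint onto CPU; the path here is a placeholder.
state_dict = torch.load("my_checkpoint.pt", map_location="cpu")

# Under zero_stage=3 this raises the size-mismatch error above, since every
# parameter of the freshly built model reports shape torch.Size([0]).
model.load_state_dict(state_dict)
```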
The checkpoint I load has parameters with the correct shapes, so the model built from the config seems to be the problem. With other zero_stage values, the error does not occur.
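For reference, zero_stage is set through the DeepSpeed config; a minimal sketch of the setup (the config values and the initialize call are illustrative, not my exact script):

```python
import deepspeed

# Illustrative config: "stage": 3 is the value I vary between runs.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
}

# deepspeed.initialize wraps the model; with stage 3, parameters are
# partitioned across ranks, which may be related to the torch.Size([0]) shapes.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```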
When I run the script line by line, the error does not occur either. Has anyone come across this problem?