Load model error in step3 #560

Open

Description

@YingtongBu2

When setting zero_stage=3 and loading my own checkpoint in .pt format:

model_config = AutoConfig.from_pretrained(model_name_or_path)
model = AutoModel.from_config(model_config)

every parameter in the model has shape torch.Size([0]); the error message is:
size mismatch for model.decoder.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([0]).
The checkpoint I load has parameters with the correct shapes, but the model built from the config seems wrong.
With other zero_stage values, the error does not occur.

When I run the script line by line, the error does not occur either.
Has anyone come across this problem?
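The symptom can be reproduced without DeepSpeed: under ZeRO-3, parameters that are partitioned away are left as empty placeholders of shape torch.Size([0]), and a plain strict load_state_dict against such a module then fails with exactly this size-mismatch error. A minimal sketch (plain PyTorch only; the nn.Linear module and the shapes are illustrative, not taken from the actual model):

```python
import torch
import torch.nn as nn

# A small layer standing in for one of the model's projection weights.
layer = nn.Linear(4, 4, bias=False)

# A checkpoint entry with the real, full shape.
ckpt = {"weight": torch.randn(4, 4)}

# Simulate a ZeRO-3 partitioned parameter: the local module only holds
# an empty placeholder, not the materialized weight.
layer.weight = nn.Parameter(torch.empty(0))

# Strict loading checks shapes and fails, mirroring the reported error.
try:
    layer.load_state_dict(ckpt)
except RuntimeError as e:
    print(e)  # message includes "size mismatch ... torch.Size([0])"
```

This suggests the model's parameters were partitioned (or never materialized) at the time load_state_dict ran, which would explain why the error appears only with zero_stage=3.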

Metadata
Assignees

No one assigned

    Labels

    bug (Something isn't working), deespeed chat (DeepSpeed Chat)


    Milestone

    No milestone
