Description
Is there an existing issue for this bug?
- [x] I have searched the existing issues
🐛 Describe the bug
I trained a reward model based on Llama3.1-70B-instruct on 48 H100 GPUs (3D parallelism, tp=8, pp=1). When I execute
`booster.save_model(model, os.path.join(save_dir, "modeling"), shard=True)`,
the saved `model.embed_tokens.weight` has shape [16064, 8192] instead of [128256, 8192] (16064 × 8 = 128512, so it looks like only a single tensor-parallel shard of a padded embedding was written). The shapes of all other weights are correct.
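For reference, here is a minimal sketch of how the model is wrapped and saved. The plugin arguments, model class, and paths below are illustrative stand-ins, not my exact training script:

```python
# Minimal reproduction sketch (illustrative only): the plugin arguments,
# model class, and save_dir are assumptions, not my exact training setup.
import os

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import AutoModelForSequenceClassification

# Initialize distributed environment (launched via torchrun).
colossalai.launch_from_torch()

# tp=8, pp=1 as in my run; zero_stage and precision are illustrative.
plugin = HybridParallelPlugin(tp_size=8, pp_size=1, zero_stage=1, precision="bf16")
booster = Booster(plugin=plugin)

# Reward model head approximated here with a sequence-classification head;
# the actual model class in my script may differ.
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct", num_labels=1
)
model, *_ = booster.boost(model)

# ... training loop omitted ...

save_dir = "./output"  # hypothetical path
booster.save_model(model, os.path.join(save_dir, "modeling"), shard=True)
```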
Please help me!
Thank you!
Environment
transformers 4.44.1
colossalai 0.4.5
flash-attn 2.6.3