What I want to do:
```python
import transformers
from trl import SFTTrainer

# AutoModelForCausalLM resolves to the MosaicGPT class shipped in the model repo
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-1b-redpajama-200b",
    trust_remote_code=True,
    attn_impl='torch'
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_train_data["train"],
    eval_dataset=tokenized_val_data["validation"],
    dataset_text_field="text",
    args=training_args,
    neftune_noise_alpha=5  # the only thing I actually need here
)
```
Yet it fails because the MPT-1b implementation is missing several features:
- `forward` with `labels` support (like the one in MPT-7b)
- `get_input_embeddings` (like the one in MPT-7b)

and potentially others (a possible stopgap is sketched below).
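For anyone blocked on this in the meantime, here is a rough, untested workaround sketch: monkey-patch the missing accessors and a `labels`-aware `forward` onto the remote-code class. It assumes the token embeddings live at `transformer.wte` and that `forward` returns a `ModelOutput` with a `.logits` field, as in the MPT-7b implementation; neither is verified against the MPT-1b repo.

```python
import torch.nn.functional as F
from transformers.modeling_outputs import CausalLMOutputWithPast

# `model` is the instance loaded above; patch its (remote-code) class.
MosaicGPTClass = type(model)

def get_input_embeddings(self):
    # Assumption: embeddings are at transformer.wte, as in MPT-7b
    return self.transformer.wte

def set_input_embeddings(self, new_embeddings):
    self.transformer.wte = new_embeddings

MosaicGPTClass.get_input_embeddings = get_input_embeddings
MosaicGPTClass.set_input_embeddings = set_input_embeddings

# Wrap forward so a `labels` kwarg yields a causal-LM loss, the way
# MPT-7b's forward does internally.
_orig_forward = MosaicGPTClass.forward

def forward_with_labels(self, input_ids=None, labels=None, **kwargs):
    outputs = _orig_forward(self, input_ids=input_ids, **kwargs)
    if labels is None:
        return outputs
    # Shift so that tokens < n predict token n
    shift_logits = outputs.logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # positions masked out by the data collator
    )
    return CausalLMOutputWithPast(
        loss=loss,
        logits=outputs.logits,
        past_key_values=getattr(outputs, "past_key_values", None),
    )

MosaicGPTClass.forward = forward_with_labels
```

With `get_input_embeddings` in place, TRL's NEFTune hook should be able to attach its noise to the embedding layer, and the `forward` patch covers the loss computation the `Trainer` loop expects.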
Please help the community use MPT-1b by either:
a) retraining a 1B-parameter model on the MPT-7b code base, or
b) updating the MPT-1b codebase (which diverges a bit from MPT-7b in terms of architecture).