Please port code features from MPT-7b back to MPT-1b so that MPT-1b can be used with SFTTrainer.

What I want to do:
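```python
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

# tokenizer, tokenized_train_data, tokenized_val_data and training_args
# are defined earlier in my script.

# With trust_remote_code=True this loads the MosaicGPT class
# shipped in the model repository.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-1b-redpajama-200b",
    trust_remote_code=True,
    attn_impl='torch'
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_train_data["train"],
    eval_dataset=tokenized_val_data["validation"],
    dataset_text_field="text",
    args=training_args,
    neftune_noise_alpha=5  # NEFTune is the only feature I actually need
)
```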
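For context, `neftune_noise_alpha` enables NEFTune, which perturbs the output of the token embedding layer during training. As far as I can tell from the `transformers`/`trl` implementation, the trainer finds that layer via `model.get_input_embeddings()` and registers a forward hook on it, so a model without that method cannot use NEFTune. Below is a minimal sketch of that mechanism, modeled on (not copied from) the library hook; `TinyLM` and `neftune_forward_hook` are hypothetical names used only for illustration.

```python
import torch
from torch import nn


def neftune_forward_hook(module: nn.Module, inputs, output: torch.Tensor) -> torch.Tensor:
    """Add NEFTune-style noise to embedding outputs, in training mode only.

    Noise is uniform in [-mag, mag] with mag = alpha / sqrt(seq_len * hidden_dim).
    """
    if module.training:
        alpha = module.neftune_noise_alpha  # attribute attached before registering the hook
        dims = torch.tensor(output.size(1) * output.size(2), dtype=output.dtype)
        mag = (alpha / torch.sqrt(dims)).item()
        output = output + torch.zeros_like(output).uniform_(-mag, mag)
    return output  # the returned tensor replaces the module's output


class TinyLM(nn.Module):
    """Hypothetical toy model that exposes get_input_embeddings(), as HF models do."""

    def __init__(self, vocab_size: int = 100, hidden: int = 16):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, hidden)

    def get_input_embeddings(self) -> nn.Module:
        return self.wte


model = TinyLM()
emb = model.get_input_embeddings()  # the lookup that fails on MPT-1b
emb.neftune_noise_alpha = 5.0
handle = emb.register_forward_hook(neftune_forward_hook)

ids = torch.randint(0, 100, (2, 8))
model.train()
noisy = emb(ids)   # noise added
model.eval()
clean = emb(ids)   # hook is a no-op outside training
handle.remove()
```

With alpha = 5, sequence length 8 and hidden size 16, each element is perturbed by at most 5 / sqrt(128), roughly 0.44.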

However, it fails because several features are missing from the MPT-1b implementation:
 - `forward` with `labels` (like [this one in MPT-7b](https://huggingface.co/mosaicml/mpt-7b/blob/main/modeling_mpt.py#L269))
 - `get_input_embeddings` (like [this one in MPT-7b](https://huggingface.co/mosaicml/mpt-7b/blob/main/modeling_mpt.py#L83))

and potentially others.
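To make the two missing pieces concrete, here is a minimal sketch on a hypothetical toy model (`TinyCausalLM`, not MPT code): `get_input_embeddings()` returns the token embedding module, and `forward(..., labels=...)` computes the causal-LM loss by shifting the labels one position left and masking the final position with `-100`. MPT-7b's `forward` appears to use the same roll-and-mask pattern, but this is an illustration of the technique, not a patch for MPT-1b.

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class TinyCausalLM(nn.Module):
    """Hypothetical toy causal LM exposing the two methods SFTTrainer relies on."""

    def __init__(self, vocab_size: int = 50, hidden: int = 8):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)

    def get_input_embeddings(self) -> nn.Module:
        # Trainers (e.g. for NEFTune) locate the embedding layer through this method
        return self.wte

    def forward(self, input_ids: torch.Tensor, labels: Optional[torch.Tensor] = None):
        logits = self.lm_head(self.wte(input_ids))  # (batch, seq, vocab)
        loss = None
        if labels is not None:
            # Position t predicts token t+1: shift labels left by one and
            # mask the last position, which has no next-token target.
            shifted = torch.roll(labels, shifts=-1, dims=1)
            shifted[:, -1] = -100
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                shifted.reshape(-1),
                ignore_index=-100,
            )
        return {"loss": loss, "logits": logits}


torch.manual_seed(0)
model = TinyCausalLM()
ids = torch.randint(0, 50, (2, 6))
out = model(ids, labels=ids)
```

A real implementation would return a `transformers` output class such as `CausalLMOutputWithPast`; a plain dict keeps the sketch self-contained while still exposing `loss` under the key the trainer reads.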

Please help the community use MPT-1b by either:
a) training a 1B-parameter model with the MPT-7b codebase, or
b) updating the MPT-1b codebase (whose architecture diverges slightly from MPT-7b's).

Please bring code features from MPT-7b back to MPT-1b for use of MPT-1b with SFTTrainer. #439

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Please bring code features from MPT-7b back to MPT-1b for use of MPT-1b with SFTTrainer. #439

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions