
MosaicBERT: pretraining configuration for models > 128 seq. length #442

Open
@stefan-it

Description

Hi MosaicML team,

many thanks for releasing the code and models for MosaicBERT! I really appreciate the effort you have put into modernizing the BERT architecture.

I am interested in pretraining MosaicBERT myself, so I have some questions :)

  • I am interested in the pretraining configuration for the model with 512 sequence length. Additionally, do you have hardware recommendations and an approximate pretraining time for MosaicBERT at 512 seq. length? Did you use the phase 1 + phase 2 "trick", i.e. pretraining at 128 seq. length and then running fewer steps at 512? In that case, the MosaicBERT trained at 128 seq. length could be "recycled" (see the sketch after this list for what I have in mind).
  • I'm also interested in which implementation is recommended, e.g. a specific tagged commit or the upcoming Modernize MosaicBERT PR #440.
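
To make the first question more concrete, here is a minimal sketch of the two-phase schedule I have in mind. It uses plain Hugging Face Transformers rather than the MosaicBERT/Composer stack, and the dataset, checkpoint paths, step counts, and batch sizes are placeholders of my own, so please read it only as an illustration of the "recycling" idea, not as a claim about your actual recipe:

```python
# Rough sketch of the phase 1 + phase 2 "trick": pretrain an MLM at
# seq. length 128, then "recycle" that checkpoint for fewer steps at 512.
# This uses plain Hugging Face Transformers, not the MosaicBERT stack;
# the dataset, paths, step counts, and batch sizes are placeholders.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)


def tokenized(max_seq_len):
    # Re-tokenize the corpus at the sequence length of the current phase.
    return raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=max_seq_len),
        batched=True,
        remove_columns=raw.column_names,
    )


# Phase 1: the bulk of the steps at sequence length 128.
model = BertForMaskedLM(BertConfig(max_position_embeddings=512))
phase1 = Trainer(
    model=model,
    args=TrainingArguments("ckpt-phase1-seq128", max_steps=900_000,
                           per_device_train_batch_size=256),
    train_dataset=tokenized(128),
    data_collator=collator,
)
phase1.train()
phase1.save_model("ckpt-phase1-seq128")

# Phase 2: reuse ("recycle") the phase-1 weights for fewer steps at length 512.
model = BertForMaskedLM.from_pretrained("ckpt-phase1-seq128")
phase2 = Trainer(
    model=model,
    args=TrainingArguments("ckpt-phase2-seq512", max_steps=100_000,
                           per_device_train_batch_size=32),
    train_dataset=tokenized(512),
    data_collator=collator,
)
phase2.train()
phase2.save_model("ckpt-phase2-seq512")
```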

Many thanks in advance!

Stefan
