feat: Support of encoder #24
Conversation
jannisborn left a comment
Amazing job @jorisSchaller! I see you're testing up to a 2D encoder and the CI passes, so I think the actual work is done, great 💪🏼 👍🏼
I have some cosmetic comments, see below for details. The most important one is the naming: I would suggest renaming MaskedMBLM to MBLMEncoder or EncoderMBLM, because MLM is more of a training strategy than a model type, wdyt?
Also, the transformer class definitions are a bit redundant and could be made more concise with inheritance, but I'll let you decide on this because everything looks technically correct!
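To illustrate the inheritance suggestion, here is a minimal sketch of how a causal and a non-causal stage block could share one base class. The class names (`TransformerStageBase`, `TransformerDecoderStage`, `TransformerEncoderStage`) and constructor parameters are hypothetical and built on plain `torch.nn` modules, not the actual MBLM internals:

```python
import torch
from torch import nn


class TransformerStageBase(nn.Module):
    """Shared base: everything except the causal flag is identical."""

    def __init__(self, dim: int, depth: int, num_heads: int, causal: bool) -> None:
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True, norm_first=True
        )
        self.layers = nn.TransformerEncoder(layer, num_layers=depth)
        self.causal = causal

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = None
        if self.causal:
            # Upper-triangular mask so each position only attends to the past.
            mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.layers(x, mask=mask)


class TransformerDecoderStage(TransformerStageBase):
    """Causal (autoregressive) stage block."""

    def __init__(self, dim: int, depth: int, num_heads: int) -> None:
        super().__init__(dim, depth, num_heads, causal=True)


class TransformerEncoderStage(TransformerStageBase):
    """Non-causal stage block for the encoder/masked variant."""

    def __init__(self, dim: int, depth: int, num_heads: int) -> None:
        super().__init__(dim, depth, num_heads, causal=False)


# Example: a non-causal stage over a (batch, seq, dim) tensor.
stage = TransformerEncoderStage(dim=64, depth=2, num_heads=4)
out = stage(torch.randn(1, 16, 64))
```

With this layout only the `causal` flag differs between the two stage blocks, so any future change to the shared layers lands in one place.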
This commit adds:
- TransformerEncoder stage block for non-causal attention.
- Masked MBLM for representation learning.
- A modified PG19 dataset supporting masked language modeling (see the masking sketch after this list).
- A MaskedTrainer to use with the MaskedMBLM and the corresponding (masked) dataset.
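For reference, here is a minimal sketch of the kind of BERT-style masking such a dataset could apply to byte-level token ids. The function name `mask_for_mlm`, the `MASK_ID` of 256, and the 15%/80/10/10 split are assumptions for illustration, not the PR's actual implementation:

```python
import torch

MASK_ID = 256       # hypothetical: one id beyond the 256 raw byte values
MASK_PROB = 0.15    # common BERT-style masking rate


def mask_for_mlm(tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (masked_inputs, labels); labels are -100 wherever no loss applies."""
    inputs = tokens.clone()
    labels = tokens.clone()

    # Select ~15% of positions for prediction.
    selected = torch.rand(tokens.shape) < MASK_PROB
    labels[~selected] = -100  # ignored by cross_entropy(..., ignore_index=-100)

    # Of the selected positions: 80% become [MASK], 10% a random byte, 10% stay unchanged.
    mask_token = selected & (torch.rand(tokens.shape) < 0.8)
    inputs[mask_token] = MASK_ID

    random_token = selected & ~mask_token & (torch.rand(tokens.shape) < 0.5)
    inputs[random_token] = torch.randint(0, 256, (int(random_token.sum()),))

    return inputs, labels


# Hypothetical usage on byte-level ids, as the PG19 dataset provides them.
tokens = torch.randint(0, 256, (8, 128))
masked_inputs, labels = mask_for_mlm(tokens)
```

The masked trainer would then feed `masked_inputs` to the encoder and compute the loss only on the selected positions via the `-100` labels.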
TODOs: