
Conversation

@jorisSchaller jorisSchaller commented Apr 8, 2025

This commit adds:
- TransformerEncoder stage block for non-causal attention.
- Masked MBLM for representation learning.
- A modified PG19 dataset supporting masked language modeling.
- MaskedTrainer to use with the MaskedMBLM and the corresponding (masked) dataset.

TODOs:

  • Exclude padding tokens from masking
  • Implement the forward method for MaskedMBLM
  • Add sensible config defaults (mask_proba=0.15, mask_token_id=-100)
  • Test the new return type of the MBLM
  • Test the masked PG19 dataset
  • Test the trainer
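
The first two TODOs could be sketched roughly as below — a hypothetical masking helper (the function name, signature, and token ids are assumptions for illustration, not taken from the PR), showing BERT-style random masking that never selects padding positions:

```python
import torch

def mask_tokens(input_ids: torch.Tensor, pad_token_id: int, mask_token_id: int,
                mask_proba: float = 0.15, ignore_index: int = -100):
    """Return (masked_inputs, labels) for masked language modeling.

    Hypothetical sketch: padding positions get sampling probability 0,
    so they are never replaced by the mask token; unmasked positions
    receive ignore_index so the loss only covers masked tokens.
    """
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, mask_proba)
    probs[input_ids == pad_token_id] = 0.0       # never mask padding
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = ignore_index               # loss ignores unmasked positions
    inputs = input_ids.clone()
    inputs[masked] = mask_token_id               # replace masked positions
    return inputs, labels
```

Whether the dataset or the trainer should own this step is an open design choice; placing it in the dataset keeps MaskedTrainer agnostic to the masking scheme.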

@jorisSchaller jorisSchaller changed the title Support of encoder feat: Support of encoder Apr 8, 2025
@jorisSchaller jorisSchaller self-assigned this Apr 14, 2025
@jannisborn jannisborn left a comment


Amazing job @jorisSchaller! I see you're testing up to a 2D encoder and the CI passes, so I think the actual work is done, great 💪🏼 👍🏼

I have some cosmetic comments, see below for details. Most important is the naming: I would suggest renaming MaskedMBLM to MBLMEncoder or EncoderMBLM, because MLM is more of a training strategy than a model type, wdyt?

Also, the transformer class definitions are a bit redundant and could be made more concise with inheritance, but I'll let you decide on this because everything looks technically correct!
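
The inheritance idea could look something like the following — a minimal sketch with hypothetical class names and layout (the PR's actual stage blocks are not shown here), where a shared base owns the common layers and the encoder/decoder variants only toggle causality:

```python
import torch.nn as nn

class _TransformerStage(nn.Module):
    """Hypothetical shared base: attention + residual + norm, causality as a flag."""

    def __init__(self, dim: int, heads: int, causal: bool):
        super().__init__()
        self.causal = causal
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        mask = None
        if self.causal:
            # Upper-triangular mask blocks attention to future positions.
            mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return self.norm(x + out)

class TransformerDecoderStage(_TransformerStage):
    def __init__(self, dim: int, heads: int):
        super().__init__(dim, heads, causal=True)

class TransformerEncoderStage(_TransformerStage):
    def __init__(self, dim: int, heads: int):
        super().__init__(dim, heads, causal=False)
```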

@jannisborn jannisborn marked this pull request as ready for review April 16, 2025 22:21
@jannisborn jannisborn merged commit 8c5afcf into main Apr 17, 2025
8 checks passed
@jannisborn jannisborn deleted the feat/encoder branch April 17, 2025 09:29
3 participants