-
Notifications
You must be signed in to change notification settings - Fork 5
Feature/mlm #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/mlm #22
Conversation
@LeonGuertler this is a beautful example of how a PR should be --> list of changes |
@@ -75,7 +75,7 @@ def compute_perplexity(logits, y, char_lengths, mask=None): | |||
# flatten both | |||
logits = logits.view(-1, logits.size(-1)) | |||
y = y.view(-1) | |||
loss = torch.nn.functional.cross_entropy(logits, y, reduction="none") | |||
loss = torch.nn.functional.cross_entropy(logits, y, reduction="none", ignore_index=-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are 100% sure this has no impacts elsewhere? should be okay but..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Ignore_index's default value is -100. So, changing it from -100 to -1 would only impact existing label (y) assignments of -1 or -100.
- I searched our code base for the assignment of -100, there was none.
- I searched our code base for the assignment of -1. Apart from the new
MLMDataLoader
using label[~mask]=-1, other assignments were used in arguments such as 'dim=-1', or just as index values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
embedder: | ||
tokenizer_type: gpt2 | ||
embedding_model_type: generic | ||
dataset_name: simple_en_wiki |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to stlm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, can I clarify what should be changed to stlm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the dataset_name, but its okay, this will all be reworked later
I propose closing this, as we decided not to do MLM for now... feel free to rework this if you like, should be pretty simple to update, and happy to merge in since it doesn't add much code |
MLMDataloader
- that masks inputs (80% masked, 10% randomised, 10% untouched),MLMDataloader
in the dataloader dictionary.