Hugging Face Transformers
toncho11 edited this page Feb 9, 2023
You may often want to save the state of training and resume it later from a checkpoint. Doing so requires saving and loading the model, the optimizer, the RNG states, and the GradScaler (when using mixed precision). A checkpoint is itself a model: a version of a base model whose weights have been updated by further training. The tokenizer and the model should always come from the same checkpoint!
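The save/resume cycle described above can be sketched as follows. This is a minimal PyTorch sketch, assuming a CPU-only setup; the function names are illustrative, not a Transformers API.

```python
import torch

def save_checkpoint(path, model, optimizer, scaler):
    # Bundle everything needed to resume training exactly where it stopped:
    # model weights, optimizer state, GradScaler state, and the RNG state.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scaler": scaler.state_dict(),
        "rng": torch.get_rng_state(),
    }, path)

def load_checkpoint(path, model, optimizer, scaler):
    # Restore each component from the saved dictionary.
    ckpt = torch.load(path, weights_only=True)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scaler.load_state_dict(ckpt["scaler"])
    torch.set_rng_state(ckpt["rng"])
```

Restoring the RNG state matters because data shuffling and dropout depend on it; without it, a resumed run will not reproduce the interrupted one.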
A head is an additional component, usually made up of one or a few layers, that converts the transformer's predictions into a task-specific output.
An auto class (such as AutoModel) is an object that returns the correct architecture based on the checkpoint.
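The auto-class idea can be illustrated with a toy factory that maps a model type to an architecture name. This is purely a sketch of the concept: the mapping and function below are made up, not the Transformers API (in Transformers, AutoModel reads the model type from the checkpoint's config.json).

```python
# Hypothetical registry: model type -> architecture name (illustrative only).
ARCHITECTURES = {
    "bert": "BertModel",
    "gpt2": "GPT2Model",
}

def auto_model_for(model_type):
    # Resolve the architecture from the model type, as an auto class would
    # resolve it from a checkpoint's configuration.
    if model_type not in ARCHITECTURES:
        raise ValueError(f"Unknown model type: {model_type}")
    return ARCHITECTURES[model_type]
```

The benefit is that user code never hard-codes an architecture; swapping checkpoints swaps the architecture automatically.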
Sigmoid is used for binary classification, where there are only 2 classes, while SoftMax applies to multiclass problems. In fact, SoftMax is a generalization of the Sigmoid function: with two classes, SoftMax reduces to a Sigmoid applied to the difference of the two logits.
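The relationship between the two functions can be checked numerically. A minimal pure-Python sketch:

```python
import math

def sigmoid(x):
    # Maps a single logit to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Maps a vector of logits to a probability distribution.
    # Subtracting the max is a standard trick for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# For two classes, softmax([z1, z0])[0] equals sigmoid(z1 - z0),
# which is why binary classifiers can use a single sigmoid output.
```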
Truncating, Padding and Attention Masking