Hugging Face Transformers


Checkpoint

You may often want to save the state of training and resume it later from a checkpoint. Doing so requires saving and loading the model, the optimizer, the RNG states, and the GradScaler (when using mixed precision). The tokenizer and model should always come from the same checkpoint! A checkpoint is itself a model: a modified version of a base model.
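A minimal PyTorch sketch of saving and resuming such a state; the tiny model, the optimizer choice, and the file name checkpoint.pt are illustrative stand-ins, not a prescribed recipe:

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

# ... training steps ...

# Save everything needed to resume exactly where training stopped.
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scaler": scaler.state_dict(),
    "rng_state": torch.get_rng_state(),  # CPU RNG state
}, "checkpoint.pt")

# Resume later:
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
scaler.load_state_dict(ckpt["scaler"])
torch.set_rng_state(ckpt["rng_state"])
```

On GPU you would also save and restore the CUDA RNG state (torch.cuda.get_rng_state_all / torch.cuda.set_rng_state_all) so that dropout and other stochastic ops resume deterministically.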

Heads

An additional component, usually made up of one or a few layers, that converts the transformer's output (hidden states) into a task-specific output.
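For instance, loading the same checkpoint with and without a task head shows that the head is just a small extra module on top of the shared body (bert-base-uncased and num_labels=2 are used purely as examples):

```python
from transformers import AutoModel, AutoModelForSequenceClassification

# Body only: outputs hidden states, no task-specific head.
body = AutoModel.from_pretrained("bert-base-uncased")

# Same body plus a classification head; the head's weights are
# randomly initialised and need fine-tuning on the target task.
clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
print(clf.classifier)  # the head: a linear layer mapping hidden size -> 2 labels
```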

AutoModel

A class whose from_pretrained() method instantiates the correct architecture based on the checkpoint.
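A short example; distilbert-base-uncased is just an example checkpoint, and the same code would return a different class for a different checkpoint. Note that the tokenizer is loaded from the same checkpoint as the model:

```python
from transformers import AutoModel, AutoTokenizer

# The Auto* classes inspect the checkpoint's config and return the
# matching architecture, so the same code works across model families.
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
print(type(model).__name__)  # DistilBertModel
```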

Sigmoid vs SoftMax

Sigmoid is used for binary classification, where there are only 2 classes, while SoftMax applies to multiclass problems. In fact, the SoftMax function is a generalization of the Sigmoid function.
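A quick numerical check of that relationship: softmax over the two logits [0, z] gives e^z / (1 + e^z) for the second class, which is exactly sigmoid(z). The logit value 1.7 below is arbitrary:

```python
import torch

z = torch.tensor([1.7])

# Sigmoid on a single logit for binary classification...
p_sigmoid = torch.sigmoid(z)

# ...equals softmax over the two logits [0, z]:
#   softmax([0, z])[1] = e^z / (1 + e^z) = sigmoid(z)
two_logits = torch.stack([torch.zeros_like(z), z], dim=-1)
p_softmax = torch.softmax(two_logits, dim=-1)[..., 1]

print(p_sigmoid, p_softmax)  # both ~= 0.8455
```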

Techniques to be aware of when batching sequences of different lengths together

Truncation, Padding, and Attention Masking.
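A minimal sketch of all three with a Hugging Face tokenizer (distilbert-base-uncased is just an example checkpoint):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

batch = tokenizer(
    ["A short sentence.",
     "A noticeably longer sentence that needs more tokens."],
    padding=True,      # pad shorter sequences to the longest in the batch
    truncation=True,   # cut sequences that exceed the model's max length
    return_tensors="pt",
)
# 0s in the attention mask tell the model to ignore the padding tokens.
print(batch["input_ids"].shape)
print(batch["attention_mask"])
```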
