v1.0.0 - Name change, new models (XLNet, XLM), unified API for models and tokenizers, access to model internals, TorchScript
Name change: welcome PyTorch-Transformers 👾
`pytorch-pretrained-bert` => `pytorch-transformers`
Install with `pip install pytorch-transformers`
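In code, the rename translates into a new top-level import, sketched below (the Bert classes are only used as an example):

```python
# pytorch-pretrained-bert 0.6.2:
# from pytorch_pretrained_bert import BertModel, BertTokenizer

# pytorch-transformers 1.0:
from pytorch_transformers import BertModel, BertTokenizer
```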
New models
- XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
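The new models follow the same loading pattern as the existing ones. A minimal sketch, assuming the `xlnet-base-cased` shortcut name listed in the documentation:

```python
import torch
from pytorch_transformers import XLNetModel, XLNetTokenizer

# Load the base XLNet model and its tokenizer by shortcut name
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetModel.from_pretrained('xlnet-base-cased')

input_ids = torch.tensor([tokenizer.encode("Hello, XLNet!")])
last_hidden_state = model(input_ids)[0]  # first element of the returned tuple
```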
New pretrained weights
We went from ten (in `pytorch-pretrained-bert` 0.6.2) to twenty-seven (in `pytorch-transformers` 1.0) pretrained model weights.
The newly added model weights are, in summary:
- Two Whole-Word-Masking weights for Bert (cased and uncased)
- Three fine-tuned models for Bert (on SQuAD and MRPC)
- One German model for Bert, provided and trained by Deepset.ai (@tholor and @Timoeller), as detailed in their nice blog post
- One OpenAI GPT-2 model (medium size model)
- Two models (base and large) for the newly added XLNet model
- Eight models for the newly added XLM model
The documentation lists all the models with their shortcut names, and we are currently adding full details of the associated pretraining/fine-tuning parameters.
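For instance, the new Whole-Word-Masking weights can be loaded directly by shortcut name. A minimal sketch, assuming the `bert-large-uncased-whole-word-masking` shortcut from the documentation:

```python
from pytorch_transformers import BertModel, BertTokenizer

# Whole-Word-Masking checkpoints are loaded like any other pretrained weight
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking')
model = BertModel.from_pretrained('bert-large-uncased-whole-word-masking')
```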
New documentation
New documentation is currently being created at https://huggingface.co/pytorch-transformers/ and should be finalized over the coming days.
Standard API across models
See the readme for a quick tour of the API.
Main points:
- All models now return tuples with various elements depending on the model and the configuration. The docstrings and documentation list all the expected outputs in order.
- All models can now return the full list of hidden-states (the embedding output plus the output hidden-states of each layer).
- All models can now return the full list of attention weights (one tensor of attention weights for each layer).
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,
                                  output_attentions=True)
input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
# The hidden-states and attentions are the last two elements of the output tuple
all_hidden_states, all_attentions = model(input_ids)[-2:]
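As a quick sanity check (assuming `bert-base-uncased` and its 12 layers), the hidden-states list has one extra entry for the embedding output, while the attentions list has one tensor per layer:

```python
print(len(all_hidden_states))   # 13: embedding output + one tensor per layer
print(len(all_attentions))      # 12: one tensor per layer
print(all_attentions[0].shape)  # (batch_size, num_heads, seq_len, seq_len)
```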
Standard API to add tokens to the vocabulary and the model
Using `tokenizer.add_tokens()` and `tokenizer.add_special_tokens()`, one can now easily add tokens to each model's vocabulary. The model's input embeddings can be resized accordingly to add the associated word embeddings (to be trained) using `model.resize_token_embeddings(len(tokenizer))`:
# Add new tokens to the tokenizer's vocabulary...
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
# ...and resize the model's input embeddings to match the new vocabulary size
model.resize_token_embeddings(len(tokenizer))
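A quick, hedged check of the effect (the exact ids depend on the model's original vocabulary size):

```python
# Added tokens are kept as single units by the tokenizer and map to the
# freshly added (untrained) embedding rows at the end of the vocabulary
print(tokenizer.tokenize("This is a [SPECIAL_TOKEN_1] example"))
print(tokenizer.convert_tokens_to_ids(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]']))
```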
Serialization
The serialization methods have been standardized, and you should probably switch to the new `save_pretrained(save_directory)` method if you were using any other serialization method before:
### Save a trained model and its tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')

### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
TorchScript
All models are now compatible with TorchScript.
# `model_class` and `pretrained_weights` stand for any model class and shortcut
# name, e.g. BertModel and 'bert-base-uncased'; `input_ids` is an example input
model = model_class.from_pretrained(pretrained_weights, torchscript=True)
traced_model = torch.jit.trace(model, (input_ids,))
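The traced module can then be saved to disk and reloaded with the standard TorchScript utilities (the file name below is only an example):

```python
traced_model.save('traced_model.pt')              # serialize graph and weights
loaded_model = torch.jit.load('traced_model.pt')  # no Python class definition needed
outputs = loaded_model(input_ids)
```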
Example scripts
The example scripts have been refactored and gathered into three main examples (`run_glue.py`, `run_squad.py` and `run_generation.py`) which are common to several models and are designed to offer SOTA performance on the respective tasks while being clean starting points for designing your own scripts.
Other example scripts (like `run_bertology.py`) will be added in the coming weeks.
Breaking changes
The migration section of the readme lists the breaking changes when switching from `pytorch-pretrained-bert` to `pytorch-transformers`.
The main breaking change is that all models now return a tuple of results.
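For example, a training step that previously received the loss directly now indexes into the returned tuple. A minimal sketch, assuming a model with a training head such as `BertForSequenceClassification`:

```python
# pytorch-pretrained-bert 0.6.2:
# loss = model(input_ids, labels=labels)

# pytorch-transformers 1.0: models return a tuple, the loss is its first element
outputs = model(input_ids, labels=labels)
loss = outputs[0]
```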