v1.0.0 - Name change, new models (XLNet, XLM), unified API for models and tokenizers, access to model internals, TorchScript
Name change: welcome PyTorch-Transformers 👾
`pytorch-pretrained-bert` => `pytorch-transformers`
Install with `pip install pytorch-transformers`
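In code, the rename translates into a new top-level import, sketched below (the Bert classes are only used as an example):

```python
# pytorch-pretrained-bert 0.6.2:
# from pytorch_pretrained_bert import BertModel, BertTokenizer

# pytorch-transformers 1.0:
from pytorch_transformers import BertModel, BertTokenizer
```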
New models
- XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
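The new models follow the same loading pattern as the existing ones. A minimal sketch, assuming the `xlnet-base-cased` shortcut name listed in the documentation:

```python
import torch
from pytorch_transformers import XLNetModel, XLNetTokenizer

# Load the base XLNet model and its tokenizer by shortcut name
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetModel.from_pretrained('xlnet-base-cased')

input_ids = torch.tensor([tokenizer.encode("Hello, XLNet!")])
last_hidden_state = model(input_ids)[0]  # first element of the returned tuple
```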
New pretrained weights
We went from ten (in `pytorch-pretrained-bert` 0.6.2) to twenty-seven (in `pytorch-transformers` 1.0) pretrained model weights.
The newly added model weights are, in summary:
- Two Whole-Word-Masking weights for Bert (cased and uncased)
- Three fine-tuned models for Bert (on SQuAD and MRPC)
- One German model for Bert, provided and trained by Deepset.ai (@tholor and @Timoeller), as detailed in their nice blog post
- One OpenAI GPT-2 model (medium size model)
- Two models (base and large) for the newly added XLNet model
- Eight models for the newly added XLM model
The documentation lists all the models with their shortcut names, and we are currently adding full details of the associated pretraining/fine-tuning parameters.
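For instance, the new Whole-Word-Masking weights can be loaded directly by shortcut name. A minimal sketch, assuming the `bert-large-uncased-whole-word-masking` shortcut from the documentation:

```python
from pytorch_transformers import BertModel, BertTokenizer

# Whole-Word-Masking checkpoints are loaded like any other pretrained weight
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking')
model = BertModel.from_pretrained('bert-large-uncased-whole-word-masking')
```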
New documentation
New documentation is currently being created at https://huggingface.co/pytorch-transformers/ and should be finalized over the coming days.
Standard API across models
See the readme for a quick tour of the API.
Main points:
- All models now return tuples with various elements depending on the model and the configuration. The docstrings and documentation list all the expected outputs in order.
- All models can now return the full list of hidden-states (the embedding output plus the output hidden-states of each layer).
- All models can now return the full list of attention weights (one tensor of attention weights for each layer).
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,
                                  output_attentions=True)
input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
# The hidden-states and attentions are the last two elements of the output tuple
all_hidden_states, all_attentions = model(input_ids)[-2:]
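As a quick sanity check (assuming `bert-base-uncased` and its 12 layers), the hidden-states list has one extra entry for the embedding output, while the attentions list has one tensor per layer:

```python
print(len(all_hidden_states))   # 13: embedding output + one tensor per layer
print(len(all_attentions))      # 12: one tensor per layer
print(all_attentions[0].shape)  # (batch_size, num_heads, seq_len, seq_len)
```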
Standard API to add tokens to the vocabulary and the model
Using `tokenizer.add_tokens()` and `tokenizer.add_special_tokens()`, one can now easily add tokens to each model's vocabulary. The model's input embeddings can be resized accordingly to add the associated word embeddings (to be trained) using `model.resize_token_embeddings(len(tokenizer))`:
# Add new tokens to the tokenizer's vocabulary...
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
# ...and resize the model's input embeddings to match the new vocabulary size
model.resize_token_embeddings(len(tokenizer))
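A quick, hedged check of the effect (the exact ids depend on the model's original vocabulary size):

```python
# Added tokens are kept as single units by the tokenizer and map to the
# freshly added (untrained) embedding rows at the end of the vocabulary
print(tokenizer.tokenize("This is a [SPECIAL_TOKEN_1] example"))
print(tokenizer.convert_tokens_to_ids(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]']))
```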
Serialization
The serialization methods have been standardized, and you should probably switch to the new `save_pretrained(save_directory)` method if you were using any other serialization method before:
### Save a trained model and its tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')

### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
TorchScript
All models are now compatible with TorchScript.
# `model_class` and `pretrained_weights` stand for any model class and shortcut
# name, e.g. BertModel and 'bert-base-uncased'; `input_ids` is an example input
model = model_class.from_pretrained(pretrained_weights, torchscript=True)
traced_model = torch.jit.trace(model, (input_ids,))
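The traced module can then be saved to disk and reloaded with the standard TorchScript utilities (the file name below is only an example):

```python
traced_model.save('traced_model.pt')              # serialize graph and weights
loaded_model = torch.jit.load('traced_model.pt')  # no Python class definition needed
outputs = loaded_model(input_ids)
```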
Example scripts
The example scripts have been refactored and gathered into three main examples (`run_glue.py`, `run_squad.py` and `run_generation.py`) which are common to several models and are designed to offer SOTA performance on the respective tasks while being clean starting points for designing your own scripts.
Other example scripts (like `run_bertology.py`) will be added in the coming weeks.
Breaking changes
The migration section of the readme lists the breaking changes when switching from `pytorch-pretrained-bert` to `pytorch-transformers`.
The main breaking change is that all models now return a tuple of results.
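For example, a training step that previously received the loss directly now indexes into the returned tuple. A minimal sketch, assuming a model with a training head such as `BertForSequenceClassification`:

```python
# pytorch-pretrained-bert 0.6.2:
# loss = model(input_ids, labels=labels)

# pytorch-transformers 1.0: models return a tuple, the loss is its first element
outputs = model(input_ids, labels=labels)
loss = outputs[0]
```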