
Releases: huggingface/transformers

v4.9.0: TensorFlow examples, CANINE, tokenizer training, ONNX rework

22 Jul 10:53


ONNX rework

This version introduces a new package, transformers.onnx, which can be used to export models to ONNX. Unlike the previous implementation, this approach is designed as an easily extensible package in which users can define their own ONNX configurations for the models they wish to export.

python -m transformers.onnx --model=bert-base-cased onnx/bert-base-cased/
Validating ONNX model...
        -[✓] ONNX model outputs' name match reference model ({'pooler_output', 'last_hidden_state'})
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 8, 768) matches (2, 8, 768)
                -[✓] all values close (atol: 0.0001)
        - Validating ONNX Model output "pooler_output":
                -[✓] (2, 768) matches (2, 768)
                -[✓] all values close (atol: 0.0001)
All good, model saved at: onnx/bert-base-cased/model.onnx
  • [RFC] Laying down building stone for more flexible ONNX export capabilities #11786 (@mfuntowicz)
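
Once exported, the model can be run with any ONNX-compatible runtime. As a minimal sketch (not part of the release itself), assuming onnxruntime is installed and using the output path from the command above:

from onnxruntime import InferenceSession
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
session = InferenceSession("onnx/bert-base-cased/model.onnx")

# ONNX Runtime expects NumPy inputs, so we tokenize with return_tensors="np"
inputs = tokenizer("Using BERT through ONNX Runtime!", return_tensors="np")
outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
print(outputs[0].shape)  # (batch_size, sequence_length, hidden_size)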

CANINE model

Four new models are released as part of the CANINE implementation: CanineForSequenceClassification, CanineForMultipleChoice, CanineForTokenClassification and CanineForQuestionAnswering, in PyTorch.

The CANINE model was proposed in CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. It’s among the first papers that train a Transformer without using an explicit tokenization step (such as Byte Pair Encoding (BPE), WordPiece, or SentencePiece). Instead, the model is trained directly at a Unicode character level. Training at a character level inevitably comes with a longer sequence length, which CANINE solves with an efficient downsampling strategy, before applying a deep Transformer encoder.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=canine
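
A minimal usage sketch, assuming the google/canine-s checkpoint from the Hub; since CANINE operates directly on characters, the tokenizer needs no subword vocabulary:

import torch
from transformers import CanineTokenizer, CanineModel

tokenizer = CanineTokenizer.from_pretrained("google/canine-s")
model = CanineModel.from_pretrained("google/canine-s")

inputs = tokenizer(["Life is like a box of chocolates."], padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# one hidden state per input character (plus special tokens)
print(outputs.last_hidden_state.shape)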

Tokenizer training

This version introduces a new method to train a tokenizer from scratch, based on the configuration of an existing tokenizer:

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("wikitext", name="wikitext-2-raw-v1", split="train")
# We train on batches of texts, 1,000 at a time here.
batch_size = 1000
corpus = (dataset[i : i + batch_size]["text"] for i in range(0, len(dataset), batch_size))

tokenizer = AutoTokenizer.from_pretrained("gpt2")
new_tokenizer = tokenizer.train_new_from_iterator(corpus, vocab_size=20000)
  • Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) #12420 (@SaulLu)
  • Easily train a new fast tokenizer from a given one #12361 (@sgugger)
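
Continuing the snippet above, the resulting tokenizer behaves like any other fast tokenizer and can be saved and reloaded (the directory name is just an example):

# Save the newly trained tokenizer and reload it later
new_tokenizer.save_pretrained("my-new-gpt2-tokenizer")
reloaded_tokenizer = AutoTokenizer.from_pretrained("my-new-gpt2-tokenizer")
print(reloaded_tokenizer.tokenize("This text is split with the retrained vocabulary."))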

TensorFlow examples

TFTrainer is now deprecated in favor of Keras. v4.9.0 also marks the end of a long rework of the TensorFlow examples, which are now more Keras-idiomatic, clearer, and more robust.
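
As a rough illustration of the Keras-first style the reworked examples follow (a schematic sketch only; the checkpoint, data and hyperparameters below are placeholders rather than values taken from the example scripts):

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

# Build a tiny tf.data.Dataset from tokenized text and labels
texts = ["I loved this movie!", "Utterly boring."]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="np")
train_dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels)).batch(2)

# Standard Keras workflow: compile with an optimizer and a loss, then fit
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(train_dataset, epochs=1)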

TensorFlow implementations

HuBERT is now implemented in TensorFlow.

Breaking changes

When load_best_model_at_end was set to True in the TrainingArguments, a save_strategy different from the eval_strategy was accepted, but the save_strategy was silently overwritten by the eval_strategy (tracking the best model requires an evaluation at every save). This caused a lot of confusion, with users not understanding why the script was not doing what it was told. This situation now raises an error asking you to set save_strategy and eval_strategy to the same value, and, when that value is "steps", to make save_steps a round multiple of eval_steps.
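
For example, a configuration satisfying the new check could look like the following sketch (the option is exposed as evaluation_strategy on TrainingArguments; all values are placeholders):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    load_best_model_at_end=True,
    evaluation_strategy="steps",
    save_strategy="steps",   # must match the evaluation strategy
    eval_steps=500,
    save_steps=1000,         # a round multiple of eval_steps
)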

General improvements and bugfixes


Patch release: v4.8.2

30 Jun 12:42


v4.8.1: Patch release

24 Jun 14:15


  • Fix default for TensorBoard folder
  • Ray Tune install #12338
  • Tests fixes for Torch FX #12336

v4.8.0 Integration with the Hub and Flax/JAX support

23 Jun 17:28


Integration with the Hub

Our example scripts and Trainer are now optimized for publishing your model on the Hugging Face Hub, with TensorBoard training metrics and an automatically authored model card containing all the relevant metadata, including evaluation results.

Trainer Hub integration

Use --push_to_hub to create a model repo for your training; it will be saved with all relevant metadata at the end of training.

Other flags, shown in the sketch after this list, are:

  • push_to_hub_model_id to control the repo name
  • push_to_hub_organization to specify an organization
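
A sketch of the equivalent TrainingArguments, with placeholder repo and organization names:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-finetuned-squad",
    push_to_hub=True,
    push_to_hub_model_id="bert-finetuned-squad",  # controls the repo name
    push_to_hub_organization="my-organization",   # optional organization
)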

Visualizing training metrics on huggingface.co (based on TensorBoard)

By default, if you have tensorboard installed, the training scripts log to it, and the logging traces folder is conveniently located inside your model output directory, so you can push the traces to your model repo by default.

Any model repo that contains TensorBoard traces will spawn a TensorBoard server, which makes it very convenient to see how the training went! This Hub feature is in beta, so let us know if anything looks weird :)

See this model repo

Model card generation

The model card contains information about the datasets used, the evaluation results, and more.

Many users were already adding their eval results to their model cards in Markdown format, but this is a more structured way of adding them, which will make them easier to parse and, for example, represent in leaderboards such as the ones on Papers With Code!

We use a format specified in collaboration with [Papers With Code](https://github.com/huggingface/huggingface_hub/blame/main/modelcard.md), see also this repo.

Model, tokenizer and configurations

All models, tokenizers and configurations now have a revamped push_to_hub() method, as well as a push_to_hub argument in their save_pretrained() method. The workflow of this method has changed a bit to be more git-like, with a local clone of the repo in a folder of the working directory, to make it easier to apply patches (use use_temp_dir=True to clone in a temporary folder for the same behavior as the experimental API).
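
A minimal sketch of both workflows (repo names are placeholders, and pushing requires being logged in to the Hub):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Push directly, cloning into a temporary folder...
model.push_to_hub("my-finetuned-bert", use_temp_dir=True)
tokenizer.push_to_hub("my-finetuned-bert", use_temp_dir=True)

# ...or push as part of saving locally.
model.save_pretrained("my-finetuned-bert", push_to_hub=True)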

Flax/JAX support

Flax/JAX is becoming a fully supported backend of the Transformers library, with more models getting an implementation. BART, CLIP and T5 join the already existing models; find the whole list here.
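
A minimal sketch of running one of the Flax models, using bert-base-cased as an example checkpoint (pass from_pt=True if a checkpoint only ships PyTorch weights):

from transformers import AutoTokenizer, FlaxBertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = FlaxBertModel.from_pretrained("bert-base-cased")

# Flax models consume NumPy/JAX arrays rather than PyTorch tensors
inputs = tokenizer("Flax is becoming a fully supported backend.", return_tensors="np")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)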

General improvements and bug fixes

v4.7.0: DETR, RoFormer, ByT5, HuBERT, support for torch 1.9.0

17 Jun 16:20


DETR (@NielsRogge)

Three new models are released as part of the DETR implementation: DetrModel, DetrForObjectDetection and DetrForSegmentation, in PyTorch.

DETR consists of a convolutional backbone followed by an encoder-decoder Transformer that can be trained end-to-end for object detection. It removes much of the complexity of models like Faster R-CNN and Mask R-CNN, which rely on region proposals, non-maximum suppression, and anchor generation. Moreover, DETR can be naturally extended to perform panoptic segmentation by simply adding a mask head on top of the decoder outputs.

DETR can use any backbone from the timm library.

The DETR model was proposed in End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=detr
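
A minimal inference sketch, assuming the facebook/detr-resnet-50 checkpoint and a local image file:

import torch
from PIL import Image
from transformers import DetrFeatureExtractor, DetrForObjectDetection

feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

image = Image.open("street_scene.jpg")  # placeholder path
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits: class scores per object query
# outputs.pred_boxes: normalized (center_x, center_y, width, height) boxes
print(outputs.logits.shape, outputs.pred_boxes.shape)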

ByT5 (@patrickvonplaten)

A new tokenizer is released as part of the ByT5 implementation: ByT5Tokenizer. It can be used with the T5 family of models.

The ByT5 model was presented in ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?search=byt5
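
A minimal sketch, assuming the google/byt5-small checkpoint; the tokenizer simply maps UTF-8 bytes to ids, so there is no subword vocabulary to manage:

from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

inputs = tokenizer("The dog chases a ball in the park.", return_tensors="pt")
print(inputs.input_ids[0][:10])  # byte-level ids

outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))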

RoFormer (@JunnYu)

Fourteen new models are released as part of the RoFormer implementation: RoFormerModel, RoFormerForCausalLM, RoFormerForMaskedLM, RoFormerForSequenceClassification, RoFormerForTokenClassification, RoFormerForQuestionAnswering and RoFormerForMultipleChoice in PyTorch, and TFRoFormerModel, TFRoFormerForCausalLM, TFRoFormerForMaskedLM, TFRoFormerForSequenceClassification, TFRoFormerForTokenClassification, TFRoFormerForQuestionAnswering and TFRoFormerForMultipleChoice in TensorFlow.

RoFormer is a BERT-like autoencoding model with rotary position embeddings. Rotary position embeddings have shown improved performance on classification tasks with long texts. The RoFormer model was proposed in RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu.

  • Add new model RoFormer (use rotary position embedding ) #11684 (@JunnYu)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=roformer

HuBERT (@patrickvonplaten)

HuBERT is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.

HuBERT was proposed in HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.

Two new models are released as part of the HuBERT implementation: HubertModel and HubertForCTC, in PyTorch.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=hubert
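
A minimal speech-recognition sketch, assuming the facebook/hubert-large-ls960-ft checkpoint and a 16 kHz waveform (a silent placeholder array is used here):

import numpy as np
import torch
from transformers import Wav2Vec2Processor, HubertForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

waveform = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))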

Hugging Face Course - Part 1

On Monday, June 14th, 2021, we released the first part of the Hugging Face Course. The course focuses on the Hugging Face ecosystem, including transformers. Most of the material in the course is now linked from the transformers documentation, which also includes videos explaining individual concepts.

TensorFlow additions

The Wav2Vec2 model can now be used in TensorFlow.

PyTorch 1.9 support

Notebooks

General improvements and bugfixes


v4.6.1: Patch release

20 May 14:57

  • Fix regression in models for sequence classification used for regression tasks #11785
  • Fix checkpoint deletion when load_best_model_at_end = True #11748
  • Fix evaluation in question answering examples #11746
  • Fix release utils #11784

v4.6.0: ViT, DeiT, CLIP, LUKE, BigBirdPegasus, MegatronBERT

12 May 15:07


Transformers aren't just for text - they can handle a huge range of input types, and there's been a flurry of papers and new models in the last few months applying them to vision tasks that had traditionally been dominated by convolutional networks. With this release, we're delighted to announce that several state-of-the-art pretrained vision and multimodal text+vision transformer models are now accessible in the huggingface/transformers repo. Give them a try!

ViT (@NielsRogge)

Two new models are released as part of the ViT implementation: ViTModel and ViTForImageClassification, in PyTorch.

ViT is a Transformer-based model that obtains state-of-the-art results on image classification tasks. The accompanying paper was the first to successfully train a Transformer encoder on ImageNet, attaining very good results compared to familiar convolutional architectures.

The Vision Transformer (ViT) model was proposed in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=vit
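
A minimal image-classification sketch, assuming the google/vit-base-patch16-224 checkpoint and a local image file (at this release the preprocessing class is ViTFeatureExtractor):

import torch
from PIL import Image
from transformers import ViTFeatureExtractor, ViTForImageClassification

feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

image = Image.open("cat.jpg")  # placeholder path
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])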

DeiT (@NielsRogge)

Three new models are released as part of the DeiT implementation: DeiTModel, DeiTForImageClassification and DeiTForImageClassificationWithTeacher, in PyTorch.

DeiT is an image transformer model similar to the ViT model. DeiT (data-efficient image transformers) models are more efficiently trained transformers for image classification, requiring far less data and far less computing resources compared to the original ViT models.

The DeiT model was proposed in Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deit

CLIP (@patil-suraj)

Three new models are released as part of the CLIP implementation: CLIPModel, CLIPVisionModel and CLIPTextModel, in PyTorch.

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and GPT-3.

The CLIP model was proposed in Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=clip
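
A minimal zero-shot classification sketch, assuming the openai/clip-vit-base-patch32 checkpoint and a local image file:

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder path
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity as probabilities
print(dict(zip(texts, probs[0].tolist())))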

BigBirdPegasus (@vasudevgupta7)

BigBird is a sparse-attention-based Transformer that extends Transformer-based models such as BERT to much longer sequences. In addition to sparse attention, BigBird also applies global and random attention to the input sequence. Theoretically, it has been shown that applying sparse, global, and random attention approximates full attention while being computationally much more efficient for longer sequences. Thanks to its ability to handle longer context, BigBird has shown improved performance on various long-document NLP tasks, such as question answering and summarization, compared to BERT or RoBERTa.

The BigBird model was proposed in Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and others.

  • Add BigBirdPegasus #10991 (@vasudevgupta7)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=bigbird_pegasus

LUKE (@NielsRogge, @ikuyamada)

LUKE is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps improve performance on various downstream tasks involving reasoning about entities, such as named entity recognition, extractive and cloze-style question answering, entity typing, and relation classification.

The LUKE model was proposed in LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=luke
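
A minimal sketch, assuming the studio-ousia/luke-base checkpoint; entity spans are given as character offsets into the text:

import torch
from transformers import LukeTokenizer, LukeModel

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7)]  # character span of "Beyoncé"
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)         # word token representations
print(outputs.entity_last_hidden_state.shape)  # entity representations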

Megatron (@jdemouth)

The MegatronBERT model is added to the library, giving access to the 345M-parameter variants.

It comes with nine different models: MegatronBertModel, MegatronBertForMaskedLM, MegatronBertForCausalLM, MegatronBertForNextSentencePrediction, MegatronBertForPreTraining, MegatronBertForSequenceClassification, MegatronBertForMultipleChoice, MegatronBertForTokenClassification and MegatronBertForQuestionAnswering, in PyTorch.

The MegatronBERT model was proposed in Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.

Hub integration in Transformers

The Hugging Face Hub now integrates better with transformers through two newly added features:

  • Models, configurations and tokenizers now have a push_to_hub method to automatically push their state to the hub.

  • The Trainer can now automatically push its underlying model, configuration and tokenizer in a similar fashion. Additionally, it is able to create a draft model card on the fly with the training hyperparameters and evaluation results.

  • Auto modelcard #11599 (@sgugger)

  • Trainer push to hub #11328 (@sgugger)

DeepSpeed ZeRO Stage 3 & ZeRO-Infinity

The Trainer now integrates two additional stages of ZeRO: ZeRO stage 3 for parameter partitioning, and ZeRO-Infinity, which extends CPU offload with NVMe offload.
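
As an illustrative sketch only (real configurations set many more options, and ZeRO-Infinity additionally configures NVMe offload), ZeRO stage 3 can be enabled by pointing the Trainer at a DeepSpeed config file:

import json
from transformers import TrainingArguments

# Minimal, illustrative DeepSpeed config enabling ZeRO stage 3
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f)

# The script is then typically launched with the deepspeed launcher
args = TrainingArguments(output_dir="out", deepspeed="ds_config.json")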

Flax

Flax support is getting more robust, with model code stabilizing and new models being added to the library.

TensorFlow

We welcome @Rocketknight1 as a TensorFlow contributor. This version includes a brand new TensorFlow example based on Keras, which will be followed by examples covering most tasks.
Additionally, more TensorFlow setups are covered by adding support for AMD-based GPUs and M1 Macs.

Pipelines

Two new pipelines are added.

Notebooks

  • [Community notebooks] Add Wav2Vec notebook for creating captions for YT Clips #11142 (@Muennighoff)
  • add bigbird-pegasus evaluation notebook #11654 (@vasudevgupta7)
  • Vit notebooks + vit/deit fixes #11309 (@NielsRogge)

General improvements and bugfixes


v4.5.1: Patch release

13 Apr 15:25
  • Fix pipeline when used with private models (#11123)
  • Fix loading an architecture in another (#11207)

v4.5.0: BigBird, GPT Neo, Examples, Flax support

06 Apr 16:50


BigBird (@vasudevgupta7)

Seven new models are released as part of the BigBird implementation: BigBirdModel, BigBirdForPreTraining, BigBirdForMaskedLM, BigBirdForCausalLM, BigBirdForSequenceClassification, BigBirdForMultipleChoice and BigBirdForQuestionAnswering, in PyTorch.

BigBird is a sparse-attention-based Transformer that extends Transformer-based models such as BERT to much longer sequences. In addition to sparse attention, BigBird also applies global and random attention to the input sequence.

The BigBird model was proposed in Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.

It is released with an accompanying blog post: Understanding BigBird's Block Sparse Attention

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=big_bird
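
A minimal sketch on a long input, assuming the google/bigbird-roberta-base checkpoint:

import torch
from transformers import BigBirdTokenizer, BigBirdModel

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base")

long_text = " ".join(["BigBird handles long documents with sparse attention."] * 200)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)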

GPT Neo (@patil-suraj)

Two new models are released as part of the GPT Neo implementation: GPTNeoModel, GPTNeoForCausalLM in PyTorch.

GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. EleutherAI's primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public.

The implementation within Transformers is a GPT2-like causal language model trained on the Pile dataset.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gpt_neo
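
A minimal generation sketch, assuming the EleutherAI/gpt-neo-1.3B checkpoint (GPT Neo reuses the GPT-2 tokenizer):

from transformers import GPT2Tokenizer, GPTNeoForCausalLM

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

inputs = tokenizer("In a shocking finding, scientists discovered", return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, max_length=50, temperature=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))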

Examples

New features have been added to some existing examples, and new examples have been introduced.

Raw training loop examples

Based on the accelerate library, examples that completely expose the training loop are now provided, making them easy to customize if you want to try a new research idea.
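
A schematic sketch of the training-loop style these examples use, assuming model, optimizer and train_dataloader have already been created:

from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for epoch in range(3):
    for batch in train_dataloader:
        outputs = model(**batch)    # batches include labels, so the model returns a loss
        loss = outputs.loss
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()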

Standardize examples with Trainer

Thanks to the amazing contributions of @bhadreshpsavani, all examples using the Trainer are now standardized: they all support the predict stage and return/save metrics in the same fashion.

Trainer & SageMaker Model Parallelism

The Trainer now supports SageMaker model parallelism out of the box; the old SageMakerTrainer is deprecated as a consequence and will be removed in version 5.

FLAX

FLAX support has been widened to all model heads of the BERT architecture, and a general conversion script allows PyTorch checkpoints to be used in FLAX.

Auto models now have a FLAX implementation.

General improvements and bugfixes


Patch release v4.4.2

18 Mar 19:17
  • Add support for detecting intel-tensorflow version
  • Fix distributed evaluation on SageMaker