Releases: huggingface/transformers
v4.4.0: S2T, M2M100, I-BERT, mBART-50, DeBERTa-v2, XLSR-Wav2Vec2
SpeechToText
Two new models are released as part of the S2T implementation: Speech2TextModel and Speech2TextForConditionalGeneration, in PyTorch.
Speech2Text is a speech model that accepts a float tensor of log-mel filter-bank features extracted from the speech signal. It’s a transformer-based seq2seq model, so the transcripts/translations are generated autoregressively.
The Speech2Text model was proposed in fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=speech_to_text
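As a quick illustration, here is a minimal transcription sketch (not an official snippet): the checkpoint name and the silent 16 kHz waveform are assumptions for illustration.

```python
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

# Assumed checkpoint; any speech_to_text checkpoint from the Hub filter above should work.
checkpoint = "facebook/s2t-small-librispeech-asr"
processor = Speech2TextProcessor.from_pretrained(checkpoint)
model = Speech2TextForConditionalGeneration.from_pretrained(checkpoint)

# `speech` stands in for a 1-D float waveform sampled at 16 kHz (e.g. loaded with soundfile).
speech = [0.0] * 16000  # one second of silence as a placeholder

# The processor turns the raw waveform into log-mel filter-bank features.
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

# The seq2seq decoder generates the transcript autoregressively.
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```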
- Speech2TextTransformer #10175 (@patil-suraj)
M2M100
Two new models are released as part of the M2M100 implementation: M2M100Model and M2M100ForConditionalGeneration, in PyTorch.
M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks.
The M2M100 model was proposed in Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=m2m_100
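For illustration, a minimal translation sketch; the 418M checkpoint name is assumed from the filter above, and the sentence and language codes are arbitrary examples.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Assumed checkpoint from the Hub filter above.
checkpoint = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(checkpoint)
model = M2M100ForConditionalGeneration.from_pretrained(checkpoint)

# Translate English to French: set the source language on the tokenizer and force the
# target language id as the first generated token.
tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```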
- Add m2m100 #10236 (@patil-suraj)
I-BERT
Six new models are released as part of the I-BERT implementation: IBertModel, IBertForMaskedLM, IBertForSequenceClassification, IBertForMultipleChoice, IBertForTokenClassification and IBertForQuestionAnswering, in PyTorch.
I-BERT is a quantized version of RoBERTa running inference up to four times faster.
The I-BERT framework in PyTorch allows you to identify the best parameters for quantization. Once the model is exported to a framework that supports int8 execution (such as TensorRT), a speedup of up to 4x can be observed, with no loss in performance thanks to the parameter search.
The I-BERT model was proposed in I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=ibert
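A minimal usage sketch, assuming one of the checkpoints from the filter above; the quant_mode flag mentioned in the comment is the configuration switch for the integer-only kernels.

```python
from transformers import AutoTokenizer, IBertModel

# Assumed checkpoint name; see the Hub filter above for the available I-BERT checkpoints.
checkpoint = "kssteven/ibert-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = IBertModel.from_pretrained(checkpoint)

# The `quant_mode` entry of the model config controls whether the integer-only
# (quantized) kernels are used instead of the full-precision ones.
print(model.config.quant_mode)

inputs = tokenizer("I-BERT runs integer-only inference.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```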
- I-BERT model support #10153 (@kssteven418)
- [IBert] Correct link to paper #10445 (@patrickvonplaten)
- Add I-BERT to README #10462 (@LysandreJik)
mBART-50
MBart-50 is created by taking the original mbart-large-cc25 checkpoint, extending its embedding layers with randomly initialized vectors for an extra set of 25 language tokens, and then pretraining it on 50 languages.
The MBart model was presented in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=mbart-50
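A minimal translation sketch, assuming the many-to-many fine-tuned checkpoint from the family above; the sentence and language codes are illustrative.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Assumed checkpoint from the Hub filter above (many-to-many translation variant).
checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint, src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(checkpoint)

encoded = tokenizer("The head of the United Nations says there is no military solution in Syria.", return_tensors="pt")
# Force the target language (here French) as the first generated token.
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```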
- Add mBART-50 #10154 (@patil-suraj)
DeBERTa-v2
Five new models are released as part of the DeBERTa-v2 implementation: DebertaV2Model, DebertaV2ForMaskedLM, DebertaV2ForSequenceClassification, DebertaV2ForTokenClassification and DebertaV2ForQuestionAnswering, in PyTorch.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in RoBERTa.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deberta-v2
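A short inference sketch, assuming an MNLI fine-tuned checkpoint from the family above; the sentence pair is made up.

```python
import torch
from transformers import DebertaV2ForSequenceClassification, DebertaV2Tokenizer

# Assumed checkpoint name (a DeBERTa-v2 model fine-tuned on MNLI).
checkpoint = "microsoft/deberta-v2-xlarge-mnli"
tokenizer = DebertaV2Tokenizer.from_pretrained(checkpoint)
model = DebertaV2ForSequenceClassification.from_pretrained(checkpoint)

# Score an NLI premise/hypothesis pair.
inputs = tokenizer("A man is playing a guitar.", "A person is making music.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```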
- Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… #10018 (@BigBird01)
- DeBERTa-v2 fixes #10328 (@LysandreJik)
Wav2Vec2
XLSR-Wav2Vec2
The XLSR-Wav2Vec2 model was proposed in Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
The checkpoint corresponding to that model is added to the model hub: facebook/wav2vec2-large-xlsr-53
- [XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models #10648 (@patrickvonplaten)
Training script
A fine-tuning script showcasing how the Wav2Vec2 model can be trained has been added.
- Add Fine-Tuning for Wav2Vec2 #10145 (@patrickvonplaten)
Further improvements
The Wav2Vec2 architecture has been made more stable through several changes. These changes introduce feature extractors and feature processors as the pre-processing components for multi-modal speech models.
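As an illustration of the processor-based pre-processing flow, a minimal CTC inference sketch; the checkpoint name and the silent placeholder waveform are assumptions.

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Assumed checkpoint; the processor bundles a feature extractor and a CTC tokenizer.
checkpoint = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

# `speech` stands in for a 1-D float waveform sampled at 16 kHz.
speech = [0.0] * 16000
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: most likely token per frame, then collapse repeats and blanks.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```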
- Deprecate Wav2Vec2ForMaskedLM and add Wav2Vec2ForCTC #10089 (@patrickvonplaten)
- Fix example in Wav2Vec2 documentation #10096 (@abhishekkrthakur)
- [Wav2Vec2] Remove unused config #10457 (@patrickvonplaten)
- [Wav2Vec2FeatureExtractor] smal fixes #10455 (@patil-suraj)
- [Wav2Vec2] Improve Tokenizer & Model for batched inference #10117 (@patrickvonplaten)
- [PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer #10324 (@patrickvonplaten)
- [Wav2Vec2 Example Script] Typo #10547 (@patrickvonplaten)
- [Wav2Vec2] Make wav2vec2 test deterministic #10714 (@patrickvonplaten)
- [Wav2Vec2] Fix documentation inaccuracy #10694 (@MikeG112)
AMP & XLA Support for TensorFlow models
Most of the TensorFlow models are now compatible with automatic mixed precision and have XLA support.
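For reference, a minimal sketch of enabling Keras mixed precision around one of these models; the exact location of the helper differs slightly between TF versions, as noted in the comment.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Enable automatic mixed precision globally before instantiating the model.
# (On TF 2.3 the equivalent helper lives under tf.keras.mixed_precision.experimental.)
tf.keras.mixed_precision.set_global_policy("mixed_float16")

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")

inputs = tokenizer("Mixed precision reduces memory use on supported GPUs.", return_tensors="tf")
outputs = model(inputs)
print(outputs.logits.dtype)  # dtype reflects the active precision policy
```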
- Add AMP for TF Albert #10141 (@jplu)
- Unlock XLA test for TF ConvBert #10207 (@jplu)
- Making TF BART-like models XLA and AMP compliant #10191 (@jplu)
- Making TF XLM-like models XLA and AMP compliant #10211 (@jplu)
- Make TF CTRL compliant with XLA and AMP #10209 (@jplu)
- Making TF GPT2 compliant with XLA and AMP #10230 (@jplu)
- Making TF Funnel compliant with AMP #10216 (@jplu)
- Making TF Lxmert model compliant with AMP #10257 (@jplu)
- Making TF MobileBert model compliant with AMP #10259 (@jplu)
- Making TF MPNet model compliant with XLA #10260 (@jplu)
- Making TF T5 model compliant with AMP and XLA #10262 (@jplu)
- Making TF TransfoXL model compliant with AMP #10264 (@jplu)
- Making TF OpenAI GPT model compliant with AMP and XLA #10261 (@jplu)
- Rework the AMP for TF XLNet #10274 (@jplu)
- Making TF Longformer-like models compliant with AMP #10233 (@jplu)
SageMaker Trainer for model parallelism
We are rolling out experimental support for model parallelism on SageMaker with a new SageMakerTrainer that can be used in place of the regular Trainer. This is a temporary class that will be removed in a future version; the end goal is to have Trainer support this feature out of the box.
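A hedged sketch of the intended drop-in usage: the transformers.sagemaker import path is an assumption based on this release, and the model/dataset arguments are placeholders.

```python
from transformers import TrainingArguments
from transformers.sagemaker import SageMakerTrainer  # assumed import path for the experimental class


def build_trainer(model, train_dataset, eval_dataset):
    # SageMakerTrainer is constructed exactly like the regular Trainer it replaces.
    args = TrainingArguments(output_dir="out")
    return SageMakerTrainer(
        model=model,                   # any PreTrainedModel (placeholder)
        args=args,
        train_dataset=train_dataset,   # placeholder datasets
        eval_dataset=eval_dataset,
    )
```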
- Add SageMakerTrainer for model paralellism #10122 (@sgugger)
- Extend trainer logging for sm #10633 (@philschmid)
- Sagemaker Model Parallel tensoboard writing fix #10403 (@mansimane)
- Multiple fixes in SageMakerTrainer #10687 (@sgugger)
- Add DistributedSamplerWithLoop #10746 (@sgugger)
General improvements and bugfixes
- [trainer] deepspeed bug fixes and tests #10039 (@stas00)
- Removing run_pl_glue.py from text classification docs, include run_xnli.py & run_tf_text_classification.py #10066 (@cbjuan)
- remove token_type_ids from TokenizerBertGeneration output #10070 (@sadakmed)
- [deepspeed tests] transition to new tests dir #10080 (@stas00)
- Added integration tests for Pytorch implementation of the ELECTRA model #10073 (@spatil6)
- Fix naming in TF MobileBERT #10095 (@jplu)
- [examples/s2s] add test set predictions #10085 (@patil-suraj)
- Logging propagation #10092 (@LysandreJik)
- Fix some edge cases in report_to and add deprecation warnings #10100 (@sgugger)
- Add head_mask and decoder_head_mask to TF LED #9988 (@stancld)
- Replace strided slice with tf.expand_dims #10078 (@jplu)
- Fix Faiss Import #10103 (@patrickvonplaten)
- [RAG] fix generate #10094 (@patil-suraj)
- Fix TFConvBertModelIntegrationTest::test_inference_masked_lm Test #10104 (@abhishekkrthakur)
- doc: update W&B related doc #10086 (@borisdayma)
- Remove speed metrics from default compute objective [WIP] #10107 (@shiva-z)
- Fix tokenizers training in notebooks #10110 (@n1t0)
- [DeepSpeed docs] new information #9610 (@stas00)
- [CI] build docs faster #10115 (@stas00)
- [scheduled github CI] add deepspeed fairscale deps #10116 (@stas00)
- Line endings should be LF across repo and not CRLF #10119 (@LysandreJik)
- Fix TF LED/Longformer attentions computation #10007 (@jplu)
- remove adjust_logits_during_generation method #10087 (@patil-suraj)
- [DeepSpeed] restore memory for evaluation #10114 (@stas00)
- Update run_xnli.py to use Datasets library #9829 (@Qbiwan)
- Add new community notebook - Blenderbot #10126 (@lordtt13)
- [DeepSpeed in notebooks] Jupyter + Colab #10130 (@stas00)
- [examples/run_s2s] remove task_specific_params and update rouge computation #10133 (@patil-suraj)
- Fix typo in GPT2DoubleHeadsModel docs #10148 (@M-Salti)
- [hf_api] delete deprecated methods and tests #10159 (@julien-c)
- Revert propagation #10171 (@LysandreJik)
- Conversion from slow to fast for BPE spm vocabs contained an error. #10120 (@Narsil)
- Fix typo in comments #10157 (@mrm8488)
- Fix typo in comment #10156 (@mrm8488)
- [Doc] Fix version control in internal pages #10124 (@sgugger)
- [t5 tokenizer] add info logs #9897 (@stas00)
- Fix v2 model l...
v4.3.3: Patch release
This patch fixes an issue with the conversion for ConvBERT models: #10314.
v4.3.2: Patch release
v4.3.1: Patch release
This patch release modifies the API of the Wav2Vec2 model: Wav2Vec2ForCTC was added as a replacement for Wav2Vec2ForMaskedLM. Wav2Vec2ForMaskedLM is kept for backwards compatibility but is deprecated.
- Deprecate Wav2Vec2ForMaskedLM and add Wav2Vec2ForCTC #10089 (@patrickvonplaten)
v4.3.0: Wav2Vec2, ConvBERT, BORT, Amazon SageMaker
Wav2Vec2 from facebook (@patrickvonplaten)
Two new models are released as part of the Wav2Vec2 implementation: Wav2Vec2Model and Wav2Vec2ForMaskedLM, in PyTorch.
Wav2Vec2 is a multi-modal model, combining speech and text. It's the first multi-modal model of its kind we welcome in Transformers.
The Wav2Vec2 model was proposed in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=wav2vec2
Available notebooks:
Contributions:
- Wav2Vec2 #9659 (@patrickvonplaten)
Future Additions
- Enable fine-tuning and pretraining for Wav2Vec2
- Add example script with dependency to wav2letter/flashlight
- Add Encoder-Decoder Wav2Vec2 model
ConvBERT
The ConvBERT model was proposed in ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
Six new models are released as part of the ConvBERT implementation: ConvBertModel, ConvBertForMaskedLM, ConvBertForSequenceClassification, ConvBertForTokenClassification, ConvBertForQuestionAnswering and ConvBertForMultipleChoice. These models are available both in PyTorch and TensorFlow.
Contributions:
- ConvBERT Model #9717 (@abhishekkrthakur)
- ConvBERT: minor fixes for conversion script #9937 (@stefan-it)
- Fix GroupedLinearLayer in TF ConvBERT #9972 (@abhishekkrthakur)
BORT
The BORT model was proposed in Optimal Subarchitecture Extraction for BERT by Amazon's Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for the BERT architecture, which the authors refer to as “Bort”.
The BORT model can be loaded directly into the BERT architecture, so all BERT model heads are available for BORT.
Contributions:
- ADD BORT #9813 (@stefan-it)
Trainer now supports Amazon SageMaker’s data parallel library (@sgugger)
When executing a script with Trainer on Amazon SageMaker with SageMaker's data parallelism library enabled, Trainer will automatically use the smdistributed library. All maintained examples have been tested with this functionality. Here is an overview of the SageMaker data parallelism library.
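For context, a hedged sketch of how the library is typically switched on from the SageMaker Python SDK side (this is SageMaker SDK code, not part of transformers); the role, script, framework versions and instance settings are placeholders to adapt.

```python
from sagemaker.pytorch import PyTorch

# Placeholder values: the entry point is any training script that uses transformers.Trainer.
estimator = PyTorch(
    entry_point="run_glue.py",
    source_dir="./examples/text-classification",
    role="<sagemaker-execution-role>",
    instance_type="ml.p3.16xlarge",
    instance_count=1,
    framework_version="1.7.1",   # placeholder: use a version supported by smdistributed
    py_version="py36",
    # Enable SageMaker's data parallelism library; Trainer then picks up smdistributed.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"output_dir": "/opt/ml/model"},
)
estimator.fit()
```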
Community page
A new Community Page has been added to the docs. It contains all the notebooks contributed by the community, as well as some community projects built around Transformers. Feel free to open a PR if you want your project to be showcased!
Additional model architectures
DeBERTa now has more model heads available.
- Add DeBERTa head models #9691 (@NielsRogge)
BART, mBART, Marian, Pegasus and Blenderbot now have decoder-only model architectures. They can therefore be used in decoder-only settings.
Breaking changes
None.
General improvements and bugfixes
- Fix Trainer with a parallel model #9578 (@sgugger)
- Switch metrics in run_ner to datasets #9567 (@sgugger)
- Compliancy with tf-nightly #9570 (@jplu)
- Make logs TF compliant #9565 (@jplu)
- [setup.py] note on how to get to transformers exact dependencies from shell #9553 (@stas00)
- Fix conda build #9589 (@LysandreJik)
- BatchEncoding.to with device with tests #9584 (@LysandreJik)
- Gradient accumulation for TFTrainer #9585 (@kiyoungkim1)
- Upstream (and rename) sortish sampler #9574 (@sgugger)
- [deepspeed doc] install issues + 1-gpu deployment #9582 (@stas00)
- [TF Led] Fix wrong decoder attention mask behavior #9601 (@patrickvonplaten)
- Remove unused token_type_ids in MPNet #9564 (@jplu)
- Ignore lm_head decoder bias warning #9615 (@LysandreJik)
- [deepspeed] --gradient_accumulation_steps fix #9622 (@stas00)
- Remove duplicated extras["retrieval"] #9621 (@n1t0)
- Fix: torch.utils.checkpoint.checkpoint attribute error. #9626 (@devrimcavusoglu)
- Add head_mask/decoder_head_mask for BART #9569 (@stancld)
- [Bart-like tests] Fix torch device for bart tests #9669 (@patrickvonplaten)
- Fix DPRReaderTokenizer's attention_mask #9663 (@mkserge)
- add mbart to automodel for masked lm #9673 (@patrickvonplaten)
- Fix imports in conversion scripts #9674 (@sgugger)
- Fix GPT conversion script #9676 (@sgugger)
- Fix old Seq2SeqTrainer #9675 (@sgugger)
- Update past_key_values in GPT-2 #9596 (@forest1988)
- Update integrations.py #9652 (@max-yue)
- Fix TF Flaubert and XLM #9661 (@jplu)
- New run_seq2seq script #9605 (@sgugger)
- Add separated decoder_head_mask for T5 Models #9634 (@stancld)
- Fix model templates and use less than 119 chars #9684 (@sgugger)
- Restrain tokenizer.model_max_length default #9681 (@sgugger)
- Speed up RepetitionPenaltyLogitsProcessor (pytorch) #9600 (@LSinev)
- Use datasets squad_v2 metric in run_qa #9677 (@sgugger)
- Fix label datatype in TF Trainer #9616 (@jplu)
- New TF embeddings (cleaner and faster) #9418 (@jplu)
- Fix TF template #9697 (@jplu)
- Add t5 convert to transformers-cli #9654 (@acul3)
- Fix Funnel Transformer conversion script #9683 (@sgugger)
- Add notebook #9696 (@NielsRogge)
- Fix Trainer and Args to mention AdamW, not Adam. #9685 (@gchhablani)
- [deepspeed] fix the backward for deepspeed #9705 (@stas00)
- Fix WAND_DISABLED test #9703 (@sgugger)
- [trainer] no --deepspeed and --sharded_ddp together #9712 (@stas00)
- fix typo #9708 (@Muennighoff)
- Temporarily deactivate TPU tests while we work on fixing them #9720 (@LysandreJik)
- Allow text generation for ProphetNetForCausalLM #9707 (@guillaume-be)
- [LED] Reduce Slow Test required GPU RAM from 16GB to 8GB #9723 (@patrickvonplaten)
- [T5] Fix T5 model parallel tests #9721 (@patrickvonplaten)
- fix T5 head mask in model_parallel #9726 (@patil-suraj)
- Fix mixed precision in TF models #9163 (@jplu)
- Changing model default for TableQuestionAnsweringPipeline. #9729 (@Narsil)
- Fix TF s2s models #9478 (@jplu)
- Fix memory regression in Seq2Seq example #9713 (@sgugger)
- examples: fix XNLI url #9741 (@stefan-it)
- Fix some TF slow tests #9728 (@jplu)
- Fixes to run_seq2seq and instructions #9734 (@sgugger)
- Add report_to training arguments to control the integrations used #9735 (@sgugger)
- Fix a TF test #9755 (@jplu)
- [fsmt] token_type_ids isn't used #9736 (@stas00)
- Fix broken [Open in Colab] links (#9688) #9761 (@wilcoln)
- Fix TFTrainer prediction output #9662 (@janinaj)
- Use object store to pass trainer object to Ray Tune (makes it work with large models) #9749 (@krfricke)
- Fix a typo in Trainer.hyperparameter_search docstring #9762 (@sorami)
- [fsmt] onnx triu workaround #9738 (@stas00)
- Fix model parallel definition in superclass #9787 (@LysandreJik)
- Auto-resume training from checkpoint #9776 (@sgugger)
- [PR/Issue templates] normalize, group, sort + add myself for deepspeed #9706 (@stas00)
- [Flaky Generation Tests] Make sure that no early stopping is happening for beam search #9794 (@patrickvonplaten)
- Fix broken links in the converting tf ckpt document #9791 (@forest1988)
- Add head_mask/decoder_head_mask for TF BART models #9639 (@stancld)
- Adding skip_special_tokens=True to FillMaskPipeline #9783 (@Narsil)
- Improve pytorch examples for fp16 #9796 (@ak314)
- Smdistributed trainer #9798 (@sgugger)
- RagTokenForGeneration: Fixed parameter name for logits_processor #9790 (@michaelrglass)
- Fix fine-tuning translation scripts #9809 (@mbiesialska)
- Allow RAG to output decoder cross-attentions #9789 (@dblakely)
- Commit the last step on world_process_zero in WandbCallback #9805 (@tristandeleu)
- Fix a bug in run_glue.py (#9812) #9815 (@forest1988)
- [LedFastTokenizer] Correct missing None statement #9828 (@patrickvonplaten)
- [Setup.py] update jaxlib #9831 (@patrickvonplaten)
- Add a test for TF mixed precision #9806 (@jplu)
- Setup logging with a stdout handler #9816 (@sgugger)
- Fix auto-resume training from checkpoint #9822 (@jncasey)
- [MT5 Import init] Fix typo #9830 (@patrickvonplaten)
- Adding a test to prevent late failure in the Table question answering pipeline. #9808 (@Narsil)
- Remove a TF usage warning and rework the documentation #9756 (@jplu)
- Delete a needless duplicate condition #9826 (@tomohideshibata)
- Clean TF Bert #9788 (@jplu)
- Add a flag for find_unused_parameters #9820 (@sgugger)
- Fix TF template #9840 (@jplu)
- Fix model templates #9842 (@LysandreJik)
- Add tpu_zone and gcp_project in training_args_tf.py #9825 (@kiyoungkim1)
- Labeled pull requests #9849 (@LysandreJik)
- [GA forks] Test on every push #9851 (@LysandreJik)
- When resuming training from checkpoint, Trainer loads model #9818 (@sgugger)
- Allow --arg Value for booleans in HfArgumentParser #9823 (@sgugger)
- [traner] fix --lr_scheduler_type choices #9800 (@stas00)
- Pin memory in Trainer by default #9857 (@abhishekkrthakur)
- Partial local tokenizer load #9807 (@LysandreJik)
- Remove submodule #9868 (@LysandreJik)
- Fixing flaky conversational test + flag it as a pipeline test. #9837 (@Narsil)
- Fix computation of attention_probs when head_mask is provided. #9853 (@mfuntowicz)
- Deprecate model_path in Trainer.train #9854 (@sgugger)
- Remove redundant test_head_masking = True flags in test files #9858 (@stancld)
- [docs] expand install instructions #9817 (@stas00)
- on_log event should occur after the current log is written...
v4.2.2: Patch release
This patch contains two fixes:
- [TF Led] Fix wrong decoder attention mask behavior #9601 (@patrickvonplaten)
- Fix imports in conversion scripts #9674 (@sgugger)
v4.2.1: Patch release
v4.2.0: LED from AllenAI, Generation Scores, TensorFlow 2x speedup, encoder-decoder templates, faster imports
LED from AllenAI (@patrickvonplaten)
Four new models are released as part of the LED implementation: LEDModel, LEDForConditionalGeneration, LEDForSequenceClassification, LEDForQuestionAnswering, in PyTorch. The first two models have a TensorFlow version.
LED is the encoder-decoder variant of the Longformer model by allenai.
The LED model was proposed in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=led
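A minimal long-document summarization sketch, assuming the base checkpoint from the filter above; the repeated sentence is just a stand-in for a long input.

```python
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Assumed checkpoint from the Hub filter above.
checkpoint = "allenai/led-base-16384"
tokenizer = LEDTokenizer.from_pretrained(checkpoint)
model = LEDForConditionalGeneration.from_pretrained(checkpoint)

long_document = "Transformers provides thousands of pretrained models. " * 200
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=4096)

# LED uses Longformer-style local attention; the first token is usually given global attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    global_attention_mask=global_attention_mask,
    num_beams=2,
    max_length=64,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```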
Available notebooks:
- Evaluation: https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing
- Finetuning: https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing
Contributions:
- LED #9278 (@patrickvonplaten)
- [LED Test] fix common inputs pt for flaky pt-tf led test #9459 (@SBrandeis, @patrickvonplaten)
- [TF Led] Fix flaky TF Led test #9513 (@patrickvonplaten)
Generation Scores & other outputs (@patrickvonplaten)
The PyTorch generation function can now additionally return:
- scores - the prediction scores (logits) generated at each step
- attentions - all attention weights at each generation step
- hidden_states - all hidden states at each generation step
by simply adding return_dict_in_generate to the config or passing it as an input to .generate(), as in the sketch below.
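A minimal sketch, using GPT-2 as an arbitrary example model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today is a nice day and", return_tensors="pt")

# Ask generate() for a rich output object instead of a plain tensor of token ids.
outputs = model.generate(
    inputs["input_ids"],
    max_length=15,
    return_dict_in_generate=True,
    output_scores=True,
    output_attentions=True,
    output_hidden_states=True,
)

print(outputs.sequences)           # generated token ids
print(len(outputs.scores))         # one score tensor per generated step
print(len(outputs.attentions))     # attention weights per step (tuple over layers)
print(len(outputs.hidden_states))  # hidden states per step (tuple over layers)
```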
Notebooks for a better explanation:
- https://discuss.huggingface.co/t/announcement-generationoutputs-scores-attentions-and-hidden-states-now-available-as-outputs-to-generate/3094/2
- https://discuss.huggingface.co/t/generation-probabilities-how-to-compute-probabilities-of-output-scores-for-gpt2/3175
PR:
- Add flags to return scores, hidden states and / or attention weights in GenerationMixin #9150 (@SBrandeis)
TensorFlow improvements
TensorFlow BERT-like model improvements (@jplu)
The TensorFlow version of the BERT-like models have been updated and are now twice as fast as the previous versions.
Better integration in TensorFlow Serving (@jplu)
This version introduces a new API for TensorFlow saved models, which can now be exported with model.save_pretrained("path", saved_model=True) and easily loaded into a TensorFlow Serving environment.
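A minimal export/reload sketch; the versioned saved_model/1 sub-directory used below is the layout assumed here.

```python
import tensorflow as tf
from transformers import TFAutoModel

model = TFAutoModel.from_pretrained("bert-base-cased")

# Export a TensorFlow SavedModel alongside the usual weights and config.
model.save_pretrained("exported_bert", saved_model=True)

# Assumed layout: the SavedModel is written under a versioned sub-directory,
# which is what TensorFlow Serving expects to find.
loaded = tf.saved_model.load("exported_bert/saved_model/1")
print(list(loaded.signatures))
```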
DeepSpeed integration (@stas00)
Initial support for DeepSpeed to accelerate distributed training on several GPUs. This is an experimental feature that hasn't been fully tested yet, but early results are very encouraging (see this comment). Stay tuned for more details in the coming weeks!
Model templates (@patrickvonplaten)
The encoder-decoder version of the templates is now part of Transformers! Adding an encoder-decoder model is made very easy with this addition. More information can be found in the README.
- Model Templates for Seq2Seq #9251 (@patrickvonplaten)
- [Seq2Seq Templates] Add embedding scale to templates #9342 (@patrickvonplaten)
- [Seq2Seq Templates] Add forgotten imports to templates #9346 (@patrickvonplaten)
Faster import (@sgugger)
The initialization process has been changed to only import what is required. Therefore, when using only PyTorch models, TensorFlow will not be imported and vice-versa. In the best situations, importing a transformers model now takes only a few hundred milliseconds (~200ms), compared to several seconds (~3s) in previous versions.
- Fast transformers import part 1 #9441 (@sgugger)
- Transformers fast import part 2 #9446 (@sgugger)
- Fast imports part 3 #9474 (@sgugger)
Documentation highlights (@Qbiwan, @NielsRogge)
Some models now have improved documentation. The LayoutLM model has seen a general overhaul in its documentation thanks to @NielsRogge.
The tokenizer-only models Bertweet, Herbert and Phobert now have their own documentation pages thanks to @Qbiwan.
- Improve LayoutLM #9476 (@NielsRogge)
- Improve documentation coverage for Bertweet #9379 (@Qbiwan)
- Improve documentation coverage for Herbert #9428 (@Qbiwan)
- Improve documentation coverage for Phobert #9427 (@Qbiwan)
Breaking changes
There are no breaking changes between the previous version and this one.
This will be the first version to require TensorFlow >= 2.3.
General improvements and bugfixes
- add tests for the new sharded ddp fairscale integration #9177 (@stas00)
- Added TF CTRL Sequence Classification #9151 (@spatil6)
- [trainer] apex fixes and tests #9180 (@stas00)
- Fix link to old NER fine-tuning script #9182 (@mrm8488)
- fixed not JSON serializable error in run_qa.py with fp16 #9186 (@WissamAntoun)
- [setup] correct transformers version format #9176 (@stas00)
- Fix link to old SQUAD fine-tuning script #9181 (@mrm8488)
- Add new run_swag example #9175 (@sgugger)
- Add timing inside Trainer #9196 (@sgugger)
- GPT-model attention heads pruning example #9189 (@altsoph)
- [t5 doc] typos #9199 (@stas00)
- [run_glue] add speed metrics #9198 (@stas00)
- Added TF TransfoXL Sequence Classification #9169 (@spatil6)
- [finetune trainer] better logging and help #9203 (@stas00)
- [RAG] Add Ray implementation for distributed retrieval #9197 (@amogkam)
- [T5] Fix warning for changed EncDec Attention Bias weight #9231 (@patrickvonplaten)
- Improve BERT-like models performance with better self attention #9124 (@jplu)
- Fix TF template #9234 (@jplu)
- Fix beam search generation for GPT2 and T5 on model parallelism #9219 (@TobiasNorlund)
- add base model classes to bart subclassed models #9230 (@patil-suraj)
- [MPNet] Add slow to fast tokenizer converter #9233 (@patrickvonplaten)
- Adding performer fine-tuning research exampke #9239 (@TevenLeScao)
- Update the README of the text classification example #9237 (@sgugger)
- [EncoderDecoder] Make tests more aggressive #9256 (@patrickvonplaten)
- Fix script that check objects are documented #9259 (@sgugger)
- Seq2seq trainer #9241 (@sgugger)
- Fix link to old language modeling script #9254 (@mrm8488)
- Fix link to bertabs/README.md #9255 (@mrm8488)
- Fix TF BART for saved model creation #9252 (@jplu)
- Add speed metrics to all example scripts + template #9260 (@sgugger)
- Revert renaming in finetune_trainer #9262 (@sgugger)
- Fix gpt2 document #9272 (@xu-song)
- Fix param error #9273 (@xu-song)
- [Seq2Seq Templates] Fix check_repo.py templates file #9277 (@patrickvonplaten)
- Minor documentation revisions from copyediting #9266 (@connorbrinton)
- Adapt to new name of label_smoothing_factor training arg #9282 (@sgugger)
- Add caching mechanism to BERT, RoBERTa #9183 (@patil-suraj)
- [Templates] Adapt Bert #9284 (@patrickvonplaten)
- allow integer device for BatchEncoding #9271 (@jethrokuan)
- Fix typo in file_utils.py #9289 (@jungwhank)
- [bert_generation] enable cache by default #9296 (@patil-suraj)
- Proposed Fix : [RagSequenceForGeneration] generate "without" input_ids #9220 (@ratthachat)
- fix typo in modeling_encoder_decoder.py #9297 (@daniele-sartiano)
- Update tokenization_utils_base.py #9293 (@BramVanroy)
- [Bart doc] Fix outdated statement #9299 (@patrickvonplaten)
- add translation example #9303 (@vasudevgupta7)
- [GPT2] Correct gradient checkpointing #9308 (@patrickvonplaten)
- [Seq2SeqTrainer] Fix Typo #9320 (@patrickvonplaten)
- [Seq2Seq Templates] Correct some TF-serving errors and add gradient checkpointing to PT by default. #9334 (@patrickvonplaten)
- Fix TF T5 #9301 (@jplu)
- Fix TF TransfoXL #9302 (@jplu)
- [prophetnet] wrong import #9349 (@stas00)
- [apex.normalizations.FusedLayerNorm] torch.cuda.is_available() is redundant as apex handles that internally #9350 (@stas00)
- Make sure to use return dict for the encoder call inside RagTokenForGeneration #9363 (@dblakely)
- [Docs] past_key_values return a tuple of tuple as a default #9381 (@patrickvonplaten)
- [docs] Fix TF base model examples: outputs.last_hidden_states -> state #9382 (@ck37)
- Fix typos in README and bugs in RAG example code for end-to-end evaluation and finetuning #9355 (@yoshitomo-matsubara)
- Simplify marian distillation script #9394 (@sshleifer)
- Add utility function for retrieving locally cached models #8836 (@cdpierse)
- Fix TF CTRL #9291 (@jplu)
- Put back LXMert example #9401 (@sgugger)
- Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/lxmert #9402 (@dependabot[bot])
- Fix TF Flaubert #9292 (@jplu)
- [trainer] parametrize default output_dir #9352 (@stas00)
- Fix utils on Windows #9368 (@jplu)
- Fix TF DPR #9283 (@jplu)
- [Docs] Tokenizer Squad 2.0 example #9378 (@patrickvonplaten)
- replace apex.normalization.FusedLayerNorm with torch.nn.LayerNorm #9386 (@stas00)
- [test_model_parallelization] multiple fixes #9354 (@stas00)
- Fix TF Longformer #9348 (@jplu)
- [logging] autoflush #9385 (@stas00)
- TF >= 2.3 cleaning #9369 (@jplu)
- [trainer] --model_parallel hasn't been implemented for most models #9347 (@stas00)
- Fix TF Funnel #9300 (@jplu)
- Fix documentation links always pointing to master. #9217 (@sugeeth14)
- [examples/text-classification] Fix a bug for using own regression dataset #9411 (@forest1988)
- [trainer] group fp16 args together #9409 (@stas00)
- [model parallel] add experimental warning #9412 (@stas00)
- improve readme text to private models/versioning/api #9424 (@clmnt)
- [PyTorch Bart] Split Bart into different models #9343 (@patrickvonplaten)
- [docs] outline sharded ddp doc #9208 (@stas00)
-...
v4.1.1: TAPAS, MPNet, model parallelization, Sharded DDP, conda, multi-part downloads.
TAPAS (@NielsRogge)
Four new models are released as part of the TAPAS implementation: TapasModel, TapasForQuestionAnswering, TapasForMaskedLM and TapasForSequenceClassification, in PyTorch.
TAPAS is a question answering model used to answer queries given a table. It is a multi-modal model, combining the text of the query with tabular data.
The TAPAS model was proposed in TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
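A minimal sketch using the new table-question-answering pipeline (see the PRs below); the table content is made up, the default checkpoint is resolved by the pipeline itself, and TAPAS additionally requires torch-scatter and pandas to be installed.

```python
from transformers import pipeline

# The pipeline resolves a default TAPAS checkpoint fine-tuned for table QA.
table_qa = pipeline("table-question-answering")

# Made-up example table: cell values are passed as strings.
table = {
    "Repository": ["transformers", "datasets", "tokenizers"],
    "Stars": ["36542", "4512", "3934"],
}
result = table_qa(table=table, query="How many stars does the transformers repository have?")
print(result)
```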
- Tapas v4 (tres) #9117 (@NielsRogge)
- AutoModelForTableQuestionAnswering #9154 (@LysandreJik)
- TableQuestionAnsweringPipeline #9145 (@LysandreJik)
MPNet (@StillKeepTry)
Six new models are released as part of the MPNet implementation: MPNetModel, MPNetForMaskedLM, MPNetForSequenceClassification, MPNetForMultipleChoice, MPNetForTokenClassification, MPNetForQuestionAnswering, in both PyTorch and TensorFlow.
MPNet introduces a novel self-supervised objective named masked and permuted language modeling for language understanding. It inherits the advantages of both masked language modeling (MLM) and permuted language modeling (PLM) to address the limitations of MLM/PLM, and further reduces the inconsistency between the pre-training and fine-tuning paradigms.
The MPNet model was proposed in MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
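A short feature-extraction sketch, assuming the base checkpoint released alongside this integration.

```python
from transformers import AutoTokenizer, MPNetModel

# Assumed checkpoint name for the released base model.
checkpoint = "microsoft/mpnet-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MPNetModel.from_pretrained(checkpoint)

inputs = tokenizer("MPNet combines masked and permuted language modeling.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```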
- MPNet: Masked and Permuted Pre-training for Language Understanding #8971 (@StillKeepTry)
Model parallel (@alexorona)
Model parallelism is introduced, allowing users to load very large models on two or more GPUs by spreading the model layers over them. This can allow GPU training even for very large models.
- gpt2 and t5 parallel modeling #8696 (@alexorona)
- Model parallel documentation #8741 (@LysandreJik)
- Patch model parallel test #8825, #8920 (@LysandreJik)
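A hedged sketch of the experimental model-parallel API added for GPT-2 and T5; it needs at least two GPUs, and the device_map below (layer indices per GPU) is an illustrative assumption to adapt to your hardware.

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")  # 48 transformer blocks

# Spread the blocks over two GPUs; the split is an illustrative assumption.
device_map = {
    0: list(range(0, 24)),
    1: list(range(24, 48)),
}
model.parallelize(device_map)

# ... run training or inference as usual, placing inputs on the first device ...

model.deparallelize()  # moves the model back to CPU when done
```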
Conda release (@LysandreJik)
Transformers welcomes its first conda releases, with v4.0.0, v4.0.1 and v4.1.0. The conda packages are now officially maintained on the huggingface channel.
- Put Transformers on Conda #8918 (@LysandreJik)
Multi-part uploads (@julien-c)
For the first time, very large models can be uploaded to the model hub, by using multi-part uploads.
New examples and reorganization (@sgugger)
We introduced a refactored SQuAD example & notebook, which is faster and simpler than the previous scripts.
The example directory has been re-ordered as we introduce the separation between "examples", which are maintained examples showcasing how to do one specific task, and "research projects", which are bigger projects and maintained by the community.
Introduction of fairscale with Sharded DDP (@sgugger)
We introduce support for fairscale's ShardedDDP in the Trainer, allowing reduced memory usage when training models in a distributed fashion (see the sketch after the PRs below).
- Experimental support for fairscale ShardedDDP #9139 (@sgugger)
- Fix gradient clipping for Sharded DDP #9168 (@sgugger)
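A hedged sketch of turning the experimental option on from code; in this release sharded_ddp is assumed to be a simple boolean on TrainingArguments (exposed as --sharded_ddp on the command line), fairscale must be installed, and the script is launched with torch.distributed as usual.

```python
from transformers import TrainingArguments

# Requires fairscale and a distributed launch (e.g. python -m torch.distributed.launch ...).
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    sharded_ddp=True,  # assumed boolean form of the flag in this release
)
print(args.sharded_ddp)
```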
Barthez (@moussaKam)
The BARThez model is a French variant of the BART model. We welcome its specific tokenizer to the library and multiple checkpoints to the model hub.
- Add barthez model #8393 (@moussaKam)
General improvements and bugfixes
- disable_ngram_loss fix for prophetnet #8554 (@Zhylkaaa)
- Fix run_ner script #8664 (@sgugger)
- [tokenizers] convert_to_tensors: don't reconvert when the type is already right #8283 (@stas00)
- [examples/seq2seq] fix PL deprecation warning #8577 (@stas00)
- Add sentencepiece to the CI and fix tests #8672 (@sgugger)
- Alternative to globals() #8667 (@sgugger)
- Update the bibtex with EMNLP demo #8678 (@JetRunner)
- Document adam betas TrainingArguments #8688 (@sgugger)
- Fix rag finetuning + add finetuning test #8585 (@lhoestq)
- moved temperature warper before topP/topK warpers #8686 (@theorm)
- Vectorize RepetitionPenaltyLogitsProcessor to improve performance #8598 (@bdalal)
- [Generate Test] fix flaky ci #8694 (@patrickvonplaten)
- Fix bug in x-attentions output for roberta and harden test to catch it #8660 (@ysgit)
- Add pip install update to resolve import error in transformers notebook #8616 (@jessicayung)
- Improve bert-japanese tokenizer handling #8659 (@julien-c)
- Change default cache path #8734 (@sgugger)
- [trainer] make generate work with multigpu #8716 (@stas00)
- consistent ignore keys + make private #8737 (@stas00)
- Fix max length in run_plm script #8738 (@sgugger)
- Add early stopping callback to pytorch trainer #8581 (@cbrochtrup)
- Support various BERT relative position embeddings (2nd) #8276 (@zhiheng-huang)
- Fix slow tests v2 #8746 (@LysandreJik)
- MT5 should have an autotokenizer #8743 (@LysandreJik)
- added instructions for syncing upstream master with forked master via PR #8745 (@bdalal)
- fix rag index names in eval_rag.py example #8730 (@lhoestq)
- [core] implement support for run-time dependency version checking #8645 (@stas00)
- New TF model inputs #8602 (@jplu)
- Big model table #8774 (@sgugger)
- Attempt to get a better fix for QA #8768 (@Narsil)
- Fix QA argument handler #8765 (@LysandreJik)
- Return correct Bart hidden state tensors #8747 (@joeddav)
- [XLNet] Fix mems behavior #8567 (@patrickvonplaten)
- [s2s] finetune.py: specifying generation min_length #8478 (@danyaljj)
- Revert "[s2s] finetune.py: specifying generation min_length" #8805 (@patrickvonplaten)
- Fix PPLM #8779 (@chutaklee)
- [s2s finetune trainer] potpurri of small fixes #8807 (@stas00)
- [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes #8791 (@KristianHolsheimer)
- [Flax test] Add require pytorch to flix flax test #8816 (@patrickvonplaten)
- Fix dpr<>bart config for RAG #8808 (@patrickvonplaten)
- Extend typing to path-like objects in PretrainedConfig and PreTrainedModel #8770 (@gcompagnoni)
- Fix setup.py on Windows #8798 (@jplu)
- BART & FSMT: fix decoder not returning hidden states from the last layer #8597 (@maksym-del)
- suggest a numerical limit of 50MB for determining @slow #8824 (@stas00)
- [MT5] Add use_cache to config #8832 (@patrickvonplaten)
- [Pegasus] Refactor Tokenizer #8731 (@patrickvonplaten)
- [CI] implement job skipping for doc-only PRs #8826 (@stas00)
- Migration guide from v3.x to v4.x #8763 (@LysandreJik)
- Add T5 Encoder for Feature Extraction #8717 (@agemagician)
- token-classification: use is_world_process_zero instead of is_world_master() #8828 (@stefan-it)
- Correct docstring. #8845 (@Fraser-Greenlee)
- Add a direct link to the big table #8850 (@sgugger)
- Use model.from_pretrained for DataParallel also #8795 (@shaie)
- Remove deprecated evalutate_during_training #8852 (@sgugger)
- Attempt to fix Flax CI error(s) #8829 (@mfuntowicz)
- NerPipeline (TokenClassification) now outputs offsets of words #8781 (@Narsil)
- [s2s trainer] fix DP mode #8823 (@stas00)
- Ctrl for sequence classification #8812 (@elk-cloner)
- Fix docstring for language code in mBart #8848 (@RQuispeC)
- 2 typos in modeling_rag.py #8676 (@ratthachat)
- Make the big table creation/check platform independent #8856 (@sgugger)
- Prevent BatchEncoding from blindly passing casts down to the tensors it contains #8860 (@Craigacp)
- Better warning when loading a tokenizer with AutoTokenizer w/o Sneten… #8881 (@LysandreJik)
- [CI] skip docs-only jobs take #2 #8853 (@stas00)
- Better support for resuming training #8878 (@sgugger)
- Add a parallel_mode property to TrainingArguments #8877 (@sgugger)
- [trainer] start using training_args.parallel_mode #8882 (@stas00)
- [ci] skip doc jobs take #3 #8885 (@stas00)
- Transfoxl seq classification #8868 (@spatil6)
- Warning about too long input for fast tokenizers too #8799 (@Narsil)
- [trainer] improve code readability #8903 (@stas00)
- [PyTorch] Refactor Resize Token Embeddings #8880 (@patrickvonplaten)
- Don't warn that models aren't available if Flax is available. #8841 (@skye)
- Avoid erasing the attention mask when double padding #8915 (@sgugger)
- Fix move when the two cache folders exist #8917 (@sgugger)
- Tweak wording + Add badge w/ number of models on the hub #8914 (@julien-c)
- [s2s finetune_trainer] add instructions for distributed training #8884 (@stas00)
- Better booleans handling in the TF models #8777 (@jplu)
- Fix TF T5 only encoder model with booleans #8925 (@LysandreJik)
- [ci] skip doc jobs - circleCI is not reliable - disable skip for now #8926 (@stas00)
- [seq2seq] document the caveat of leaky native amp #8930 (@stas00)
- Don't pass in token_type_ids to BART for GLUE #8929 (@ethanjperez)
- Fix typo for modeling_bert import resulting in ImportError #8931 (@machelreid)
- Fix QA pipeline on Windows #8947 (@sgugger)
- Add TFGPT2ForSequenceClassification based on DialogRPT #8714 (@spatil6)
- Remove sourcerer #8965 (@clmnt)
- Use word_ids to get labels in run_ner #8962 (@sgugger)
- Small fix to the run clm script #8973 (@sgugger)
- Update quicktour docs to showcase the use of truncation #8975 (@navjotts)
- Copyright #8970 (@sgugger)
- Check table as independent script #8976 (@LysandreJik)
- [training] SAVE_STATE_WARNING was removed in pytorch #8979 (@stas00)
- Optional layers #8961 (@jplu)
- Make ModelOutput pickle-able #8989 (@sgugger)
- Fix interaction of return_token_type_ids and add_special_tokens #8854 (@LysandreJik)
- Removed unused encoder_hidden_states and encoder_attention_mask #8972 (@guillaume-be)
- Checking output format + check raises ValueError #8986 (@na...