Releases: huggingface/transformers

v4.15.0

22 Dec 19:34

New Model additions

WavLM

WavLM was proposed in WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.

WavLM sets a new SOTA on the SUPERB benchmark.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=wavlm
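
As a quick orientation, here is a minimal sketch of extracting WavLM representations; the microsoft/wavlm-base-plus checkpoint name is one example from the hub query above.

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, WavLMModel

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
model = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")

audio = np.zeros(16000, dtype=np.float32)  # stand-in for one second of 16 kHz speech
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (batch, frames, hidden_size)
```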

Wav2Vec2Phoneme

Wav2Vec2Phoneme was proposed in Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
Wav2Vec2Phoneme allows phoneme classification as part of automatic speech recognition.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=phoneme-recognition
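
A minimal sketch of phoneme recognition, assuming the facebook/wav2vec2-lv-60-espeak-cv-ft checkpoint from the hub query above (encoding text with this tokenizer additionally requires the phonemizer package; decoding, as below, does not):

```python
import numpy as np
import torch
from transformers import (
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2PhonemeCTCTokenizer,
)

checkpoint = "facebook/wav2vec2-lv-60-espeak-cv-ft"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

audio = np.zeros(16000, dtype=np.float32)  # stand-in for real 16 kHz speech
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
phonemes = tokenizer.batch_decode(torch.argmax(logits, dim=-1))  # phoneme strings
```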

UniSpeech-SAT

UniSpeech-SAT was proposed in UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.

UniSpeech-SAT is especially good at speaker-related tasks.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech-sat

UniSpeech

UniSpeech was proposed in UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech

New Tasks

Speaker Diarization and Verification

Wav2Vec2-like architectures now have speaker diarization and speaker verification heads added to them.
You can try out the new task here: https://huggingface.co/spaces/microsoft/wavlm-speaker-verification

  • Add Speaker Diarization and Verification heads by @anton-l in #14723
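
A minimal sketch of the new speaker verification head, assuming the microsoft/wavlm-base-plus-sv checkpoint that backs the demo Space above:

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, WavLMForXVector

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv")

# Two stand-in 16 kHz utterances to compare
utterances = [np.zeros(16000, dtype=np.float32), np.zeros(16000, dtype=np.float32)]
inputs = feature_extractor(utterances, sampling_rate=16000, return_tensors="pt", padding=True)

with torch.no_grad():
    embeddings = model(**inputs).embeddings  # one x-vector per utterance

# High cosine similarity suggests the two utterances share a speaker
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=-1)
```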

Full Changelog: v4.14.0...v4.15.0

v4.14.1: Patch release

15 Dec 19:02

Fixes a circular import when TensorFlow and ONNX are both installed (#14787)

v4.14.0: Perceiver, Keras model cards

15 Dec 17:27

Perceiver

The Perceiver model was released in the previous version, v4.13.0:

Eight new models are released as part of the Perceiver implementation: PerceiverModel, PerceiverForMaskedLM, PerceiverForSequenceClassification, PerceiverForImageClassificationLearned, PerceiverForImageClassificationFourier, PerceiverForImageClassificationConvProcessing, PerceiverForOpticalFlow, PerceiverForMultimodalAutoencoding, in PyTorch.

The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver

Version v4.14.0 adds support for Perceiver in multiple pipelines, including the fill mask and sequence classification pipelines.
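
For instance, a minimal sketch of the fill-mask pipeline with Perceiver; since Perceiver operates on raw bytes, a single mask token stands for a single character (the checkpoint name is one example from the hub):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="deepmind/language-perceiver")
# Masks one byte; the model should recover the missing "F"
print(fill_mask("Paris is the capital of " + fill_mask.tokenizer.mask_token + "rance."))
```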

Keras model cards

The Keras push-to-hub callback now generates model cards when pushing to the model hub. In addition to the callback, model cards are generated by default by the model.push_to_hub() method.

Full Changelog: v4.13.0...v4.14.0

v4.13.0: Perceiver, ImageGPT, mLUKE, Vision-Text dual encoders, QDQBert, new documentation frontend

09 Dec 16:07

New Model additions

Perceiver

Eight new models are released as part of the Perceiver implementation: PerceiverModel, PerceiverForMaskedLM, PerceiverForSequenceClassification, PerceiverForImageClassificationLearned, PerceiverForImageClassificationFourier, PerceiverForImageClassificationConvProcessing, PerceiverForOpticalFlow, PerceiverForMultimodalAutoencoding, in PyTorch.

The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver

mLUKE

The mLUKE tokenizer is added. The tokenizer can be used for the multilingual variant of LUKE.

The mLUKE model was proposed in mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. It's a multilingual extension of the LUKE model trained on the basis of XLM-RoBERTa.

  • Add mLUKE by @Ryou0634 in #14640

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=luke
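
A minimal sketch of the new tokenizer, assuming the studio-ousia/mluke-base checkpoint; mLUKE-style inputs mark entity mentions with character-level spans:

```python
from transformers import LukeModel, MLukeTokenizer

tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-base")
model = LukeModel.from_pretrained("studio-ousia/mluke-base")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character spans of "Beyoncé" and "Los Angeles"
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)
# outputs.entity_last_hidden_state holds one contextualized vector per entity
```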

ImageGPT

Three new models are released as part of the ImageGPT integration: ImageGPTModel, ImageGPTForCausalImageModeling, ImageGPTForImageClassification, in PyTorch.

The ImageGPT model was proposed in Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. ImageGPT (iGPT) is a GPT-2-like model trained to predict the next pixel value, allowing for both unconditional and conditional image generation.

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=imagegpt
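
A minimal sketch of unconditional generation with ImageGPTForCausalImageModeling, assuming the openai/imagegpt-small checkpoint:

```python
import torch
from transformers import ImageGPTFeatureExtractor, ImageGPTForCausalImageModeling

feature_extractor = ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small")
model = ImageGPTForCausalImageModeling.from_pretrained("openai/imagegpt-small")

# Unconditional generation starts from the special start-of-sequence token
context = torch.full((1, 1), model.config.vocab_size - 1, dtype=torch.long)
pixels = model.generate(
    input_ids=context, max_length=model.config.n_positions + 1, do_sample=True, top_k=40
)
# feature_extractor.clusters maps the generated color-cluster ids back to RGB values
```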

QDQBert

Eight new models are released as part of the QDQBert implementation: QDQBertModel, QDQBertLMHeadModel, QDQBertForMaskedLM, QDQBertForSequenceClassification, QDQBertForNextSentencePrediction, QDQBertForMultipleChoice, QDQBertForTokenClassification, QDQBertForQuestionAnswering, in PyTorch.

The QDQBERT model can be referenced in Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.

  • Add QDQBert model and quantization examples of SQUAD task by @shangz-ai in #14066

Semantic Segmentation models

The semantic segmentation models' API is unstable and bound to change between this version and the next.

The first semantic segmentation models are added. In semantic segmentation, the goal is to predict a class label for every pixel of an image. The models that are added are SegFormer (by NVIDIA) and BEiT (by Microsoft Research). BEiT was already available in the library, but this release includes the model with a semantic segmentation head.

The SegFormer model was proposed in SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. The model consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on image segmentation benchmarks such as ADE20K and Cityscapes.

The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI’s DALL-E model given masked patches.
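
A minimal sketch of per-pixel prediction with SegFormer, assuming the nvidia/segformer-b0-finetuned-ade-512-512 checkpoint fine-tuned on ADE20K:

```python
import torch
from PIL import Image
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation

checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"
feature_extractor = SegformerFeatureExtractor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("scene.jpg")  # any RGB image
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, num_labels, height/4, width/4)
segmentation = logits.argmax(dim=1)  # per-pixel class ids at reduced resolution
```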

Vision-text dual encoder

Adds the VisionTextDualEncoder model in PyTorch and Flax, which can load any pre-trained vision (ViT, DeiT, BEiT, CLIP's vision model) and text (BERT, RoBERTa) model from the library for vision-text tasks like CLIP.

This model pairs a vision and a text encoder and adds projection layers that project the embeddings into a shared embedding space of similar dimensions, which can then be used to align the two modalities.
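
A minimal sketch of pairing two pre-trained encoders; the ViT and BERT checkpoint names are ordinary examples:

```python
from transformers import (
    BertTokenizer,
    ViTFeatureExtractor,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "bert-base-uncased"
)
processor = VisionTextDualEncoderProcessor(
    ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224"),
    BertTokenizer.from_pretrained("bert-base-uncased"),
)
# The projection layers start out randomly initialized, so the model should be
# fine-tuned with a CLIP-style contrastive objective before use.
```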

CodeParrot

CodeParrot, a model trained to generate code, has been open-sourced in the research projects by @lvwerra.

Language model support for ASR

See https://huggingface.co/patrickvonplaten/wav2vec2-xlsr-53-es-kenlm for more information.

Flax-specific additions

Adds a Flax version of the vision encoder-decoder model, as well as a Flax version of GPT-J.

TensorFlow-specific additions

Vision transformers are here! Convnets are so 2012, now that ML is converging on self-attention as a universal model.

Want to handle real-world tables, where text and data are positioned in a 2D grid? TAPAS is now here for both TensorFlow and PyTorch.

Automatic checkpointing and cloud saves to the HuggingFace Hub during training are now live, allowing you to resume training when it's interrupted, even if your initial instance is terminated. This is an area of very active development - watch this space for future developments, including automatic model card creation and more.

Auto-processors

A new class to automatically select processors is added: AutoProcessor. It can be used for all models that require a processor, in both computer vision and audio.
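
A minimal sketch; the checkpoint name is an example, and AutoProcessor resolves to the matching processor class from the checkpoint's configuration:

```python
from transformers import AutoProcessor

# Resolves to a Wav2Vec2Processor for this audio checkpoint, and to the
# appropriate processor class for vision checkpoints
processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
```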

New documentation frontend

A new documentation frontend is out for the transformers library! The goal of this documentation is to be better aligned with the rest of our website, and it contains tools to improve readability. The documentation can now be written in Markdown rather than RST.

LayoutLM Improvements

The LayoutLMv2 feature extractor now supports non-English languages, and LayoutXLM gets its own processor.

  • LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. by @Xargonus in #14514
  • Add LayoutXLMProcessor (and LayoutXLMTokenizer, LayoutXLMTokenizerFast) by @NielsRogge in #14115

Trainer Improvements

You can now take advantage of Ampere hardware with the Trainer:

  • --bf16 - do training or eval in bfloat16 mixed precision
  • --bf16_full_eval - do eval in full bfloat16
  • --tf32 - toggle TF32 mode on or off
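
The same flags are available programmatically through TrainingArguments; a minimal sketch (bf16 and TF32 require Ampere GPUs and a recent PyTorch build):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,            # train/eval in bfloat16 mixed precision
    bf16_full_eval=True,  # run evaluation fully in bfloat16
    tf32=True,            # enable the TF32 matmul mode on Ampere
)
```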

Improvements and bugfixes


v4.12.5: Patch release

17 Nov 16:38

Reverts a commit that introduced other issues:

  • Revert "Experimenting with adding proper get_config() and from_config() methods (#14361)"

v4.12.4: Patch release

16 Nov 22:31
  • Fix gradient_checkpointing backward compatibility (#14408)
  • [Wav2Vec2] Make sure that gradient checkpointing is only run if needed (#14407)
  • Experimenting with adding proper get_config() and from_config() methods (#14361)
  • enhance rewrite state_dict missing _metadata (#14348)
  • Support for TF >= 2.7 (#14345)
  • improve rewrite state_dict missing _metadata (#14276)
  • Fix of issue #13327: Wrong weight initialization for TF t5 model (#14241)

v4.12.3: Patch release

03 Nov 13:05

  • Add PushToHubCallback in main init (#14246)
  • Supports huggingface_hub >= 0.1.0

v4.12.2: Patch release

29 Oct 18:52

Fixes an issue with the image segmentation pipeline and PyTorch's inference mode.

v4.12.1: Patch release

29 Oct 18:43

Enables torch 1.10.0

v4.12.0: TrOCR, SEW & SEW-D, Unispeech & Unispeech-SAT, BARTPho

28 Oct 16:57

TrOCR and VisionEncoderDecoderModel

One new model is released as part of the TrOCR implementation: TrOCRForCausalLM, in PyTorch. It comes along with a new VisionEncoderDecoderModel class, which allows mixing and matching any vision Transformer encoder with any text Transformer decoder, similar to the existing SpeechEncoderDecoderModel class.

The TrOCR model was proposed in TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.

The TrOCR model consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform optical character recognition in an end-to-end manner.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=trocr
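
A minimal sketch of end-to-end OCR, assuming the microsoft/trocr-base-handwritten checkpoint from the hub query above:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("line_of_text.png").convert("RGB")  # a cropped text-line image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```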

SEW & SEW-D

SEW and SEW-D (Squeezed and Efficient Wav2Vec) were proposed in Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

SEW and SEW-D models use a Wav2Vec-style feature encoder and introduce temporal downsampling to reduce the input length to the transformer encoder. SEW-D additionally replaces the transformer encoder with a DeBERTa one. Both models achieve significant inference speedups without sacrificing speech recognition quality.

Compatible checkpoints are available on the Hub: https://huggingface.co/models?other=sew and https://huggingface.co/models?other=sew-d

DistilHuBERT

DistilHuBERT was proposed in DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT, by Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee.

DistilHuBERT is a distilled version of the HuBERT model. Using only two transformer layers, the model scores competitively on the SUPERB benchmark tasks.

A compatible checkpoint is available on the Hub: https://huggingface.co/ntu-spml/distilhubert

TensorFlow improvements

Several bug fixes and UX improvements have been made for TensorFlow.

Keras callback

Introduction of a Keras callback to push to the hub each epoch, or after a given number of steps.
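
A minimal sketch of wiring up the callback; the output directory and hub_model_id values are illustrative:

```python
from transformers.keras_callbacks import PushToHubCallback

push_callback = PushToHubCallback(
    output_dir="./model_checkpoints",     # local staging directory
    save_strategy="epoch",                # push at the end of every epoch
    hub_model_id="my-username/my-model",  # illustrative repository name
)
# model.fit(train_dataset, epochs=3, callbacks=[push_callback])
```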

Updates on the encoder-decoder framework

The encoder-decoder framework is now available in TensorFlow, allowing mixing and matching different encoders and decoders together into a single encoder-decoder architecture!

  • Add TFEncoderDecoderModel + Add cross-attention to some TF models by @ydshieh in #13222

Besides this, the EncoderDecoderModel classes have been updated to work like models such as BART and T5. From now on, users no longer need to pass decoder_input_ids to the model themselves; instead, they are created automatically from the labels (namely by shifting them one position to the right, replacing -100 with the pad_token_id, and prepending the decoder_start_token_id). Note that this may result in training discrepancies when fine-tuning a model trained with versions prior to v4.12.0 that set decoder_input_ids = labels.

  • Fix EncoderDecoderModel classes to be more like BART and T5 by @NielsRogge in #14139
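
A minimal sketch of the new behavior; the BERT checkpoints are illustrative, and no decoder_input_ids are passed:

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("A long article to summarize", return_tensors="pt")
labels = tokenizer("A short summary", return_tensors="pt").input_ids

# No decoder_input_ids: the model now derives them by shifting the labels
loss = model(input_ids=inputs.input_ids, labels=labels).loss
```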

Auto-model API

To make it easier to extend the Transformers library, every Auto class now has a new register method that allows you to register your own custom models, configurations, or tokenizers. See more in the documentation.

  • Add an API to register objects to Auto classes by @sgugger in #13989
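
A minimal sketch of the register API; NewModelConfig and NewModel are hypothetical user-defined classes:

```python
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel

class NewModelConfig(PretrainedConfig):
    model_type = "new-model"

class NewModel(PreTrainedModel):
    config_class = NewModelConfig

AutoConfig.register("new-model", NewModelConfig)  # map the model type to the config
AutoModel.register(NewModelConfig, NewModel)      # map the config to the model class
```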

Bug fixes and improvements
