Releases: huggingface/transformers
v4.15.0
New Model additions
WavLM
WavLM was proposed in WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
WavLM sets a new SOTA on the SUPERB benchmark.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=wavlm
- Add WavLM by @patrickvonplaten in #14354
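As a quick illustration, below is a minimal sketch of extracting WavLM features from raw audio; the checkpoint id and the dummy audio are illustrative assumptions, not part of the release itself.

```python
import torch
from transformers import AutoFeatureExtractor, WavLMModel

# Checkpoint id is an assumption; any checkpoint tagged `wavlm` should work similarly.
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
model = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")

# One second of dummy 16 kHz mono audio standing in for a real recording.
dummy_audio = torch.zeros(16000).numpy()
inputs = feature_extractor(dummy_audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
```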
Wav2Vec2Phoneme
Wav2Vec2Phoneme was proposed in Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
Wav2Vec2Phoneme enables phoneme classification as part of automatic speech recognition.
- [Wav2Vec2 Phoneme] Let phonemizer lang default to tokenizer's settings by @patrickvonplaten in #14829
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=phoneme-recognition
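A minimal sketch of phoneme transcription with such a checkpoint (the model id and dummy audio are assumptions; decoding predicted ids does not require the phonemizer backend):

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2PhonemeCTCTokenizer, Wav2Vec2ForCTC

# Checkpoint id is an assumption (a wav2vec2 model fine-tuned to output phonemes).
checkpoint = "facebook/wav2vec2-lv-60-espeak-cv-ft"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

speech = torch.zeros(16000).numpy()  # dummy 16 kHz audio in place of a real utterance
inputs = feature_extractor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(tokenizer.batch_decode(predicted_ids))  # predicted phoneme strings
```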
UniSpeech-SAT
Unispeech-SAT was proposed in UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
UniSpeech-SAT is especially good at speaker-related tasks.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech-sat
UniSpeech
Unispeech was proposed in UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech
New Tasks
Speaker Diarization and Verification
Wav2Vec2-like architectures now have speaker diarization and speaker verification heads added to them.
You can try out the new task here: https://huggingface.co/spaces/microsoft/wavlm-speaker-verification
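For example, a speaker-verification sketch with the new X-vector head (the checkpoint id and dummy audio are assumptions):

```python
import torch
from transformers import AutoFeatureExtractor, WavLMForXVector

# Checkpoint id is an assumption (a WavLM model with a speaker-verification head).
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv")

# Two dummy utterances; real audio should be 16 kHz mono.
audio = [torch.zeros(16000).numpy(), torch.zeros(24000).numpy()]
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt", padding=True)

with torch.no_grad():
    embeddings = model(**inputs).embeddings

# Cosine similarity between the two speaker embeddings: higher means same speaker.
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=-1)
print(similarity)
```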
What's Changed
- Move import to avoid circular import by @sgugger in #14787
- PoC for conserving old links by @sgugger in #14754
- Removes images to put them in a dataset by @LysandreJik in #14781
- Post sphinx-clean up and contributing guide updates by @sgugger in #14790
- Fix the build documentation job by @sgugger in #14788
- Update CONTRIBUTING.md by @kamalkraj in #14799
- Update CONTRIBUTING.md by @kamalkraj in #14800
- Train step fix by @Rocketknight1 in #14796
- [Generate] Make generate multi-modal by @patrickvonplaten in #14784
- Remove `require_datasets` testing utility by @LysandreJik in #14795
- [WavLM] Correct position bias computation by @patrickvonplaten in #14805
- Fix Perceiver multi GPU test by @NielsRogge in #14810
- [WavLM] Layerdrop is not allowed for first layer by @patrickvonplaten in #14811
- [Generate] Correct input_ids detection by @patrickvonplaten in #14815
- Implement head_mask for Flax BERT and other models copied from BERT by @stancld in #14620
- Convert rst to mdx bert by @LysandreJik in #14806
- Wav2Vec2 meets phonemes by @patrickvonplaten in #14353
- [ImageGPT] Deprecate pixel_values input name to input_ids by @patrickvonplaten in #14801
- [Seq2SeqTrainer] Remove model input name hack by @patrickvonplaten in #14802
- [WavLM] Fix slow tests by @patrickvonplaten in #14845
- Add SD and SV heads for WavLM by @anton-l in #14847
- Add an argument to set bucket_cap_mb for PyTorch DDP by @changlan in #14756
- Update CONTRIBUTING.md by @kamalkraj in #14835
- Fix dead link to benchmarks.ipynb by @DerekChia in #14842
- [Perceiver] Skip multi-gpu tests for now by @patrickvonplaten in #14813
- Add 'with torch.no_grad()' to DeBERTa integration test forward pass by @henholm in #14821
- Add 'with torch.no_grad()' to BERT integration test forward pass by @henholm in #14820
- Add a main_input_name attribute to all models by @sgugger in #14803
- [doc] typo by @stas00 in #14849
- [logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS by @stas00 in #14669
- Make the onnx submodule init lazy by @sgugger in #14855
- Convert docstrings of modeling files by @sgugger in #14850
- [Bart] better error message by @patrickvonplaten in #14854
- Only create the model card on process 0 by @sgugger in #14857
- [ASR example] Improve example + add more examples by @patrickvonplaten in #14848
- Fix the value error typo of AdamW's betas' valid values checking by @dourgey in #14780
- Add custom `stopping_criteria` and `logits_processor` to `generate` by @lvwerra in #14779
- Replace commit sha by commit url for update jobs by @sgugger in #14852
- [examples/summarization] deal with None in data records by @stas00 in #14816
- [doc porting] several docs by @stas00 in #14858
- Mass conversion of documentation from rst to Markdown by @sgugger in #14866
- Fix FLAX_MULTIPLE_CHOICE_SAMPLE typo by @mishig25 in #14871
- Fixes in marian doc by @sgugger in #14872
- Fix `FlaxMarianMTModel` return block. by @sgugger in #14873
- Fix doc mistakes by @sgugger in #14874
- Convert model files from rst to mdx by @LysandreJik in #14865
- Update the arguments `add_prefix_space` and `trim_offsets` in `backend_tokenizer.post_processor` of `RobertaTokenizerFast` by @SaulLu in #14752
- Feature/fix slow test in mluke by @Ryou0634 in #14749
- Updated deberta attention by @guillaume-be in #14625
- IterableDatasetShard should use per device batch size instead of real… by @SysuCharon in #14714
- Fix Perceiver code example by @NielsRogge in #14879
- Fix pytorch image classification example by @mariosasko in #14883
- Onnx enable tasks for supported models (part 2) by @michaelbenayoun in #14700
- Properly indent return block by @sgugger in #14887
New Contributors
- @changlan made their first contribution in #14756
- @DerekChia made their first contribution in #14842
- @henholm made their first contribution in #14821
- @dourgey made their first contribution in #14780
- @SysuCharon made their first contribution in #14714
Full Changelog: v4.14.0...v4.15.0
v4.14.1: Patch release
Fixes a circular import when TensorFlow and Onnx are both installed (#14787)
v4.14.0: Perceiver, Keras model cards
Perceiver
The Perceiver model was released in the previous version:
Perceiver
Eight new models are released as part of the Perceiver implementation: `PerceiverModel`, `PerceiverForMaskedLM`, `PerceiverForSequenceClassification`, `PerceiverForImageClassificationLearned`, `PerceiverForImageClassificationFourier`, `PerceiverForImageClassificationConvProcessing`, `PerceiverForOpticalFlow`, `PerceiverForMultimodalAutoencoding`, in PyTorch.
The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch,
Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M.
Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
- Add Perceiver IO by @NielsRogge in #14487
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver
Version v4.14.0 adds support for Perceiver in multiple pipelines, including the fill mask and sequence classification pipelines.
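A hedged sketch of the fill-mask pipeline with Perceiver (the checkpoint id is an assumption; since the Perceiver tokenizer works on bytes, a masked word is replaced by one mask token per byte):

```python
from transformers import pipeline

# Checkpoint id is an assumption (DeepMind's byte-level Perceiver language model).
fill_mask = pipeline("fill-mask", model="deepmind/language-perceiver")

text = "This is an incomplete sentence where some words are missing."
# Replace "missing" (7 bytes) with 7 mask tokens.
masked_text = text.replace("missing", fill_mask.tokenizer.mask_token * len("missing"))
print(fill_mask(masked_text, top_k=1))
```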
Keras model cards
The Keras push-to-hub callback now generates model cards when pushing to the model hub. In addition to the callback, model cards are generated by default by the model.push_to_hub() method.
- TF model cards by @Rocketknight1 in #14720
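A minimal sketch of the callback (the output directory and Hub repo id are placeholders; with this release a model card is written to the pushed repository as well):

```python
from transformers import PushToHubCallback

# Hypothetical output directory and Hub repo id.
callback = PushToHubCallback(
    output_dir="./keras-model",
    hub_model_id="my-username/keras-model",
)
# The callback is passed to Keras fit(); a model card is generated when pushing.
# model.fit(train_dataset, validation_data=eval_dataset, epochs=3, callbacks=[callback])
```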
What's Changed
- Fix : wrong link in the documentation (ConvBERT vs DistilBERT) by @Tikquuss in #14705
- Fix doc examples: 'CausalLMOutput...' object has no attribute 'last_hidden_state' by @ydshieh in #14678
- Fix doc examples: unexpected keyword argument by @ydshieh in #14689
- [doc] document MoE model approach and current solutions by @stas00 in #14725
- [Flax examples] remove dependancy on pytorch training args by @patil-suraj in #14636
- Update bug-report.md by @patrickvonplaten in #14715
- [Adafactor] Fix adafactor by @patrickvonplaten in #14713
- Fix doc examples: modify config before super().init by @ydshieh in #14697
- Improve documentation of some models by @NielsRogge in #14695
- Skip Perceiver tests by @LysandreJik in #14745
- Add ability to get a list of supported pipeline tasks by @codesue in #14732
- Fix the perceiver docs by @LysandreJik in #14748
- Swap TF and PT code inside two blocks by @LucienShui in #14742
- Mention no images added to repository by @LysandreJik in #14738
- Avoid using tf.tile in embeddings for TF models by @ydshieh in #14735
- Change how to load config of XLNetLMHeadModel by @josutk in #14746
- Improve perceiver by @NielsRogge in #14750
- Make data shuffling in `run_clm_flax.py` respect global seed by @bminixhofer in #13410
- Adding support for multiple mask tokens. by @Narsil in #14716
- Fix broken links to distillation on index page of documentation by @amitness in #14722
- [doc] performance: groups of operations by compute-intensity by @stas00 in #14757
- Fix preprocess_function in run_summarization_flax.py by @ydshieh in #14769
- Update Perceiver code examples by @NielsRogge in #14783
New Contributors
- @Tikquuss made their first contribution in #14705
- @codesue made their first contribution in #14732
- @LucienShui made their first contribution in #14742
- @josutk made their first contribution in #14746
- @amitness made their first contribution in #14722
Full Changelog: v4.13.0...v4.14.0
v4.13.0: Perceiver, ImageGPT, mLUKE, Vision-Text dual encoders, QDQBert, new documentation frontend
New Model additions
Perceiver
Eight new models are released as part of the Perceiver implementation: `PerceiverModel`, `PerceiverForMaskedLM`, `PerceiverForSequenceClassification`, `PerceiverForImageClassificationLearned`, `PerceiverForImageClassificationFourier`, `PerceiverForImageClassificationConvProcessing`, `PerceiverForOpticalFlow`, `PerceiverForMultimodalAutoencoding`, in PyTorch.
The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch,
Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M.
Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
- Add Perceiver IO by @NielsRogge in #14487
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver
mLUKE
The mLUKE tokenizer is added. The tokenizer can be used for the multilingual variant of LUKE.
The mLUKE model was proposed in mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. It's a multilingual extension
of the LUKE model trained on the basis of XLM-RoBERTa.
- Add mLUKE by @Ryou0634 in #14640
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=luke
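A small sketch of using the new tokenizer with the existing LUKE model classes (the checkpoint id and the example entity span are assumptions):

```python
from transformers import MLukeTokenizer, LukeModel

# Checkpoint id is an assumption; mLUKE reuses the LUKE model classes.
tokenizer = MLukeTokenizer.from_pretrained("studio-ousia/mluke-base")
model = LukeModel.from_pretrained("studio-ousia/mluke-base")

text = "Tokyo is the capital of Japan."
entity_spans = [(0, 5)]  # character span of "Tokyo"
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")

outputs = model(**inputs)
print(outputs.entity_last_hidden_state.shape)  # contextualized entity representation
```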
ImageGPT
Three new models are released as part of the ImageGPT integration: `ImageGPTModel`, `ImageGPTForCausalImageModeling`, `ImageGPTForImageClassification`, in PyTorch.
The ImageGPT model was proposed in Generative Pretraining from Pixels by Mark
Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. ImageGPT (iGPT) is a GPT-2-like
model trained to predict the next pixel value, allowing for both unconditional and conditional image generation.
- Add ImageGPT by @NielsRogge in #14240
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=imagegpt
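As a sketch, the feature extractor color-quantizes and flattens an image into a sequence of cluster indices that the model consumes (the checkpoint id and dummy image are assumptions):

```python
import numpy as np
import torch
from PIL import Image
from transformers import ImageGPTFeatureExtractor, ImageGPTModel

# Checkpoint id is an assumption.
feature_extractor = ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small")
model = ImageGPTModel.from_pretrained("openai/imagegpt-small")

# Dummy RGB image standing in for a real one; it is resized and color-quantized.
image = Image.fromarray(np.zeros((32, 32, 3), dtype=np.uint8))
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```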
QDQBert
Eight new models are released as part of the QDQBert implementation: `QDQBertModel`, `QDQBertLMHeadModel`, `QDQBertForMaskedLM`, `QDQBertForSequenceClassification`, `QDQBertForNextSentencePrediction`, `QDQBertForMultipleChoice`, `QDQBertForTokenClassification`, `QDQBertForQuestionAnswering`, in PyTorch.
The QDQBERT model can be referenced in Integer Quantization for Deep Learning Inference: Principles and Empirical
Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius
Micikevicius.
- Add QDQBert model and quantization examples of SQUAD task by @shangz-ai in #14066
Semantic Segmentation models
The semantic segmentation models' API is unstable and bound to change between this version and the next.
The first semantic segmentation models are added. In semantic segmentation, the goal is to predict a class label for every pixel of an image. The models that are added are SegFormer (by NVIDIA) and BEiT (by Microsoft Research). BEiT was already available in the library, but this release includes the model with a semantic segmentation head.
The SegFormer model was proposed in SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. The model consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on image segmentation benchmarks such as ADE20K and Cityscapes.
The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI’s DALL-E model given masked patches.
- Add SegFormer by @NielsRogge in #14019
- Add BeitForSemanticSegmentation by @NielsRogge in #14096
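A minimal sketch of semantic segmentation with SegFormer (the checkpoint id and dummy image are assumptions):

```python
import numpy as np
import torch
from PIL import Image
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation

# Checkpoint id is an assumption (a SegFormer model fine-tuned on ADE20K).
checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"
feature_extractor = SegformerFeatureExtractor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

# Dummy RGB image standing in for a real photo.
image = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, num_labels, height / 4, width / 4)
print(logits.shape)
```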
Vision-text dual encoder
Adds VisionTextDualEncoder model in PyTorch and Flax to be able to load any pre-trained vision (ViT, DeiT, BeiT, CLIP's vision model) and text (BERT, ROBERTA) model in the library for vision-text tasks like CLIP.
This model pairs a vision encoder and a text encoder and adds projection layers to project the embeddings to a shared embedding space of matching dimension, which can then be used to align the two modalities.
- VisionTextDualEncoder by @patil-suraj in #13511
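A sketch of pairing two pretrained backbones (the checkpoint ids are assumptions; the projection layers are freshly initialized, so the resulting model is meant to be fine-tuned CLIP-style):

```python
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Checkpoint ids are assumptions; any supported vision/text backbones can be paired.
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "bert-base-uncased"
)
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
processor = VisionTextDualEncoderProcessor(feature_extractor, tokenizer)
# The projection layers are randomly initialized; fine-tune with a contrastive
# (CLIP-style) objective before using the model for retrieval.
```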
CodeParrot
CodeParrot, a model trained to generate code, has been open-sourced in the research projects by @lvwerra.
Language model support for ASR
- Add language model support for CTC models by @patrickvonplaten in #14339
Language model boosted decoding is added for all CTC models via https://github.com/kensho-technologies/pyctcdecode and https://github.com/kpu/kenlm.
See https://huggingface.co/patrickvonplaten/wav2vec2-xlsr-53-es-kenlm for more information.
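A hedged sketch of decoding with the language model (assuming `pyctcdecode` and `kenlm` are installed and the checkpoint referenced above ships the LM files):

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

# The repo referenced above; it bundles the acoustic model and kenlm files.
checkpoint = "patrickvonplaten/wav2vec2-xlsr-53-es-kenlm"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

speech = torch.zeros(16000).numpy()  # dummy 16 kHz audio in place of a real utterance
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode runs beam search boosted by the n-gram language model.
print(processor.batch_decode(logits.numpy()).text)
```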
Flax-specific additions
Adds Flax version of the vision encoder-decoder model, and adds a Flax version of GPT-J.
- Add FlaxVisionEncoderDecoderModel by @ydshieh in #13359
- FlaxGPTJ by @patil-suraj in #14396
TensorFlow-specific additions
Vision transformers are here! Convnets are so 2012, now that ML is converging on self-attention as a universal model.
Want to handle real-world tables, where text and data are positioned in a 2D grid? TAPAS is now here for both TensorFlow and PyTorch.
- Tapas tf by @kamalkraj in #13393
Automatic checkpointing and cloud saves to the HuggingFace Hub during training are now live, allowing you to resume training when it's interrupted, even if your initial instance is terminated. This is an area of very active development - watch this space for future developments, including automatic model card creation and more.
- Add model checkpointing to push_to_hub and PushToHubCallback by @Rocketknight1 in #14492
Auto-processors
A new class to automatically select processors is added: `AutoProcessor`. It can be used for all models that require a processor, in both computer vision and audio.
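A minimal sketch (the checkpoint ids are assumptions; `AutoProcessor` resolves the concrete processor class from each checkpoint's configuration):

```python
from transformers import AutoProcessor

# Checkpoint ids are assumptions; the returned objects are the model-specific
# processor classes (e.g. Wav2Vec2Processor, LayoutLMv2Processor).
speech_processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
layout_processor = AutoProcessor.from_pretrained("microsoft/layoutlmv2-base-uncased")
print(type(speech_processor).__name__, type(layout_processor).__name__)
```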
New documentation frontend
A new documentation frontend is out for the `transformers` library! The goal of this documentation is to be better aligned with the rest of our website, and it contains tools to improve readability. The documentation can now be written in Markdown rather than RST.
LayoutLM Improvements
The LayoutLMv2 feature extractor now supports non-English languages, and LayoutXLM gets its own processor.
- LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. by @Xargonus in #14514
- Add LayoutXLMProcessor (and LayoutXLMTokenizer, LayoutXLMTokenizerFast) by @NielsRogge in #14115
Trainer Improvements
You can now take advantage of the Ampere hardware with the Trainer:
- `--bf16`: do training or eval in mixed precision of bfloat16
- `--bf16_full_eval`: do eval in full bfloat16
- `--tf32`: control having TF32 mode on/off
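The same switches can be set programmatically; a sketch with `TrainingArguments` (the output directory is a placeholder, and bf16/TF32 require Ampere-class GPUs with a recent CUDA/PyTorch):

```python
from transformers import TrainingArguments

# Mirrors the --bf16, --bf16_full_eval and --tf32 command-line flags.
args = TrainingArguments(
    output_dir="out",      # placeholder
    bf16=True,             # train/eval in mixed-precision bfloat16
    bf16_full_eval=True,   # run evaluation fully in bfloat16
    tf32=True,             # enable TF32 mode for matmuls/convolutions
)
```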
Improvements and bugfixes
- Replace assertions with RuntimeError exceptions by @ddrm86 in #14186
- Adding `batch_size` support for (almost) all pipelines by @Narsil in #13724
- Remove n_ctx from configs by @thomasw21 in #14165
- Add `BlenderbotTokenizerFast` by @stancld in #13720
- Adding `handle_long_generation` parameters for `text-generation` pipeline. by @Narsil in #14118
- Fix pipeline tests env and fetch by @sgugger in #14209
- Generalize problem_type to all sequence classification models by @sgugger in #14180
- Fixing image segmentation with inference mode. by @Narsil in #14204
- Add a condition for checking labels by @hrxorxm in #14211
- Torch 1.10 by @LysandreJik in #14169
- Add more missing models to models/init.py by @ydshieh in #14177
- Clarify QA examples by @NielsRogge in #14172
- Fixing `image-segmentation` tests. by @Narsil in #14223
- Tenso...
v4.12.5: Patch release
Reverts a commit that introduced other issues:
- Revert "Experimenting with adding proper get_config() and from_config() methods (#14361)"
v4.12.4: Patch release
- Fix gradient_checkpointing backward compatibility (#14408)
- [Wav2Vec2] Make sure that gradient checkpointing is only run if needed (#14407)
- Experimenting with adding proper get_config() and from_config() methods (#14361)
- enhance rewrite state_dict missing _metadata (#14348)
- Support for TF >= 2.7 (#14345)
- improve rewrite state_dict missing _metadata (#14276)
- Fix of issue #13327: Wrong weight initialization for TF t5 model (#14241)
v4.12.3: Patch release
- Add PushToHubCallback in main init (#14246)
- Supports huggingface_hub >= 0.1.0
v4.12.2: Patch release
Fixes an issue with the image segmentation pipeline and PyTorch's inference mode.
v4.12.1: Patch release
Enables torch 1.10.0
v4.12.0: TrOCR, SEW & SEW-D, Unispeech & Unispeech-SAT, BARTPho
TrOCR and VisionEncoderDecoderModel
One new model is released as part of the TrOCR implementation: `TrOCRForCausalLM`, in PyTorch. It comes along with a new `VisionEncoderDecoderModel` class, which allows mixing and matching any vision Transformer encoder with any text Transformer as decoder, similar to the existing `SpeechEncoderDecoderModel` class.
The TrOCR model was proposed in TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
The TrOCR model consists of an image transformer encoder and an autoregressive text transformer to perform optical character recognition in an end-to-end manner.
- Add TrOCR + VisionEncoderDecoderModel by @NielsRogge in #13874
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=trocr
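A minimal OCR sketch (the checkpoint id and dummy image are assumptions; a real input would be a cropped text-line image):

```python
import numpy as np
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Checkpoint id is an assumption (a TrOCR model fine-tuned on printed text).
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

# Dummy image standing in for a cropped text line.
image = Image.fromarray(np.zeros((384, 384, 3), dtype=np.uint8))
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```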
SEW & SEW-D
SEW and SEW-D (Squeezed and Efficient Wav2Vec) were proposed in Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
SEW and SEW-D models use a Wav2Vec-style feature encoder and introduce temporal downsampling to reduce the length of the transformer encoder. SEW-D additionally replaces the transformer encoder with a DeBERTa one. Both models achieve significant inference speedups without sacrificing the speech recognition quality.
Compatible checkpoints are available on the Hub: https://huggingface.co/models?other=sew and https://huggingface.co/models?other=sew-d
DistilHuBERT
DistilHuBERT was proposed in DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT, by Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee.
DistilHuBERT is a distilled version of the HuBERT model. Using only two transformer layers, the model scores competitively on the SUPERB benchmark tasks.
Compatible checkpoint is available on the Hub: https://huggingface.co/ntu-spml/distilhubert
TensorFlow improvements
Several bug fixes and UX improvements for TensorFlow
Keras callback
Introduction of a Keras callback to push to the hub each epoch, or after a given number of steps:
- Keras callback to push to hub each epoch, or after N steps by @Rocketknight1 in #13773
Updates on the encoder-decoder framework
The encoder-decoder framework is now available in TensorFlow, allowing mixing and matching different encoders and decoders together into a single encoder-decoder architecture!
Besides this, the `EncoderDecoderModel` classes have been updated to work similarly to models like BART and T5. From now on, users don't need to pass `decoder_input_ids` to the model themselves; instead, they are created automatically based on the `labels` (namely by shifting them one position to the right, replacing -100 with the `pad_token_id` and prepending the `decoder_start_token_id`). Note that this may result in training discrepancies when fine-tuning a model trained with versions prior to 4.12.0 that set `decoder_input_ids = labels`.
- Fix EncoderDecoderModel classes to be more like BART and T5 by @NielsRogge in #14139
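A sketch of the updated behaviour with the TensorFlow encoder-decoder class (the checkpoint ids and example sentences are assumptions; note that only `labels` are passed, and the `decoder_input_ids` are derived from them):

```python
from transformers import BertTokenizer, TFEncoderDecoderModel

# Checkpoint ids are assumptions; any compatible encoder/decoder pair can be used.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("The tower is 324 metres tall.", return_tensors="tf")
labels = tokenizer("La tour mesure 324 metres.", return_tensors="tf").input_ids

# Since v4.12, decoder_input_ids are created from the labels automatically.
outputs = model(input_ids=inputs.input_ids, labels=labels)
print(outputs.loss)
```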
Speech improvements
- Add DistilHuBERT by @anton-l in #14174
- [Speech Examples] Add pytorch speech pretraining by @patrickvonplaten in #13877
- [Speech Examples] Add new audio feature by @patrickvonplaten in #14027
- Add ASR colabs by @patrickvonplaten in #14067
- [ASR] Make speech recognition example more general to load any tokenizer by @patrickvonplaten in #14079
- [Examples] Add an official audio classification example by @anton-l in #13722
- [Examples] Use Audio feature in speech classification by @anton-l in #14052
Auto-model API
To make it easier to extend the Transformers library, every Auto class has a new `register` method that allows you to register your own custom models, configurations, or tokenizers. See more in the documentation.
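A minimal sketch of the new API with a hypothetical custom model (the class names and model type string are made up for illustration):

```python
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel

# Hypothetical custom configuration and model used only to illustrate register().
class MyCustomConfig(PretrainedConfig):
    model_type = "my-custom-model"

class MyCustomModel(PreTrainedModel):
    config_class = MyCustomConfig

AutoConfig.register("my-custom-model", MyCustomConfig)
AutoModel.register(MyCustomConfig, MyCustomModel)
# AutoConfig/AutoModel now resolve "my-custom-model" checkpoints to these classes.
```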
Bug fixes and improvements
- Fix filtering in test fetcher utils by @sgugger in #13766
- Fix warning for gradient_checkpointing by @sgugger in #13767
- Implement len in IterableDatasetShard by @sgugger in #13780
- [Wav2Vec2] Better error message by @patrickvonplaten in #13777
- Fix LayoutLM ONNX test error by @nishprabhu in #13710
- Enable readme link synchronization by @qqaatw in #13785
- Fix length of IterableDatasetShard and add test by @sgugger in #13792
- [docs/gpt-j] addd instructions for how minimize CPU RAM usage by @patil-suraj in #13795
- [examples `run_glue.py`] missing requirements `scipy`, `sklearn` by @stas00 in #13768
- [examples/flax] use Repository API for push_to_hub by @patil-suraj in #13672
- Fix gather for TPU by @sgugger in #13813
- [testing] auto-replay captured streams by @stas00 in #13803
- Add MultiBERTs conversion script by @gchhablani in #13077
- [Examples] Improve mapping in accelerate examples by @patrickvonplaten in #13810
- [DPR] Correct init by @patrickvonplaten in #13796
- skip gptj slow generate tests by @patil-suraj in #13809
- Fix warning situation: UserWarning: max_length is ignored when padding=True" by @shirayu in #13829
- Updating CITATION.cff to fix GitHub citation prompt BibTeX output. by @arfon in #13833
- Add TF notebooks by @Rocketknight1 in #13793
- Bart: check if decoder_inputs_embeds is set by @silviu-oprea in #13800
- include megatron_gpt2 in installed modules by @stas00 in #13834
- Delete MultiBERTs conversion script by @gchhablani in #13852
- Remove a duplicated bullet point in the GPT-J doc by @yaserabdelaziz in #13851
- Add Mistral GPT-2 Stability Tweaks by @siddk in #13573
- Fix broken link to distill models in docs by @Randl in #13848
- ✨ update image classification example by @nateraw in #13824
- Update no_* argument (HfArgumentParser) by @BramVanroy in #13865
- Update Tatoeba conversion by @Traubert in #13757
- Fixing 1-length special tokens cut. by @Narsil in #13862
- Fix flax summarization example: save checkpoint after each epoch and push checkpoint to the hub by @ydshieh in #13872
- Fixing empty prompts for text-generation when BOS exists. by @Narsil in #13859
- Improve error message when loading models from Hub by @aphedges in #13836
- Initial support for symbolic tracing with torch.fx allowing dynamic axes by @michaelbenayoun in #13579
- Allow dataset to be an optional argument for (Distributed)LengthGroupedSampler by @ZhaofengWu in #13820
- Fixing question-answering with long contexts by @Narsil in #13873
- fix(integrations): consider test metrics by @borisdayma in #13888
- fix: replace asserts by value error by @m5l14i11 in #13894
- Update parallelism.md by @hyunwoongko in #13892
- Autodocument the list of ONNX-supported models by @sgugger in #13884
- Fixing GPU for token-classification in a better way. by @Narsil in #13856
- Update FSNER code in examples->...