@@ -527,7 +527,7 @@ Pegasus
<https://arxiv.org/pdf/1912.08777.pdf>`_, Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.
Sequence-to-sequence model with the same encoder-decoder model architecture as BART. Pegasus is pre-trained jointly on
- two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization specific pre-training
+ two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization specific pretraining
objective, called Gap Sentence Generation (GSG).
* MLM: encoder input tokens are randomly replaced by a mask token and have to be predicted by the encoder (like in
@@ -609,7 +609,7 @@ mT5
`mT5: A massively multilingual pre-trained text-to-text transformer <https://arxiv.org/abs/2010.11934>`_, Linting Xue
et al.
- The model architecture is the same as T5's. mT5's pre-training objective includes T5's self-supervised training, but not T5's
+ The model architecture is the same as T5's. mT5's pretraining objective includes T5's self-supervised training, but not T5's
supervised training. mT5 is trained on 101 languages.
The library provides a version of this model for conditional generation.
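As a rough sketch of that conditional-generation interface (the checkpoint name ``google/mt5-small`` and the example input are illustrative assumptions, not part of this diff; the raw pretrained checkpoint generally needs task-specific fine-tuning before its outputs are useful):

.. code-block:: python

    # Minimal sketch: loading the conditional-generation version of mT5.
    # "google/mt5-small" and the input text are illustrative assumptions.
    from transformers import MT5ForConditionalGeneration, MT5Tokenizer

    tokenizer = MT5Tokenizer.from_pretrained("google/mt5-small")
    model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

    inputs = tokenizer("summarize: The tower is 324 metres tall.", return_tensors="pt")
    # generate() runs autoregressive decoding; a pretrained-only checkpoint is
    # usually fine-tuned on a downstream task before the output is meaningful.
    output_ids = model.generate(**inputs, max_length=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))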
@@ -630,8 +630,8 @@ MBart
`Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu,
Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
- The model architecture and pre-training objective are the same as BART's, but MBart is trained on 25 languages and is intended
- for supervised and unsupervised machine translation. MBart is one of the first methods for pre-training a complete
+ The model architecture and pretraining objective are the same as BART's, but MBart is trained on 25 languages and is intended
+ for supervised and unsupervised machine translation. MBart is one of the first methods for pretraining a complete
sequence-to-sequence model by denoising full texts in multiple languages.
The library provides a version of this model for conditional generation.
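A minimal sketch of that conditional-generation version used for translation (the fine-tuned checkpoint ``facebook/mbart-large-en-ro``, the language codes and the example sentence are assumptions for illustration):

.. code-block:: python

    # Minimal sketch: English-to-Romanian translation with an MBart checkpoint.
    # Checkpoint name, language codes and sentence are illustrative assumptions.
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained(
        "facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO"
    )
    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

    inputs = tokenizer("UN Chief Says There Is No Military Solution in Syria", return_tensors="pt")
    # Force decoding to start with the Romanian language code so the model
    # generates text in the target language.
    generated_ids = model.generate(
        **inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"]
    )
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])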
@@ -658,7 +658,7 @@ ProphetNet
`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, <https://arxiv.org/abs/2001.04063>`__ by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
- ProphetNet introduces a novel *sequence-to-sequence* pre-training objective, called *future n-gram prediction*. In
+ ProphetNet introduces a novel *sequence-to-sequence* pretraining objective, called *future n-gram prediction*. In
future n-gram prediction, the model predicts the next n tokens simultaneously based on previous context tokens at each
time step instead of just the single next token. The future n-gram prediction explicitly encourages the model
to plan for the future tokens and prevent overfitting on strong local correlations. The model architecture is based on
@@ -683,8 +683,8 @@ XLM-ProphetNet
`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, <https://arxiv.org/abs/2001.04063>`__ by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
- XLM-ProphetNet's model architecture and pre-training objective are the same as ProphetNet's, but XLM-ProphetNet was
- pre-trained on the cross-lingual dataset `XGLUE <https://arxiv.org/abs/2004.01401>`__.
+ XLM-ProphetNet's model architecture and pretraining objective are the same as ProphetNet's, but XLM-ProphetNet was pre-trained
+ on the cross-lingual dataset `XGLUE <https://arxiv.org/abs/2004.01401>`__.
The library provides a pre-trained version of this model for multi-lingual conditional generation and fine-tuned
versions for headline generation and question generation, respectively.
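A minimal sketch of the fine-tuned headline-generation version mentioned above; the checkpoint name ``microsoft/xprophetnet-large-wiki100-cased-xglue-ntg``, the input article and the decoding settings are assumptions for illustration:

.. code-block:: python

    # Minimal sketch: headline generation with a fine-tuned XLM-ProphetNet checkpoint.
    # Checkpoint name, article text and decoding settings are illustrative assumptions.
    from transformers import XLMProphetNetForConditionalGeneration, XLMProphetNetTokenizer

    checkpoint = "microsoft/xprophetnet-large-wiki100-cased-xglue-ntg"
    tokenizer = XLMProphetNetTokenizer.from_pretrained(checkpoint)
    model = XLMProphetNetForConditionalGeneration.from_pretrained(checkpoint)

    article = (
        "The tower is 324 metres tall, about the same height as an 81-storey building, "
        "and is the tallest structure in Paris."
    )
    inputs = tokenizer(article, return_tensors="pt")
    # Beam search over a short maximum length to produce a headline-style output.
    headline_ids = model.generate(**inputs, num_beams=4, max_length=20)
    print(tokenizer.batch_decode(headline_ids, skip_special_tokens=True)[0])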