
Commit ffd675b

add summary (#7927)
1 parent 5547b40 commit ffd675b

File tree

1 file changed: +37 -0 lines changed

docs/source/model_summary.rst

@@ -612,6 +612,43 @@ The `mbart-large-cc25 <https://huggingface.co/facebook/mbart-large-cc25>`_ check
.. _multimodal-models:

ProphetNet
-----------------------------------------------------------------------------------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=prophetnet">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-prophetnet-blueviolet">
    </a>
    <a href="model_doc/prophetnet.html">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-prophetnet-blueviolet">
    </a>

`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

ProphetNet introduces a novel *sequence-to-sequence* pre-training objective, called *future n-gram prediction*. In future n-gram prediction, the model predicts the next n tokens simultaneously based on previous context tokens at each time step, instead of just the single next token. Future n-gram prediction explicitly encourages the model to plan for future tokens and prevents overfitting on strong local correlations.
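
As a rough illustration of that objective (a toy sketch only, not the actual training code; the token list and the choice of n = 2 below are assumptions made for the example):

.. code-block:: python

    # Toy illustration of future n-gram prediction targets with n = 2.
    # A standard language model would predict only tokens[t + 1] at step t;
    # the future n-gram objective also asks for tokens[t + 2] at the same step.
    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    n = 2

    for t in range(len(tokens) - n):
        context = tokens[: t + 1]
        targets = tokens[t + 1 : t + 1 + n]
        print(f"context={context} -> targets={targets}")
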
The model architecture is based on the original Transformer, but it replaces the "standard" self-attention mechanism in the decoder with a main self-attention mechanism and a self and n-stream (predict) self-attention mechanism.

The library provides a pre-trained version of this model for conditional generation and a fine-tuned version for summarization.
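
A minimal sketch of how the summarization fine-tune could be used (the checkpoint name and generation arguments are assumptions; check the model pages linked above for the checkpoints that are actually provided):

.. code-block:: python

    from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

    # Assumed checkpoint name for the summarization fine-tune.
    checkpoint = "microsoft/prophetnet-large-uncased-cnndm"
    tokenizer = ProphetNetTokenizer.from_pretrained(checkpoint)
    model = ProphetNetForConditionalGeneration.from_pretrained(checkpoint)

    article = (
        "The US has passed the peak on new coronavirus cases, President Donald Trump said, "
        "and predicted that some states would reopen this month."
    )
    inputs = tokenizer(article, return_tensors="pt")

    # Beam-search generation; the arguments here are illustrative, not tuned values.
    summary_ids = model.generate(**inputs, num_beams=4, max_length=60, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
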
XLM-ProphetNet
-----------------------------------------------------------------------------------------------------------------------

.. raw:: html

    <a href="https://huggingface.co/models?filter=xprophetnet">
        <img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet">
    </a>
    <a href="model_doc/xlmprophetnet.html">
        <img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xprophetnet-blueviolet">
    </a>

`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

XLM-ProphetNet's model architecture and pre-training objective are the same as ProphetNet's, but XLM-ProphetNet was pre-trained on the cross-lingual dataset `XGLUE <https://arxiv.org/abs/2004.01401>`__.

The library provides a pre-trained version of this model for multi-lingual conditional generation, as well as fine-tuned versions for headline generation and question generation.
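
A minimal sketch of how the headline-generation fine-tune could be used (again, the checkpoint name and generation arguments are assumptions; check the model pages linked above for the checkpoints that are actually provided):

.. code-block:: python

    from transformers import XLMProphetNetForConditionalGeneration, XLMProphetNetTokenizer

    # Assumed checkpoint name for the XGLUE news-title-generation fine-tune.
    checkpoint = "microsoft/xprophetnet-large-wiki100-cased-xglue-ntg"
    tokenizer = XLMProphetNetTokenizer.from_pretrained(checkpoint)
    model = XLMProphetNetForConditionalGeneration.from_pretrained(checkpoint)

    article = (
        "Microsoft Corporation intends to officially end free support for "
        "the Windows 7 operating system after January 14, 2020."
    )
    inputs = tokenizer(article, return_tensors="pt")

    # Beam-search generation; the arguments here are illustrative, not tuned values.
    headline_ids = model.generate(**inputs, num_beams=4, max_length=24, early_stopping=True)
    print(tokenizer.decode(headline_ids[0], skip_special_tokens=True))
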
Multimodal models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
