
Transformers

toncho11 edited this page Nov 3, 2022 · 10 revisions

The Transformer is an architecture introduced in 2017, used primarily in the field of NLP, that aims to solve sequence-to-sequence tasks while handling long-range dependencies with ease. It relies entirely on self-attention to compute representations of its input and output, WITHOUT using sequence-aligned RNNs or convolutions. The main tasks Transformers are used for include classification, information extraction, question answering, summarization, translation, and text generation.
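The self-attention mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention, not a full Transformer layer; the weight matrices and dimensions here are arbitrary choices for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) pairwise similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape, w.shape)  # (5, 8) (5, 5)
```

Every output position is a weighted mix of all input positions, which is how long-range dependencies are handled in a single step rather than through recurrence.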

The most popular Transformers are BERT and GPT-2. Hugging Face has other Transformers available for you to experiment with.
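As a sketch of how to experiment with these pretrained models, the Hugging Face `transformers` library exposes a `pipeline` helper; which exact model is loaded by default for a task is an implementation detail of the library, so treat the output here as illustrative:

```python
from transformers import pipeline

# Sentiment analysis with the library's default pretrained model
# (weights are downloaded on first use).
classifier = pipeline("sentiment-analysis")
result = classifier("Transformers handle long-range dependencies with ease.")
print(result)  # a list of dicts with 'label' and 'score' keys
```

The same `pipeline` interface covers other tasks from the list below, such as `"translation"`, `"summarization"`, and `"question-answering"`.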

Details

  • no use of RNNs (earlier seq2seq models relied on RNNs, where previous outputs are fed back as inputs); recurrence is replaced by self-attention

  • can be used for

    • translation
    • summarization
    • question answering
    • image captioning
    • chatbots
  • BERT is an attention-based model built from Transformer encoder layers

    • can produce contextualized token embeddings (representations of the input text that depend on the surrounding words)
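"Contextualized" means the same token gets a different embedding in different contexts. A toy NumPy sketch of the idea (not BERT itself, just the simplest self-attention where queries, keys, and values are the raw token vectors):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(X):
    # Simplest self-attention: queries = keys = values = X.
    w = softmax(X @ X.T / np.sqrt(X.shape[-1]), axis=-1)
    return w @ X

rng = np.random.default_rng(1)
token = rng.normal(size=4)                      # one fixed token vector
ctx_a = np.stack([token, rng.normal(size=4)])   # the token in context A
ctx_b = np.stack([token, rng.normal(size=4)])   # the same token in context B

emb_a = attend(ctx_a)[0]  # contextual embedding of the token in A
emb_b = attend(ctx_b)[0]  # contextual embedding of the token in B
print(np.allclose(emb_a, emb_b))  # False: same token, different embeddings
```

Because each output mixes information from the whole sequence, changing the neighbours changes the token's representation, which static word embeddings like word2vec cannot do.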
