Attention is all you need


Terms

  • Long short-term memory
  • Attention

    In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.

    A sketch of the attention computation this refers to appears after this list.

  • Extended Neural GPU
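
As a rough illustration of the attention mechanism the quote above refers to, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ / √d_k) V. The function name, shapes, and example values are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Illustrative sketch: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v). Names and shapes are
    assumptions for illustration, not the paper's reference code.
    """
    d_k = Q.shape[-1]
    # Compare every query with every key; scale to keep the softmax well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights over all positions,
    # which is what lets the model draw global dependencies in a single step.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

# Example: 5 positions with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(Q, K, V)  # shape (5, 8)
```

Because every position attends to every other position directly, each layer reduces to a few matrix products that parallelize across positions, which is the source of the parallelization advantage mentioned in the quote.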

Diagrams