configs
images
README.md
__init__.py
model.py
run.py

Transformer Language Models

This implementation reproduces the original Transformer architecture introduced in Attention Is All You Need. The architecture was first applied to machine translation (English–German and English–French on WMT 2014 in the original paper) and combined what are now the standard building blocks of modern NLP models: multi-head self-attention, layer normalization, feed-forward networks, residual connections, and positional embeddings.
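As a rough illustration of two of these building blocks, the sketch below shows single-head scaled dot-product attention and the "add and normalize" step (a residual connection followed by layer normalization). It is a minimal pure-Python sketch, not the code in model.py: the function names are hypothetical, and learned projections, multiple heads, masking, and the learned gain and bias of layer normalization are all omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Each argument is a list of vectors (lists of floats).
    Hypothetical helper for illustration only.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def add_and_norm(x, sublayer_out, eps=1e-6):
    """Residual connection followed by layer normalization (no gain/bias)."""
    z = [a + b for a, b in zip(x, sublayer_out)]
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    return [(v - mean) / math.sqrt(var + eps) for v in z]
```

In self-attention, the queries, keys, and values are all derived from the same sequence; a multi-head layer would run several such attentions in parallel on different learned projections and concatenate the results.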

While this implementation shares much of its foundation with the T5 model, it differs in architecture, datasets, model sizes, and training objectives. In particular, this model uses learned absolute positional embeddings rather than relative position encodings, and it is trained specifically on translation rather than as a general-purpose sequence-to-sequence model.
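To make "learned absolute positional embeddings" concrete, here is a minimal sketch under stated assumptions: one vector is stored per absolute position, up to a fixed maximum length, and added to the token embedding at that position. The class name, the initialization scale of 0.02, and the plain-list representation are illustrative choices, not taken from model.py; in a real model the table would be a trainable parameter updated by the optimizer.

```python
import random


class LearnedPositionalEmbedding:
    """Hypothetical lookup table with one vector per absolute position."""

    def __init__(self, max_len, d_model, seed=0):
        rng = random.Random(seed)
        # Randomly initialized here; training would update these values.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(max_len)
        ]

    def __call__(self, token_embeddings):
        seq_len = len(token_embeddings)
        if seq_len > len(self.table):
            # Absolute tables cannot represent positions beyond max_len.
            raise ValueError("sequence is longer than max_len")
        # Add the vector for position i to the token embedding at position i.
        return [
            [t + p for t, p in zip(token, self.table[pos])]
            for pos, token in enumerate(token_embeddings)
        ]
```

One consequence of this design is the hard length limit shown in the `ValueError` branch: unlike relative schemes, an absolute table has no entry for positions it was never allocated.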

For more information on using our Transformer implementation, visit its model page in our documentation.
