
🤖 Yet Another Transformer Implementation 🤖

Python Pytorch

Warning

This repository was developed for academic and personal purposes, namely to better understand the underlying architecture of the Transformer and to reuse it in future small projects.


📦 Installation

You can install the package by either:

  • using pip

    pip install git+https://github.com/RistoAle97/yati

    This will not install the dev dependencies listed in pyproject.toml.

  • cloning the repository and installing the dependencies

    git clone https://github.com/RistoAle97/yati
    
    pip install -e yati
    pip install -e "yati[dev]"  # if you want to contribute to this project
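
After installing, a quick sanity check is to import the package. The snippet below assumes the installed package is importable as yati (the repository name); the actual module name is not confirmed here.

    # Hypothetical sanity check: the import name `yati` is an assumption
    # based on the repository name, not confirmed by the project itself.
    import yati

    print(yati.__name__)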

🛠️ Implementation details

Note

Some implementation choices may differ from the original paper [1]; a short illustrative sketch follows the list below:

  • The source and target embeddings are shared, so a unified vocabulary is needed (e.g., one vocabulary covering all languages in an NMT task).
  • The embeddings are tied to the output linear layer (i.e., they share the same weight matrix).
  • Pre-normalization is employed instead of post-normalization [2].
  • Layer normalization [3] is applied at the end of both the encoder and decoder stacks.
  • There is no final softmax layer, since PyTorch's CrossEntropyLoss already applies log-softmax internally.
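
The sketch below illustrates these choices using PyTorch's built-in nn.Transformer rather than this repository's own modules. It is not the project's actual code: the class name, hyperparameters, and the omission of positional encodings and attention masks are simplifications made purely for the example.

    # Minimal, illustrative sketch of the choices listed above (NOT this repository's code).
    # Positional encodings and attention masks are omitted for brevity.
    import torch
    from torch import nn


    class TinySeq2SeqTransformer(nn.Module):
        def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8, n_layers: int = 6) -> None:
            super().__init__()
            # Shared source/target embedding -> a single unified vocabulary is required.
            self.embedding = nn.Embedding(vocab_size, d_model)
            self.transformer = nn.Transformer(
                d_model=d_model,
                nhead=n_heads,
                num_encoder_layers=n_layers,
                num_decoder_layers=n_layers,
                norm_first=True,  # pre-normalization instead of post-normalization
                batch_first=True,
            )  # nn.Transformer also applies a final LayerNorm to both the encoder and decoder stacks
            # Output projection tied to the embedding (the weight matrix is shared).
            self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
            self.lm_head.weight = self.embedding.weight

        def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
            out = self.transformer(self.embedding(src), self.embedding(tgt))
            return self.lm_head(out)  # raw logits: no softmax here

    # CrossEntropyLoss expects unnormalized logits and applies log-softmax internally,
    # which is why the model above ends with plain logits. In real training the target
    # tokens would be shifted and masked; this is only a shape check.
    model = TinySeq2SeqTransformer(vocab_size=1000)
    src = torch.randint(0, 1000, (2, 7))
    tgt = torch.randint(0, 1000, (2, 5))
    logits = model(src, tgt)  # shape: (2, 5, 1000)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt.reshape(-1))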

Below is a comparison between the original Transformer and the one implemented in this repository.

[Figure: side-by-side architecture diagrams, Original vs. This repository]

📚 Bibliography

[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).

[2] Nguyen, Toan Q., and Julian Salazar. "Transformers without tears: Improving the normalization of self-attention." arXiv preprint arXiv:1910.05895 (2019).

[3] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).

Some additional nice reads:

📝 License

This project is MIT licensed.
