Some people have asked me to explain what I know about transformers to them. Here's my attempt to explain things as I understand them, using the resources that I used.
- These resources and explanations are curated assuming you have a 2nd-year (or later) university-level understanding of linear algebra and multivariable calculus. The math needed here is surprisingly thin, but what little is needed cannot be ignored.
In particular, you should be able to answer:
- Do you know what vectors are?
- Do you understand how matrix multiplication works?
- Do you know what a change of basis is?
- Do you know concepts like span, rank, nullspace, etc.?
- What is an inner product space and why does it matter for embeddings?
- What is the geometric interpretation of the dot product?
For calculus:
- What is a gradient?
- What is a Jacobian?
- What is a local maximum / minimum?
- What is the chain rule, and what does it look like in the multivariable case?
- What is the exponential function? Why is it important?
- What is the difference between norms ℓ¹, ℓ², ℓ∞, and what do they measure?
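If the dot-product and norm questions above feel shaky, a few lines of NumPy make the geometry concrete. This is just a sketch with made-up toy vectors, but the same cosine-similarity computation is exactly what's used to compare embeddings:

```python
import numpy as np

# Two toy "embedding" vectors (hypothetical values, chosen for easy arithmetic)
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Geometric interpretation of the dot product: a . b = |a| |b| cos(theta),
# so dividing by the norms recovers cos(theta) -- the "similarity" of directions.
cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # 24 / 25 = 0.96

# The three norms measure "size" differently:
l1   = np.linalg.norm(a, 1)       # sum of absolute values -> 7.0
l2   = np.linalg.norm(a)          # Euclidean length       -> 5.0
linf = np.linalg.norm(a, np.inf)  # largest component      -> 4.0
```

Embeddings that point in nearly the same direction have cosine similarity near 1, regardless of their lengths.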
- Assuming you have an understanding of machine learning in general:
- What is a loss function?
- What is gradient descent?
- What is a machine learning "model"?
- Can you do linear regression? Logistic regression?
- Do you understand the basics of NLP?
- Tokenization
- n-gram language models
- Word vectors
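If the loss-function / gradient-descent / linear-regression questions feel rusty, this minimal sketch ties all three together. The data and hyperparameters are made up; the point is the loop: compute predictions, compute the gradient of the loss, step downhill.

```python
import numpy as np

# Toy data generated from y = 2x + 1, so we know the answer we should recover
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # model: y_hat = w*x + b
lr = 0.05         # learning rate (hypothetical, chosen small enough to converge)

for _ in range(2000):
    y_hat = w * x + b
    # Mean-squared-error loss: L = mean((y_hat - y)^2)
    grad_w = 2 * np.mean((y_hat - y) * x)  # dL/dw
    grad_b = 2 * np.mean(y_hat - y)        # dL/db
    w -= lr * grad_w                       # gradient descent update
    b -= lr * grad_b

# w and b should now be very close to 2 and 1
```

Training a transformer is conceptually the same loop, just with millions of parameters and gradients computed by backpropagation instead of by hand.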
- I would highly recommend using LLMs to fill in any knowledge gaps that you have. It's surprisingly effective as a search engine, and it answers your exact question instead of pointing you to resources which may or may not be helpful.
These are roughly in the order that I would recommend you view them.
- 3Blue1Brown's YouTube videos on transformers are the best high-production-value, watchable introduction to GPTs in general.
- https://www.youtube.com/watch?v=wjZofJX0v4M [Essential]
- https://www.youtube.com/watch?v=eMlx5fFNoYc [Essential]
- https://www.youtube.com/watch?v=9-Jl0dxWQs8 [Optional]
- Skim + watch + read these two:
- A mathematical framework for transformer circuits (more intuition!):
https://transformer-circuits.pub/2021/framework/index.html
- Focus especially on the first diagram.
- Neel Nanda's walkthrough of the above: https://www.youtube.com/watch?v=KV5gbOmHbjU
- Andrej Karpathy's videos are great a bit later, when you're following along with an implementation. They are a gold mine if 1) you're not sure what you want to implement, 2) you're running into an accuracy bottleneck you'd like to overcome, or 3) you're trying to understand the nuances of things like initialization, BatchNorm, etc.
- If you want to speedrun this, you can probably skip
- The sections on implementing backprop / autograd
- The WaveNet video
- You can probably just skim the "Let's build GPT" video (#7 in the playlist)
- I would start with video #8, "State of GPT".
- The last video is not really necessary -- by that time you should be mostly self-sufficient.
Please feel free to leave suggestions in Issues or questions in Discussions, and I will do my best to answer them! Let's work together to make this as useful an introduction to LLMs as possible.