Some people have asked me to explain what I know about transformers to them. Here's my attempt to explain things as I understand them, using the resources that I used.
- These resources and explanations are curated assuming you have a 2nd-year (or later) university-level understanding of linear algebra and multivariable calculus. The math needed here is surprisingly thin, but what little is needed cannot be ignored.
In particular, you should be able to answer:
- Do you know what vectors are?
- Do you understand how matrix multiplication works?
- Do you know what a change of basis is?
- Do you know concepts like span, rank, nullspace, etc.?
- What is an inner product space and why does it matter for embeddings?
- What is the geometric interpretation of the dot product?
For calculus:
- What is a gradient?
- What is a Jacobian?
- What is a local maximum / minimum?
- What is the chain rule, and what does it look like in the multivariable case?
- What is the exponential function? Why is it important?
- What is the difference between norms ℓ¹, ℓ², ℓ∞, and what do they measure?
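If the dot-product and norm questions above feel shaky, a few lines of NumPy make the geometry concrete. This is just a sketch with made-up toy vectors, but the same cosine-similarity computation is exactly what's used to compare embeddings:

```python
import numpy as np

# Two toy "embedding" vectors (hypothetical values, chosen for easy arithmetic)
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Geometric interpretation of the dot product: a . b = |a| |b| cos(theta),
# so dividing by the norms recovers cos(theta) -- the "similarity" of directions.
cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # 24 / 25 = 0.96

# The three norms measure "size" differently:
l1   = np.linalg.norm(a, 1)       # sum of absolute values -> 7.0
l2   = np.linalg.norm(a)          # Euclidean length       -> 5.0
linf = np.linalg.norm(a, np.inf)  # largest component      -> 4.0
```

Embeddings that point in nearly the same direction have cosine similarity near 1, regardless of their lengths.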
- Assuming you have an understanding of machine learning in general:
- What is a loss function?
- What is gradient descent?
- What is a machine learning "model"?
- Can you do linear regression? Logistic regression?
- Do you understand the basics of NLP?
- Tokenization
- n-gram language models
- Word vectors
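If the loss-function / gradient-descent / linear-regression questions feel rusty, this minimal sketch ties all three together. The data and hyperparameters are made up; the point is the loop: compute predictions, compute the gradient of the loss, step downhill.

```python
import numpy as np

# Toy data generated from y = 2x + 1, so we know the answer we should recover
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # model: y_hat = w*x + b
lr = 0.05         # learning rate (hypothetical, chosen small enough to converge)

for _ in range(2000):
    y_hat = w * x + b
    # Mean-squared-error loss: L = mean((y_hat - y)^2)
    grad_w = 2 * np.mean((y_hat - y) * x)  # dL/dw
    grad_b = 2 * np.mean(y_hat - y)        # dL/db
    w -= lr * grad_w                       # gradient descent update
    b -= lr * grad_b

# w and b should now be very close to 2 and 1
```

Training a transformer is conceptually the same loop, just with millions of parameters and gradients computed by backpropagation instead of by hand.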
- I would highly recommend using LLMs to fill in any knowledge gaps that you have. It's surprisingly effective as a search engine, and it answers your exact question instead of pointing you to resources which may or may not be helpful.
These are roughly in the order that I would recommend you view them.
- 3Blue1Brown's YouTube videos on transformers are the best high-production-value, watchable introduction to GPTs in general.
- https://www.youtube.com/watch?v=wjZofJX0v4M [Essential]
- https://www.youtube.com/watch?v=eMlx5fFNoYc [Essential]
- https://www.youtube.com/watch?v=9-Jl0dxWQs8 [Optional]
- Skim + watch + read these two:
- A mathematical framework for transformer circuits (more intuition!):
https://transformer-circuits.pub/2021/framework/index.html
- Focus especially on the first diagram.
- Neel Nanda's walkthrough of the above: https://www.youtube.com/watch?v=KV5gbOmHbjU
- Andrej Karpathy's videos are great a bit later, when you're following along with an implementation. They are a gold mine if 1) you're not sure what you want to implement, 2) you're running into an accuracy bottleneck you'd like to overcome, or 3) you're trying to understand the nuances of things like initialization, BatchNorm, etc.
- If you want to speedrun this, you can probably skip
- The sections on implementing backprop / autograd
- The WaveNet video
- You can probably just skim the "Let's build GPT" video (#7 in the playlist)
- I would start with video #8, "State of GPT".
- The last video is not really necessary -- by that time you should be mostly self-sufficient.
Please feel free to leave suggestions in Issues or questions in Discussions, and I will do my best to answer them! Let's work together to make this as useful an introduction to LLMs as possible.