The main goal of this small project is to teach myself how things are built from scratch, and I hope to convince at least one person that they, too, could build anything from scratch. Andrej Karpathy's llm.c and micrograd were the projects that motivated me to build this.
- Multi-dimensional arrays and tensors are just flat 1-dimensional arrays plus strides, which let us index rows and columns however we like.
- Learned a lot about the C language, including memory management, parallel processing, and memory access patterns. This is just the second thing I built in C, the first one being a basic password manager.
- Derived the backpropagation for layers like LayerNorm and the attention mechanism, which improved my mathematical ability a lot.
- Learned how to memory-map files and use them as a kind of virtual memory, since the activations and parameters (humongous, something like ~20GB) were too large to keep in RAM.
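To give a flavor of the derivations involved, here is one standard form of the LayerNorm backward pass (notation is mine and may differ from the code's):

```latex
\text{Forward, over a vector } x \in \mathbb{R}^D:\quad
\mu = \tfrac{1}{D}\textstyle\sum_i x_i,\quad
\sigma^2 = \tfrac{1}{D}\textstyle\sum_i (x_i - \mu)^2,\quad
\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}},\quad
y_i = \gamma_i \hat{x}_i + \beta_i

\text{Backward, with } g_i = \frac{\partial L}{\partial y_i}:\quad
\frac{\partial L}{\partial \gamma_i} = g_i \hat{x}_i,\qquad
\frac{\partial L}{\partial \beta_i} = g_i,

\frac{\partial L}{\partial x_i}
= \frac{1}{\sqrt{\sigma^2 + \epsilon}}
  \left( \gamma_i g_i
       - \frac{1}{D}\sum_j \gamma_j g_j
       - \frac{\hat{x}_i}{D}\sum_j \gamma_j g_j \hat{x}_j \right)
```

The two subtracted sums come from differentiating through the mean and the variance, which is what makes normalization layers trickier than plain element-wise ops.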
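The strides idea above can be sketched in a few lines of C. The helper name `get2d` is mine, not the project's; it just shows how one flat buffer plus two strides covers both a matrix and its transpose:

```c
#include <assert.h>
#include <stddef.h>

/* A 2-D "tensor" is a flat buffer plus strides: element (i, j) lives at
 * data[i * row_stride + j * col_stride]. For a row-major R x C matrix,
 * row_stride = C and col_stride = 1; swapping the two strides yields the
 * transpose with zero data movement. */
static inline float get2d(const float *data,
                          size_t row_stride, size_t col_stride,
                          size_t i, size_t j) {
    return data[i * row_stride + j * col_stride];
}
```

Higher-rank tensors work the same way, with one stride per dimension.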
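The mmap trick above can be sketched like this. The function name `map_floats` and the file layout are illustrative, not the project's actual code; the point is that the OS pages the file in and out on demand instead of the process holding ~20GB resident:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file of floats into the address space read-only. The kernel
 * treats the file itself as backing store, so only the pages actually
 * touched need to be in physical RAM at any moment. */
float *map_floats(const char *path, size_t *count) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }
    float *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after closing the fd */
    if (p == MAP_FAILED) return NULL;
    *count = (size_t)st.st_size / sizeof(float);
    return p;
}
```

For writable state (e.g. activations), the same idea works with `PROT_READ | PROT_WRITE` and `MAP_SHARED`, letting the kernel flush dirty pages back to disk.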
It was fun building something like this.
tokenize the training data (needs tiktoken):

```shell
pip install tiktoken
python3 prep_data.py
```
compile and run:

```shell
# macOS (uses Apple Accelerate for fast matmul)
gcc -O3 -DACCELERATE_NEW_LAPACK -o train ai.c -lm -framework Accelerate

# Linux with OpenMP
gcc -O3 -march=native -funroll-loops -fopenmp -o train ai.c -lm

./train
```
decode generated tokens:

```shell
python3 decode.py "464,1182,286,..."
```
- `-O3`: aggressive optimizations
- `-march=native`: CPU-specific optimizations
- `-funroll-loops`: loop unrolling for potential speed improvements
- `-fopenmp`: OpenMP support for parallel processing
- `-framework Accelerate`: Apple's BLAS for fast matrix multiplication (macOS only)
- `-DACCELERATE_NEW_LAPACK`: use the updated CBLAS interface on macOS
This implementation is far from optimal; there is plenty of room for improvement.
- Improve the matrix multiplication.
- Improve the attention mechanism and its backprop, as it consumes a large share of the training time.
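As a starting point for the matmul item, one of the cheapest wins on the naive O(N^3) loop is reordering it to i-k-j so the inner loop walks both B and C contiguously. This is a hedged sketch (square matrices, no blocking or SIMD), not the project's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C = A * B for n x n row-major matrices. The i-k-j order keeps the
 * innermost loop streaming over contiguous rows of B and C, which is
 * far more cache-friendly than the textbook i-j-k order. */
void matmul(const float *A, const float *B, float *C, int n) {
    memset(C, 0, (size_t)n * n * sizeof(float));
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            float a = A[i * n + k];  /* hoisted: constant in inner loop */
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

Further steps in the usual progression are blocking/tiling to fit the cache, OpenMP over the outer loop, and ultimately deferring to a tuned BLAS as the Accelerate build already does.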