This repository collects important deep learning research papers, organized by research area, together with implementations. The goal is to provide a structured approach to understanding the evolution and core concepts of deep learning.
Important: This is a personal learning project. The implementations and notes may contain errors or simplifications. Use with caution and always refer to the original papers.
Inspired by adam-maj and expanded with additional research papers and implementations.
- Implement approximately 60 important deep learning papers
- Provide scratch implementations for learning and understanding
- Create a comprehensive resource for deep learning research
- DNN (1987): Learning Internal Representations by Error Propagation pdf
- CNN (1989): Backpropagation Applied to Handwritten Zip Code Recognition pdf
- LeNet (1998): Gradient-Based Learning Applied to Document Recognition pdf
- AlexNet (2012): ImageNet Classification with Deep Convolutional Neural Networks pdf
- U-Net (2015): Convolutional Networks for Biomedical Image Segmentation pdf
- Weight Decay (1991): A Simple Weight Decay Can Improve Generalization pdf
- ReLU (2011): Deep Sparse Rectifier Neural Networks pdf
- Residuals (2015): Deep Residual Learning for Image Recognition pdf
- Dropout (2014): Preventing Neural Networks from Overfitting pdf
- BatchNorm (2015): Accelerating Deep Network Training pdf
- LayerNorm (2016): Layer Normalization pdf
- GELU (2016): Gaussian Error Linear Units pdf
- Adam (2014): A Method for Stochastic Optimization pdf
- RNN (1989): Continually Running Fully Recurrent Neural Networks pdf
- LSTM (1997): Long Short-Term Memory pdf
- Learning to Forget (2000): Continual Prediction with LSTM pdf
- Word2Vec (2013): Word Representations in Vector Space pdf
- Phrase2Vec (2013): Distributed Representations of Words and Phrases pdf
- Encoder-Decoder (2014): RNN Encoder-Decoder for Machine Translation pdf
- Seq2Seq (2014): Sequence to Sequence Learning pdf
- Attention (2014): Neural Machine Translation with Alignment pdf
- Mixture of Experts (2017): Sparsely-Gated Neural Networks pdf
- Transformer (2017): Attention Is All You Need pdf
- BERT (2018): Bidirectional Transformers for Language Understanding pdf
- RoBERTa (2019): Robustly Optimized BERT Pretraining pdf
- T5 (2019): Unified Text-to-Text Transformer pdf
- GPT Series:
- LoRA (2021): Low-Rank Adaptation of Large Language Models pdf
- RLHF (2019): Fine-Tuning from Human Preferences pdf
- InstructGPT (2022): Following Instructions with Human Feedback pdf
- Vision Transformer (2020): Image Recognition with Transformers pdf
- ELECTRA (2020): Discriminative Pre-training pdf
- GAN (2014): Generative Adversarial Networks pdf
- VAE (2013): Auto-Encoding Variational Bayes pdf
- VQ-VAE (2017): Neural Discrete Representation Learning pdf
- Diffusion Models:
- CLIP (2021): Visual Models from Natural Language Supervision pdf
- DALL-E (2021-2022): Text-to-Image Generation pdf
- SimCLR (2020): Contrastive Learning of Visual Representations pdf
- Deep Reinforcement Learning (2017): Mastering Chess and Shogi pdf
- Deep Q-Learning (2013): Playing Atari Games pdf
- AlphaGo (2016): Mastering the Game of Go pdf
- AlphaFold (2021): Protein Structure Prediction pdf
- Deep Learning Survey (2015): By LeCun, Bengio, and Hinton pdf
- BigGAN (2018): Large Scale GAN Training pdf
- WaveNet (2016): Generative Model for Raw Audio pdf
- BERTology (2020): A Primer on What We Know About How BERT Works pdf
- Scaling Laws for Neural Language Models (2020): Predicting Model Performance pdf
- Chinchilla (2022): Training Compute-Optimal Large Language Models pdf
- Gopher (2021): Methods, Analysis & Insights from Training Gopher pdf
- P-Tuning (2021): Prompt Tuning with Soft Prompts pdf
- Prefix-Tuning (2021): Optimizing Continuous Prompts pdf
- AdaLoRA (2023): Adaptive Low-Rank Adaptation pdf
- QLoRA (2023): Efficient Fine-Tuning of Quantized Models pdf
- FlashAttention (2022): Fast and Memory-Efficient Attention pdf
- FlashAttention-2 (2023): Faster Attention Mechanism pdf
- Direct Preference Optimization (DPO) (2023): Aligning Language Models with Human Preferences pdf
- Mixture of Experts (MoE) (2022): Scaling Language Models with Sparse Experts pdf
- GLaM (2021): Efficient Scaling with Mixture of Experts pdf
- Switch Transformers (2022): Scaling to Trillion Parameter Models pdf