Deep Learning Research Paper Collection

Overview

This repository is a collection of important deep learning research papers, organized by research area and implementation. The goal is to provide a structured approach to understanding the evolution and core concepts of deep learning.

Disclaimer

Important

This is a personal learning project. The implementations and notes may contain errors or simplifications. Use with caution and always refer to the original papers.

Inspiration and Credits

Inspired by adam-maj and expanded with additional research papers and implementations.

Project Goals

Implement approximately 60 important deep learning papers
Provide scratch implementations for learning and understanding
Create a comprehensive resource for deep learning research

Contents

1. Foundational Deep Neural Networks

Papers

DNN (1987): Learning Internal Representations by Error Propagation pdf
CNN (1989): Backpropagation Applied to Handwritten Zip Code Recognition pdf
LeNet (1998): Gradient-Based Learning Applied to Document Recognition pdf
AlexNet (2012): ImageNet Classification with Deep Convolutional Neural Networks pdf
U-Net (2015): Convolutional Networks for Biomedical Image Segmentation pdf

2. Optimization and Regularization Techniques

Papers

Weight Decay (1991): A Simple Weight Decay Can Improve Generalization pdf
ReLU (2011): Deep Sparse Rectifier Neural Networks pdf
Residuals (2015): Deep Residual Learning for Image Recognition pdf
Dropout (2014): Preventing Neural Networks from Overfitting pdf
BatchNorm (2015): Accelerating Deep Network Training pdf
LayerNorm (2016): Layer Normalization pdf
GELU (2016): Gaussian Error Linear Units pdf
Adam (2014): A Method for Stochastic Optimization pdf

3. Sequence Modeling

Papers

RNN (1989): Continually Running Fully Recurrent Neural Networks pdf
LSTM (1997): Long Short-Term Memory pdf
Learning to Forget (2000): Continual Prediction with LSTM pdf
Word2Vec (2013): Word Representations in Vector Space pdf
Phrase2Vec (2013): Distributed Representations of Words and Phrases pdf
Encoder-Decoder (2014): RNN Encoder-Decoder for Machine Translation pdf
Seq2Seq (2014): Sequence to Sequence Learning pdf
Attention (2014): Neural Machine Translation with Alignment pdf
Mixture of Experts (2017): Sparsely-Gated Neural Networks pdf

4. Language Modeling

Papers

Transformer (2017): Attention Is All You Need pdf
BERT (2018): Bidirectional Transformers for Language Understanding pdf
RoBERTa (2019): Robustly Optimized BERT Pretraining pdf
T5 (2019): Unified Text-to-Text Transformer pdf
GPT Series:
- GPT (2018): Generative Pre-Training pdf
- GPT-2 (2019): Unsupervised Multitask Learning pdf
- GPT-3 (2020): Few-Shot Learning pdf
- GPT-4 (2023): Advanced Language Model pdf
LoRA (2021): Low-Rank Adaptation of Large Language Models pdf
RLHF (2019): Fine-Tuning from Human Preferences pdf
InstructGPT (2022): Following Instructions with Human Feedback pdf
Vision Transformer (2020): Image Recognition with Transformers pdf
ELECTRA (2020): Discriminative Pre-training pdf

5. Image Generative Modeling

Papers

GAN (2014): Generative Adversarial Networks pdf
VAE (2013): Auto-Encoding Variational Bayes pdf
VQ-VAE (2017): Neural Discrete Representation Learning pdf
Diffusion Models:
- Initial Diffusion (2015): Nonequilibrium Thermodynamics pdf
- Denoising Diffusion (2020): Probabilistic Models pdf
- Improved Denoising Diffusion (2021) pdf
CLIP (2021): Visual Models from Natural Language Supervision pdf
DALL-E (2021-2022): Text-to-Image Generation pdf
SimCLR (2020): Contrastive Learning of Visual Representations pdf

6. Deep Reinforcement Learning

Papers

AlphaZero (2017): Mastering Chess and Shogi by Self-Play pdf
Deep Q-Learning (2013): Playing Atari Games pdf
AlphaGo (2016): Mastering the Game of Go pdf
AlphaFold (2021): Protein Structure Prediction pdf

7. Additional Influential Papers

Deep Learning Survey (2015): By LeCun, Bengio, and Hinton pdf
BigGAN (2018): Large Scale GAN Training pdf
WaveNet (2016): Generative Model for Raw Audio pdf
BERTology (2020): Survey of What We Know About How BERT Works pdf

Scaling and Model Optimization

Scaling Laws for Neural Language Models (2020): Predicting Model Performance pdf
Chinchilla (2022): Training Compute-Optimal Large Language Models pdf
Gopher (2022): Scaling Language Models with Massive Compute pdf

Fine-tuning and Adaptation

P-Tuning (2021): Prompt Tuning with Soft Prompts pdf
Prefix-Tuning (2021): Optimizing Continuous Prompts pdf
AdaLoRA (2023): Adaptive Low-Rank Adaptation pdf
QLoRA (2023): Efficient Fine-Tuning of Quantized Models pdf

Inference and Optimization Techniques

FlashAttention (2022): Fast and Memory-Efficient Attention pdf
FlashAttention-2 (2023): Faster Attention Mechanism pdf
Direct Preference Optimization (DPO) (2023): Aligning Language Models with Human Preferences pdf
LoRA (2021): Low-Rank Adaptation of Large Language Models pdf

Pre-training and Model Architecture

Mixture of Experts (MoE) (2022): Scaling Language Models with Sparse Experts pdf
GLaM (2021): Efficient Scaling with Mixture of Experts pdf
Switch Transformers (2022): Scaling to Trillion Parameter Models pdf

Reasoning and Capabilities

Chain of Thought Prompting (2022): Reasoning with Language Models pdf
Self-Consistency (2022): Improving Language Model Reasoning pdf
Tree of Thoughts (2023): Deliberate Problem Solving pdf

Efficiency and Compression

DistilBERT (2019): Distilled Version of BERT pdf
Knowledge Distillation (2022): Comprehensive Survey pdf
Pruning and Quantization Techniques (2022): Model Compression Survey pdf
