Deep Learning Research Paper Collection

Overview

This repository is a collection of important deep learning research papers, organized by research area and implementation. The goal is to provide a structured approach to understanding the evolution and core concepts of deep learning.

Disclaimer

This is a personal learning project. The implementations and notes may contain errors or simplifications. Use with caution and always refer to the original papers.

Inspiration and Credits

Inspired by adam-maj and expanded with additional research papers and implementations.

Project Goals

  • Implement approximately 60 important deep learning papers
  • Provide scratch implementations for learning and understanding
  • Create a comprehensive resource for deep learning research

Contents

1. Foundational Deep Neural Networks

Papers

  • DNN (1986): Learning Internal Representations by Error Propagation pdf
  • CNN (1989): Backpropagation Applied to Handwritten Zip Code Recognition pdf
  • LeNet (1998): Gradient-Based Learning Applied to Document Recognition pdf
  • AlexNet (2012): ImageNet Classification with Deep Convolutional Neural Networks pdf
  • U-Net (2015): Convolutional Networks for Biomedical Image Segmentation pdf

2. Optimization and Regularization Techniques

Papers

  • Weight Decay (1991): A Simple Weight Decay Can Improve Generalization pdf
  • ReLU (2011): Deep Sparse Rectifier Neural Networks pdf
  • Residuals (2015): Deep Residual Learning for Image Recognition pdf
  • Dropout (2014): Preventing Neural Networks from Overfitting pdf
  • BatchNorm (2015): Accelerating Deep Network Training pdf
  • LayerNorm (2016): Layer Normalization pdf
  • GELU (2016): Gaussian Error Linear Units pdf
  • Adam (2014): A Method for Stochastic Optimization pdf

3. Sequence Modeling

Papers

  • RNN (1989): Continually Running Fully Recurrent Neural Networks pdf
  • LSTM (1997): Long Short-Term Memory pdf
  • Learning to Forget (2000): Continual Prediction with LSTM pdf
  • Word2Vec (2013): Word Representations in Vector Space pdf
  • Phrase2Vec (2013): Distributed Representations of Words and Phrases pdf
  • Encoder-Decoder (2014): RNN Encoder-Decoder for Machine Translation pdf
  • Seq2Seq (2014): Sequence to Sequence Learning pdf
  • Attention (2014): Neural Machine Translation with Alignment pdf
  • Mixture of Experts (2017): Sparsely-Gated Neural Networks pdf

4. Language Modeling

Papers

  • Transformer (2017): Attention Is All You Need pdf
  • BERT (2018): Bidirectional Transformers for Language Understanding pdf
  • RoBERTa (2019): Robustly Optimized BERT Pretraining pdf
  • T5 (2019): Unified Text-to-Text Transformer pdf
  • GPT Series:
    • GPT (2018): Generative Pre-Training pdf
    • GPT-2 (2019): Unsupervised Multitask Learning pdf
    • GPT-3 (2020): Few-Shot Learning pdf
    • GPT-4 (2023): GPT-4 Technical Report pdf
  • LoRA (2021): Low-Rank Adaptation of Large Language Models pdf
  • RLHF (2019): Fine-Tuning from Human Preferences pdf
  • InstructGPT (2022): Following Instructions with Human Feedback pdf
  • Vision Transformer (2020): Image Recognition with Transformers pdf
  • ELECTRA (2020): Discriminative Pre-training pdf

5. Image Generative Modeling

Papers

  • GAN (2014): Generative Adversarial Networks pdf
  • VAE (2013): Auto-Encoding Variational Bayes pdf
  • VQ-VAE (2017): Neural Discrete Representation Learning pdf
  • Diffusion Models:
    • Initial Diffusion (2015): Nonequilibrium Thermodynamics pdf
    • Denoising Diffusion (2020): Probabilistic Models pdf
    • Improved Denoising Diffusion (2021) pdf
  • CLIP (2021): Visual Models from Natural Language Supervision pdf
  • DALL-E (2021-2022): Text-to-Image Generation pdf
  • SimCLR (2020): Contrastive Learning of Visual Representations pdf

6. Deep Reinforcement Learning

Papers

  • AlphaZero (2017): Mastering Chess and Shogi pdf
  • Deep Q-Learning (2013): Playing Atari Games pdf
  • AlphaGo (2016): Mastering the Game of Go pdf
  • AlphaFold (2021): Protein Structure Prediction pdf

7. Additional Influential Papers

  • Deep Learning Survey (2015): By LeCun, Bengio, and Hinton pdf
  • BigGAN (2018): Large Scale GAN Training pdf
  • WaveNet (2016): Generative Model for Raw Audio pdf
  • BERTology (2020): A Primer on What We Know About How BERT Works pdf

Scaling and Model Optimization

  • Scaling Laws for Neural Language Models (2020): Predicting Model Performance pdf
  • Chinchilla (2022): Training Compute-Optimal Large Language Models pdf
  • Gopher (2022): Scaling Language Models with Massive Compute pdf

Fine-tuning and Adaptation

  • P-Tuning (2021): Prompt Tuning with Soft Prompts pdf
  • Prefix-Tuning (2021): Optimizing Continuous Prompts pdf
  • AdaLoRA (2023): Adaptive Low-Rank Adaptation pdf
  • QLoRA (2023): Efficient Fine-Tuning of Quantized Models pdf

Inference and Optimization Techniques

  • FlashAttention (2022): Fast and Memory-Efficient Attention pdf
  • FlashAttention-2 (2023): Faster Attention Mechanism pdf
  • Direct Preference Optimization (DPO) (2023): Aligning Language Models with Human Preferences pdf

Pre-training and Model Architecture

  • Mixture of Experts (MoE) (2022): Scaling Language Models with Sparse Experts pdf
  • GLaM (2021): Efficient Scaling with Mixture of Experts pdf
  • Switch Transformers (2022): Scaling to Trillion Parameter Models pdf

Reasoning and Capabilities

  • Chain of Thought Prompting (2022): Reasoning with Language Models pdf
  • Self-Consistency (2022): Improving Language Model Reasoning pdf
  • Tree of Thoughts (2023): Deliberate Problem Solving pdf

Efficiency and Compression

  • DistilBERT (2019): Distilled Version of BERT pdf
  • Knowledge Distillation (2022): Comprehensive Survey pdf
  • Pruning and Quantization Techniques (2022): Model Compression Survey pdf