language-model-from-scratch

Learn how to develop language models by building tiny ones from scratch.

Contents

  • Playing with language models by prompting
  • Data preparation
    • TinyStories-style synthetic data
    • Wikipedia data
  • Tokenizer
  • N-gram language model
  • Attention
  • Pretraining
  • Instruction finetuning
  • RLHF (RLAIF)

Setup

# install PyTorch by following the instructions at https://pytorch.org/
# install the other required libraries
pip3 install -r requirements.txt