On my fork I'm attempting to add support for HuggingFace's OpenWebText dataset and their GPT-2 tokenizer so I can compare against HF's GPT2-small. If you were willing, I'd love advice on setting up model params for self-supervised autoregressive NLP. Thanks!
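In case it helps frame the question: for the comparison to be apples-to-apples, I'd be aiming to match GPT2-small's published shapes (12 layers, 12 heads, 768-dim embeddings, 50257-token vocab, 1024-token context). A quick sketch of a sanity check I'd use — the `gpt2_param_count` helper is my own, not from this repo or from HF, and it assumes the standard tied-embedding GPT-2 layout:

```python
# GPT2-small hyperparameters as published by OpenAI / mirrored in HF's GPT2Config.
cfg = dict(n_layer=12, n_head=12, n_embd=768, n_ctx=1024, vocab_size=50257)

def gpt2_param_count(n_layer, n_head, n_embd, n_ctx, vocab_size):
    """Rough parameter count for a GPT-2-style decoder with tied lm_head."""
    d = n_embd
    emb = vocab_size * d + n_ctx * d  # token + learned position embeddings
    per_block = (
        2 * d                  # ln_1 (gain + bias)
        + d * 3 * d + 3 * d    # fused Q/K/V projection + bias
        + d * d + d            # attention output projection + bias
        + 2 * d                # ln_2
        + d * 4 * d + 4 * d    # MLP up-projection + bias
        + 4 * d * d + d        # MLP down-projection + bias
    )
    final_ln = 2 * d
    return emb + n_layer * per_block + final_ln

print(gpt2_param_count(**cfg))  # -> 124439808, i.e. the ~124M usually quoted
```

If my implementation's count lands near ~124M with these settings, I'd trust that the architecture matches closely enough for a fair perplexity comparison on OpenWebText.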