- Can we support distributed consumer-grade GPUs?
- https://github.com/EleutherAI/gpt-neox?tab=readme-ov-file#advanced-custom-launching
- https://app.primeintellect.ai/intelligence could allow for distributed training
- Custom data loading
  - https://github.com/EleutherAI/gpt-neox?tab=readme-ov-file#using-custom-data
- Should we be using GPTNeoXTokenizerFast?
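If our tokenization matches NeoX's, loading the fast tokenizer from an existing checkpoint is one way to test it. A minimal sketch, assuming `transformers` is installed and `EleutherAI/gpt-neox-20b` as an example repo id (guarded so it degrades gracefully offline):

```python
from transformers import GPTNeoXTokenizerFast

try:
    # "EleutherAI/gpt-neox-20b" is an assumed example checkpoint.
    tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
    ids = tokenizer("To be, or not to be")["input_ids"]
    print(tokenizer.decode(ids))
except Exception:
    # No network or cache available; the class itself is still importable.
    tokenizer = None
```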
- Possible next UpWork task
- Review of experiment setup
- Can TPU v2-8 demonstrate multi-GPU setups in Colab?
- Run experiment in parallel
  - Separate processes in Colab, with different directories for results, etc.
- Reduce startup time in Colab for experiments
- A Colab-compatible version of gpt-neox would be great
- Free TPU for research https://sites.research.google/trc/about/
- Get running on TPU and benchmark
- Shakespeare HF model should also save the tokenizer
  - Unclear how to do that because we're not using an HF tokenizer - maybe we can?
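If the custom tokenizer is (or can be rebuilt as) a `tokenizers`-library tokenizer, wrapping it in `PreTrainedTokenizerFast` would give us `save_pretrained` for free. A sketch, assuming `tokenizers` and `transformers` are installed; the tiny word-level vocab and the `shakespeare_tokenizer` output dir are stand-ins:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Stand-in for the project's custom tokenizer: a tiny word-level vocab.
vocab = {"[UNK]": 0, "to": 1, "be": 2, "or": 3, "not": 4}
raw = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
raw.pre_tokenizer = Whitespace()

# Wrapping it makes it a regular HF tokenizer, so it can be saved
# next to the model with save_pretrained.
hf_tok = PreTrainedTokenizerFast(tokenizer_object=raw, unk_token="[UNK]")
hf_tok.save_pretrained("shakespeare_tokenizer")  # assumed output dir
```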
- Get training running with CPU (for offline dev)
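A device-agnostic setup that falls back to CPU would let the same script run offline. A minimal sketch assuming PyTorch (the toy model and dummy step are placeholders):

```python
import torch

# Fall back to CPU when no GPU is available (e.g. an offline dev laptop).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One dummy training step to confirm the loop runs on the chosen device.
x = torch.randn(4, 8, device=device)
y = torch.randint(0, 2, (4,), device=device)
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
print(device.type)
```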
- Move from virtual env to pdm