Create a SingleRunner

**Current:**
We have a distributed runner, which works well for training runs on multi-GPU machines.

**Idea:**
For small-scale research experiments on ~1B-parameter models, however, we want a single runner that uses only one GPU.
This will let new researchers entering the field experiment with algorithms on a general-purpose laptop or desktop.
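
As a starting point for discussion, here is a minimal sketch of what such a runner could look like. It is not the project's API: `SingleRunner`, `pick_device`, `step_fn`, and `max_steps` are hypothetical names chosen for illustration, and the sketch assumes PyTorch is the backend (it falls back to CPU if PyTorch is not installed). The point it shows is that everything runs in one process on one device, with no process group or ranks.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


def pick_device() -> str:
    """Return "cuda:0" if PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


@dataclass
class SingleRunner:
    """Hypothetical single-process runner: no process group, no ranks."""

    # step_fn(batch, device) runs one training step and returns the loss.
    step_fn: Callable[[Any, str], float]
    device: str = field(default_factory=pick_device)
    losses: List[float] = field(default_factory=list)

    def run(self, batches: Iterable[Any], max_steps: int) -> List[float]:
        """Run up to max_steps steps sequentially on the single device."""
        for step, batch in enumerate(batches):
            if step >= max_steps:
                break
            self.losses.append(float(self.step_fn(batch, self.device)))
        return self.losses
```

A caller would pass its own training step as `step_fn`, for example `SingleRunner(step_fn=my_step).run(dataloader, max_steps=100)`, where `my_step` and `dataloader` are placeholders. Keeping the step function injectable would let the same algorithm code be reused by the distributed runner, though that is a design assumption rather than a requirement of this issue.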

**Reward:**
250K relign