[dataloader] dataloading improvement tracking issue #37
Description
This is a tracking issue for dataloader improvements. The current support is very basic, and we will likely need some larger changes to make it more efficient:
- track dataloader step counts on a per-`replica_id` basis
- add a mechanism for reinstantiating the dataloader from a checkpoint and fast-forwarding it to the correct step count
- throw this all out and use a deterministic index managed by Lighthouse?
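The first two items could look roughly like the sketch below. It is a hypothetical wrapper, not an existing torchft API: `ResumableLoader`, `make_loader`, and the `state_dict`/`load_state_dict` names are illustrative (borrowing PyTorch's naming convention). It assumes the rebuilt loader yields batches in the same order as the original, e.g. because any shuffling uses a fixed seed.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator


class ResumableLoader:
    """Hypothetical wrapper: counts the batches one replica has consumed and
    can fast-forward a freshly built loader to that count after a restore."""

    def __init__(self, make_loader: Callable[[], Iterable[Any]], replica_id: str) -> None:
        self._make_loader = make_loader  # rebuilds the underlying loader from scratch
        self.replica_id = replica_id
        self.step = 0  # batches consumed by this replica so far
        self._it: Iterator[Any] = iter(make_loader())

    def __iter__(self) -> "ResumableLoader":
        return self

    def __next__(self) -> Any:
        batch = next(self._it)
        self.step += 1  # only counted once a batch was actually produced
        return batch

    def state_dict(self) -> Dict[str, Any]:
        # Saved alongside the model checkpoint, one entry per replica.
        return {"replica_id": self.replica_id, "step": self.step}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        if state["replica_id"] != self.replica_id:
            raise ValueError("state belongs to a different replica")
        # Reinstantiate and skip the batches already consumed. Every skipped
        # batch is still produced and discarded, so this costs O(step).
        self._it = iter(self._make_loader())
        self.step = sum(1 for _ in islice(self._it, state["step"]))


if __name__ == "__main__":
    loader = ResumableLoader(lambda: range(10), replica_id="replica_0")
    for _ in range(3):
        next(loader)
    ckpt = loader.state_dict()  # {'replica_id': 'replica_0', 'step': 3}

    restored = ResumableLoader(lambda: range(10), replica_id="replica_0")
    restored.load_state_dict(ckpt)
    print(next(restored))  # prints 3
```

The linear cost of skipping batches on restore is one reason a replay-free scheme, like the one in the last item, may be worth considering.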
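A deterministic index, as floated in the last item, could mean each replica derives its sample indices purely from a global step, with no dataloader state to checkpoint or replay. The sketch below is an assumption about what that might look like: `batch_indices` and its parameters are hypothetical, it does not talk to Lighthouse, and it simply presumes `global_step` comes from the coordinator. It also assumes a fixed `num_replicas`; handling membership changes mid-epoch is exactly the part a coordinator would need to resolve.

```python
import random
from typing import List


def batch_indices(
    dataset_len: int,
    batch_size: int,
    global_step: int,
    replica_rank: int,
    num_replicas: int,
    seed: int = 0,
) -> List[int]:
    """Hypothetical: sample indices for one replica at one global step."""
    batches_per_epoch = dataset_len // (batch_size * num_replicas)
    if batches_per_epoch < 1:
        raise ValueError("dataset too small for this batch size and replica count")
    epoch, step_in_epoch = divmod(global_step, batches_per_epoch)

    # Same (seed, epoch) -> same permutation on every replica and every restart.
    perm = list(range(dataset_len))
    random.Random((seed << 32) + epoch).shuffle(perm)

    # Replicas take interleaved, non-overlapping slices; the tail that does
    # not fill a full round of batches is dropped (drop_last-style).
    start = (step_in_epoch * num_replicas + replica_rank) * batch_size
    return perm[start:start + batch_size]
```

A restarted or newly joined replica then needs only the current global step to resume, instead of rebuilding a loader and skipping ahead.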