Hi, many thanks for this project and code.
I wonder how to do distributed training with a large-scale dataset. Which framework do you use (something like Megatron for PyTorch)? So far we haven't found a good framework for this. I would be grateful if you could recommend one.
Thanks