-
The accelerate library will do it for you: https://github.com/huggingface/accelerate
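Roughly, you wrap your training objects with `Accelerator.prepare` and it handles device placement and, under distributed launches, per-process data sharding. A minimal sketch of that pattern (the model, optimizer, and dataset below are placeholders, not code from any particular repo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Placeholder model and data, only to illustrate the pattern.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(128, 16), torch.randint(0, 2, (128,)))
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

accelerator = Accelerator()
# prepare() moves everything to the right device(s); when launched with
# multiple processes it also wraps the dataloader so each process sees its own shard.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```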
-
I can run the example here: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/llama3_lora_sft_ray.yaml. After changing the storage path and modifying the storage logic, it works when scaled to 2 workers.
I wanted to verify how data is distributed across workers.
Looking at both the Ray documentation and the Ray implementation, I don't see any reference to passing the dataset to the TorchTrainer, or any use of dataset shards or data loaders:
https://docs.ray.io/en/latest/train/user-guides/data-loading-preprocessing.html
Is this not using Ray Data?
How can I verify that data is being split properly between the workers?
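For reference, the pattern I expected based on the Ray docs looks roughly like this. This is only a sketch, not LLaMA-Factory's code; the toy dataset and the per-worker row count are just an illustrative way to check the split:

```python
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

# Toy dataset only to illustrate sharding; not the SFT data LLaMA-Factory uses.
train_ds = ray.data.from_items([{"x": i} for i in range(100)])

def train_loop_per_worker(config):
    # Each Ray Train worker gets its own shard of the dataset registered as "train".
    shard = ray.train.get_dataset_shard("train")
    rows = 0
    for batch in shard.iter_batches(batch_size=16):
        rows += len(batch["x"])
    # Counting rows per worker is one way to confirm the data was actually split.
    rank = ray.train.get_context().get_world_rank()
    print(f"worker rank {rank} saw {rows} rows")

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
    datasets={"train": train_ds},
)
trainer.fit()
```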