[Diloco] add diloco related utils #2552
Conversation
🤖 Hi @khatwanimohit, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
📋 Review Summary
This pull request introduces Distributed Low-Communication (DiLoCo) training, a technique for efficient distributed training of large models. The implementation looks solid and is accompanied by a comprehensive unit test. The changes to configuration and utility functions are appropriate for integrating this new feature.
🔍 General Feedback
- The core logic in src/MaxText/diloco.py is well-structured and follows the principles outlined in the referenced papers.
- The addition of a detailed unit test in tests/diloco_test.py is excellent and greatly helps in verifying the correctness of the implementation.
- One potential issue was identified in the sharding logic, for which a suggestion has been provided.
Overall, this is a great contribution that adds a valuable feature to MaxText.
📋 Review Summary
This Pull Request introduces Distributed Low-Communication (DiLoCo) training utilities and integrates them into the MaxText configuration. The implementation appears sound, and the accompanying unit tests provide good coverage for the core functionality.
🔍 General Feedback
- The addition of drjax and other dependency updates are appropriate for the new feature.
- The configuration changes in base.yml and pyconfig.py correctly expose and handle the new DiLoCo parameters.
- The new diloco.py module is well-structured and implements the DiLoCo algorithm effectively.
- The diloco_test.py provides a thorough simulation of the DiLoCo training process, with clear explanations of expected values.
Description
All the credit for this PR goes to @jonb377 and @ZacharyGarrett!
This PR introduces Distributed Low-Communication (DiLoCo) training, a technique to reduce communication overhead in distributed model training. It achieves this by synchronizing model parameters periodically rather than at every step, which improves efficiency for large models.
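For readers new to the technique, here is a minimal sketch of the inner/outer update pattern DiLoCo describes, written against optax. The function names, hyperparameters, and single-replica structure are illustrative assumptions, not the API of the diloco.py module this PR adds.

```python
import jax
import optax

# Illustrative hyperparameters in the spirit of the DiLoCo paper,
# not this PR's defaults.
inner_opt = optax.adamw(learning_rate=1e-3)
outer_opt = optax.sgd(learning_rate=0.7, momentum=0.9, nesterov=True)

def inner_phase(params, inner_state, batches, loss_fn):
  """Low-communication phase: local AdamW steps with no cross-replica sync."""
  for batch in batches:
    grads = jax.grad(loss_fn)(params, batch)
    updates, inner_state = inner_opt.update(grads, inner_state, params)
    params = optax.apply_updates(params, updates)
  return params, inner_state

def outer_step(global_params, local_params, outer_state):
  """Synchronization: the parameter delta acts as an 'outer gradient'."""
  # With multiple replicas, local_params would first be averaged across
  # replicas (e.g. via jax.lax.pmean); a single replica is assumed here.
  outer_grad = jax.tree_util.tree_map(
      lambda g, l: g - l, global_params, local_params)
  updates, outer_state = outer_opt.update(outer_grad, outer_state, global_params)
  return optax.apply_updates(global_params, updates), outer_state
```

Because replicas exchange a single parameter delta per synchronization round instead of gradients at every step, communication volume drops roughly by a factor of the synchronization period.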
Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.
Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.
Tests
A unit test with a simple model; more tests with the trainer will come in an upcoming PR.
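As a sketch of the kind of property such a test can verify (hypothetical, not the actual tests/diloco_test.py): with a single replica, no outer momentum, and an outer learning rate of 1.0, one DiLoCo round should land exactly on the locally trained parameters.

```python
import jax.numpy as jnp
import optax

def test_outer_step_with_lr_one_recovers_local_params():
  # Hypothetical test; names and values are illustrative.
  global_params = {"w": jnp.array(1.0)}
  local_params = {"w": jnp.array(0.4)}  # stand-in for the inner-phase result
  outer_opt = optax.sgd(learning_rate=1.0)  # no momentum: a pure replacement step
  outer_state = outer_opt.init(global_params)
  # The outer gradient is (old global params) - (new local params).
  outer_grad = {"w": global_params["w"] - local_params["w"]}
  updates, _ = outer_opt.update(outer_grad, outer_state, global_params)
  new_params = optax.apply_updates(global_params, updates)
  assert jnp.allclose(new_params["w"], local_params["w"])
```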
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] gemini-review label.