load/store outer optimizer state dict #277
@tushar00jain has exported this pull request. If you are a Meta employee, you can view the originating diff in D83512078.
Summary: Pull Request resolved: meta-pytorch#277
Differential Revision: D83512078
Force-pushed from 6881fe5 to 74be564.
Force-pushed from 74be564 to 274e866.
Force-pushed from 274e866 to c8eb891.
Summary: The outer optimizer state is currently not restored, which can lead to bumps in the loss because of a high learning rate from a new replica. This change saves the outer optimizer state in the DiLoCo-specific state dict.
Differential Revision: D83512078
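To illustrate the idea in the summary, here is a minimal pure-Python sketch, not the code from this pull request. `MomentumSGD` and `FragmentState` are hypothetical stand-ins invented for this example; in a PyTorch setting the corresponding calls would be `torch.optim.Optimizer.state_dict()` and `load_state_dict()`. The sketch assumes the outer optimizer carries a momentum buffer and a learning rate, and shows that a replica restoring only the parameters takes a different outer step than one that also restores the optimizer state.

```python
import copy


class MomentumSGD:
    """Toy outer optimizer (SGD with momentum) over a list of floats. Hypothetical."""

    def __init__(self, lr=0.5, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = None  # momentum buffer, created on the first step

    def step(self, params, grads):
        if self.velocity is None:
            self.velocity = [0.0] * len(params)
        self.velocity = [self.momentum * v + g for v, g in zip(self.velocity, grads)]
        return [p - self.lr * v for p, v in zip(params, self.velocity)]

    def state_dict(self):
        return {
            "lr": self.lr,
            "momentum": self.momentum,
            "velocity": copy.deepcopy(self.velocity),
        }

    def load_state_dict(self, state):
        self.lr = state["lr"]
        self.momentum = state["momentum"]
        self.velocity = copy.deepcopy(state["velocity"])


class FragmentState:
    """Hypothetical container for one fragment's parameters and outer optimizer."""

    def __init__(self, params):
        self.original_parameters = list(params)
        self.outer_optimizer = MomentumSGD()

    def outer_step(self, pseudo_grads):
        self.original_parameters = self.outer_optimizer.step(
            self.original_parameters, pseudo_grads
        )

    def state_dict(self):
        # The outer optimizer state travels with the parameters.
        return {
            "original_parameters": list(self.original_parameters),
            "outer_optimizer": self.outer_optimizer.state_dict(),
        }

    def load_state_dict(self, state):
        self.original_parameters = list(state["original_parameters"])
        self.outer_optimizer.load_state_dict(state["outer_optimizer"])


grads = [0.1, -0.2]

# A healthy replica takes one outer step, then publishes its state.
healthy = FragmentState([1.0, -2.0])
healthy.outer_step(grads)
checkpoint = healthy.state_dict()

# A new replica that restores parameters and outer optimizer state.
restored = FragmentState([0.0, 0.0])
restored.load_state_dict(checkpoint)

# A new replica that restores only the parameters (fresh optimizer state).
stale = FragmentState([0.0, 0.0])
stale.original_parameters = list(checkpoint["original_parameters"])

for replica in (healthy, restored, stale):
    replica.outer_step(grads)

print(restored.original_parameters == healthy.original_parameters)  # True
print(stale.original_parameters == healthy.original_parameters)     # False
```

The `stale` replica starts its next outer step with an empty momentum buffer, so it drifts away from the healthy replica even though both began from the same parameters.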
Reviewed By: d4l3k
Force-pushed from c8eb891 to 2e4d93a.
This pull request has been merged in 302fd39.