Why is checkpoint averaging being deprecated? #12926
riqiang-dp started this conversation in General
-
Hi, thanks for the question, and apologies for the confusion. This change was made because the legacy checkpoint conversion scripts relied on old NeMo checkpoint formats that have since been deprecated. The new Zarr checkpoint averaging script is compatible with the latest NeMo 2.0 checkpoint format. There were also plans to add a torch dist checkpoint averaging script (see this PR), but the PR was never merged due to shifting priorities. You are still welcome to test it and see whether it works for you.
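For checkpoints from regular (non-distributed) training, the core operation of checkpoint averaging is an element-wise mean of matching weights across several checkpoints. The sketch below illustrates only that idea. It is not NeMo's script, and the function name `average_state_dicts` is hypothetical. It is duck-typed, so it works on plain floats and should also work on PyTorch tensors, since they support `+` and `/`.

```python
def average_state_dicts(state_dicts):
    """Return the element-wise mean of a list of state dicts with identical keys.

    Hypothetical helper for illustration only; not part of NeMo.
    Values only need to support `+` and division by an int
    (plain floats and floating-point torch tensors both qualify).
    """
    if not state_dicts:
        raise ValueError("need at least one state dict to average")

    keys = state_dicts[0].keys()
    # All checkpoints must come from the same architecture (same parameter names).
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("state dicts have mismatched keys")

    n = len(state_dicts)
    averaged = {}
    for key in keys:
        total = state_dicts[0][key]
        for sd in state_dicts[1:]:
            total = total + sd[key]
        # Note: integer entries (e.g. BatchNorm's num_batches_tracked) become
        # floats here; a production script might copy them unchanged instead.
        averaged[key] = total / n
    return averaged
```

As an assumption about your setup: if your checkpoints are ordinary PyTorch Lightning `.ckpt` files, the weights are usually stored under the `"state_dict"` key of the object returned by `torch.load(path, map_location="cpu")`. You would pass those dicts to the function and load the result back into the model. NeMo 2.0 distributed checkpoints are laid out differently, which is why a format-specific script is needed for them.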
-
Hey guys,
I noticed that in the latest version only the Zarr distributed checkpoint averaging script is left. Why? There is no documentation for this change. What about regular training?