parallelize writing of layer checkpoint files across data parallel instances (#1419)
* parallelize layer checkpoints across data parallel groups
* use partition_uniform to determine start/end index values
* formatting fix
* config: add option for parallel write of layer checkpoints in pipeline stage
* yapf fixes
* enable parallel layer write according to config param
* avoid extraneous makedir when rank 0 writes all layers
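
The layer assignment described in the bullets above could be sketched roughly as follows. This is an illustration only, not the code from this commit: `uniform_boundaries` is a hypothetical stand-in for the `partition_uniform` helper named above (whose exact remainder handling may differ), and `layers_for_rank`, `dp_rank`, `dp_size`, and `parallel_write` are assumed names chosen for the example.

```python
from typing import List


def uniform_boundaries(num_items: int, num_parts: int) -> List[int]:
    """Hypothetical stand-in for a uniform partitioner.

    Returns num_parts + 1 boundaries so that part p covers
    [bounds[p], bounds[p + 1]). Any remainder goes to the first parts.
    """
    base, extra = divmod(num_items, num_parts)
    bounds = [0]
    for part in range(num_parts):
        bounds.append(bounds[-1] + base + (1 if part < extra else 0))
    return bounds


def layers_for_rank(num_layers: int, dp_rank: int, dp_size: int,
                    parallel_write: bool) -> range:
    """Pick which layer indices a data parallel rank would write.

    With parallel_write disabled (assumed flag name), rank 0 writes every
    layer and the other ranks write none. With it enabled, each rank
    writes its own contiguous slice of the layers.
    """
    if not parallel_write:
        return range(num_layers) if dp_rank == 0 else range(0)
    bounds = uniform_boundaries(num_layers, dp_size)
    return range(bounds[dp_rank], bounds[dp_rank + 1])
```

In this sketch every layer is written by exactly one rank, so the slices never overlap and together cover all layers of the stage.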
Co-authored-by: Olatunji Ruwase <[email protected]>