You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Suppose there are more than one compute nodes, and each node's GPU memory in total is enough to contain a model. What is the recommend way to configure DeepSpeed if I want to use zero-3 within each node and use data parallel between these nodes? Thanks!