You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Using explicit GPU upcast for ZeRO-Offload (#6962)
Following discussion in
[PR-6670](#6670), the explict
upcast is much more efficient than implicit upcast, this PR is to
replace implicit upcast with explict one.
The results on 3B model are shown below:
| Option | BWD (ms) | Speed up |
|------------|-----|------|
| Before PR-6670 | 25603.30 | 1x |
| After PR-6670 | 1174.31 | 21.8X |
| After this PR| 309.2 | 82.8X |
0 commit comments