Open
Hey, I have a question about the gradient accumulation parameter. When I increase it from 4 to 8, I get an OOM (out-of-memory) error, which doesn't make much sense to me. Why am I getting this error? Are you storing the gradients of each accumulation step individually, or summing them into a single buffer as new ones arrive? Or am I missing something else?
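To make the two possibilities concrete, here is a minimal pure-Python sketch of what I mean. The function names are made up for illustration and this is not the library's actual code; gradients are modeled as plain lists of floats, and the "peak buffers" count is only a rough stand-in for memory use.

```python
# Hypothetical illustration of the two strategies asked about above.
# Neither function comes from the library; both are assumptions.

def accumulate_by_summing(micro_batch_grads):
    """Add each new gradient into one running buffer.

    Only a single buffer is kept, so the number of held buffers does
    not grow with the number of accumulation steps.
    """
    running = None
    peak_buffers = 0
    for grad in micro_batch_grads:
        if running is None:
            running = list(grad)  # first step: copy into the buffer
        else:
            for i, g in enumerate(grad):
                running[i] += g  # later steps: add in place
        peak_buffers = max(peak_buffers, 1)
    return running, peak_buffers


def accumulate_by_storing(micro_batch_grads):
    """Keep every gradient and sum only at the end.

    The number of held buffers grows linearly with the number of
    accumulation steps.
    """
    stored = []
    peak_buffers = 0
    for grad in micro_batch_grads:
        stored.append(list(grad))
        peak_buffers = max(peak_buffers, len(stored))
    total = [sum(values) for values in zip(*stored)]
    return total, peak_buffers


if __name__ == "__main__":
    grads = [[1.0, 2.0] for _ in range(8)]  # 8 accumulation steps
    print(accumulate_by_summing(grads))
    print(accumulate_by_storing(grads))
```

If accumulation works like the first function, I would expect going from 4 to 8 steps to leave gradient memory unchanged, which is why the OOM surprises me.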