I am curious about how "grad_accum_every" is used in https://github.com/lucidrains/musiclm-pytorch/blob/main/musiclm_pytorch/trainer.py#L317
In my previous experience, the model computes gradients (via backward) once per step. Why should the loss be split across "grad_accum_every" backward passes within a single step?
If I am GPU-constrained (a single T4 GPU), so that I can only use a batch size of 1 or 2 at each training stage, should I still set "grad_accum_every" to a large number like 16 or 32?
Thank you!
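
For reference, here is my rough understanding of what gradient accumulation does, as a minimal sketch (hypothetical toy model and data, not the actual trainer code):

```python
import torch
from torch import nn

# hypothetical toy model and optimizer, just for illustration
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

grad_accum_every = 16  # number of micro-batches to accumulate over
micro_batch_size = 2   # what actually fits on a single T4
num_steps = 100        # arbitrary number of optimizer steps

for step in range(num_steps):
    optimizer.zero_grad()

    # run several small forward/backward passes; gradients sum up in .grad
    for _ in range(grad_accum_every):
        x = torch.randn(micro_batch_size, 10)  # placeholder data
        y = torch.randn(micro_batch_size, 1)
        loss = nn.functional.mse_loss(model(x), y)

        # divide by grad_accum_every so the accumulated gradient matches
        # one big batch of size micro_batch_size * grad_accum_every
        (loss / grad_accum_every).backward()

    # a single optimizer update per step, as usual
    optimizer.step()
```

So the optimizer still updates once per step; the loss is just computed over several micro-batches whose gradients are summed, which (if I understand correctly) approximates a larger effective batch size when GPU memory is limited.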