Backward CPU offloading: Asynchronous transfer #62

**Is your feature request related to a problem? Please describe.**
CPU offloading in `backward` works by iterating these steps over the layers:
- Load the layer's weights from CPU
- Run backward from the head gradients
- Store the gradients to CPU

These steps currently run sequentially, so data transfer and computation never overlap.
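A minimal, framework-agnostic sketch of the current sequential loop. The function name `sequential_offloaded_backward` and the callables `load`, `backward` and `store` are hypothetical placeholders for whatever the offloading code actually does. Because each layer's load, backward and store run back to back, the total time is roughly `num_layers * (t_load + t_backward + t_store)`.

```python
from typing import Any, Callable, Tuple


def sequential_offloaded_backward(
    num_layers: int,
    load: Callable[[int], Any],
    backward: Callable[[int, Any, Any], Tuple[Any, Any]],
    store: Callable[[int, Any], None],
    head_grad: Any,
) -> Any:
    """Hypothetical sketch of the current, fully sequential offloaded backward.

    load(layer)                    -> weights on GPU (hypothetical CPU -> GPU copy)
    backward(layer, weights, grad) -> (grad w.r.t. layer input, parameter gradients)
    store(layer, param_grads)      -> hypothetical GPU -> CPU copy of the gradients
    """
    grad = head_grad  # gradient w.r.t. the output of the last layer
    for layer in reversed(range(num_layers)):
        weights = load(layer)  # compute is idle while the weights are copied in
        grad, param_grads = backward(layer, weights, grad)
        store(layer, param_grads)  # compute is idle while the gradients are copied out
    return grad  # gradient w.r.t. the input of the first layer
```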

**Describe the solution you'd like**
Can we run these steps in parallel, overlapping them across layers?

- Alternate between two shards on the GPU
- Run backward on one shard while loading weights into the other and storing the gradients of the previous layer
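The proposal above amounts to a three-stage software pipeline with double buffering. A minimal sketch of the schedule it implies, in plain Python so it stays framework-agnostic: at step `t`, weights for the next layer (in backward order) are loaded into one shard, backward runs on the layer whose weights arrived at step `t - 1`, and the gradients produced at step `t - 2` are written back. `Step`, `pipelined_schedule` and `estimated_time` are hypothetical names. Two assumptions are baked in: each shard holds separate weight and gradient buffers (the shard being refilled with weights at step `t` is the same one whose gradients are being stored), and loads, stores and compute can fully overlap. On CUDA hardware that overlap typically requires separate streams, page-locked (pinned) host memory, and a GPU with one copy engine per direction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (layer index, shard index); None means the stage is idle in this step.
Slot = Optional[Tuple[int, int]]


@dataclass
class Step:
    load: Slot = None      # copy weights CPU -> GPU into this shard
    backward: Slot = None  # run backward on the layer held in this shard
    store: Slot = None     # copy parameter gradients GPU -> CPU from this shard


def pipelined_schedule(num_layers: int, num_shards: int = 2) -> List[Step]:
    """Hypothetical schedule: load -> backward -> store, overlapped across layers."""
    order = list(range(num_layers - 1, -1, -1))  # backward visits the last layer first
    steps: List[Step] = []
    for t in range(num_layers + 2):  # two extra steps drain the pipeline
        step = Step()
        if t < num_layers:
            step.load = (order[t], t % num_shards)
        if 0 <= t - 1 < num_layers:
            step.backward = (order[t - 1], (t - 1) % num_shards)
        if 0 <= t - 2 < num_layers:
            step.store = (order[t - 2], (t - 2) % num_shards)
        steps.append(step)
    return steps


def estimated_time(
    steps: List[Step], t_load: float, t_backward: float, t_store: float
) -> float:
    """Each step lasts as long as its slowest active stage (assumes perfect overlap)."""
    total = 0.0
    for step in steps:
        stages = ((step.load, t_load), (step.backward, t_backward), (step.store, t_store))
        active = [cost for slot, cost in stages if slot is not None]
        total += max(active, default=0.0)
    return total
```

As an illustration with arbitrary units, take 4 layers with `t_load = 2`, `t_backward = 3` and `t_store = 2`. The sequential loop then takes 4 × 7 = 28 units, while `estimated_time(pipelined_schedule(4), 2, 3, 2)` gives 16. In steady state each layer costs `max(t_load, t_backward, t_store)` instead of their sum.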

This requires a clear understanding of how asynchronous CPU <-> GPU transfer works. We already know how transfer between GPUs works.
