Backward CPU offloading: Asynchronous transfer #62

@mseeger

Is your feature request related to a problem? Please describe.
CPU offloading in the backward pass works by iterating the following steps over the layers:

  • Load weights from CPU
  • Run backward from head gradients
  • Store gradients to CPU

These steps are currently run sequentially.
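
For reference, the sequential loop described above can be sketched as follows. This is only an illustration, not the project's actual code: `backward_sequential` and the three callables are hypothetical names, and the stand-in callables just record the order of operations instead of touching a GPU.

```python
from typing import Any, Callable, List, Tuple


def backward_sequential(
    num_layers: int,
    head_gradient: Any,
    load_weights: Callable[[int], Any],
    run_backward: Callable[[int, Any, Any], Tuple[Any, Any]],
    store_gradients: Callable[[int, Any], None],
) -> Any:
    """Run the three steps one after another, from the last layer to the first."""
    grad = head_gradient
    for layer in reversed(range(num_layers)):
        weights = load_weights(layer)  # CPU -> GPU copy
        # Backward for this layer: returns the head gradient for the next
        # (earlier) layer and the gradients w.r.t. this layer's weights.
        grad, weight_grads = run_backward(layer, weights, grad)
        store_gradients(layer, weight_grads)  # GPU -> CPU copy
    return grad


# Hypothetical stand-ins that only log what would happen.
log: List[str] = []


def load_weights(layer: int) -> str:
    log.append(f"load {layer}")
    return f"weights {layer}"


def run_backward(layer: int, weights: Any, grad: Any) -> Tuple[Any, str]:
    log.append(f"backward {layer}")
    return grad, f"grads {layer}"


def store_gradients(layer: int, weight_grads: Any) -> None:
    log.append(f"store {layer}")


backward_sequential(2, "head", load_weights, run_backward, store_gradients)
print(log)
```

In this form each copy blocks the computation, so the GPU is idle during both transfers of every layer.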

Describe the solution you'd like
Can we run these steps in parallel?

  • Switch between two shards on GPU
  • Run backward on one shard while loading the weights for the next one and storing the gradients of the previous one
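
The proposed overlap could look roughly like the schedule below. This is a sketch of the intended ordering only, under assumptions: `Step`, `pipelined_schedule`, and the shard numbering are hypothetical, each transfer is assumed to fit within one compute step, and nothing here performs real asynchronous copies.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    load: Optional[int]      # layer whose weights are copied CPU -> GPU
    backward: Optional[int]  # layer whose backward pass runs on the GPU
    store: Optional[int]     # layer whose gradients are copied GPU -> CPU
    transfer_shard: int      # GPU shard used by the load and the store
    compute_shard: int       # GPU shard used by the backward pass


def pipelined_schedule(num_layers: int) -> List[Step]:
    """List which layer is loaded, computed, and stored in each time step."""
    order = list(reversed(range(num_layers)))  # backward visits the last layer first
    steps = []
    # Two extra steps: one to pre-load the first layer, one to drain the last store.
    for t in range(num_layers + 2):
        load = order[t] if t < num_layers else None
        backward = order[t - 1] if 1 <= t <= num_layers else None
        store = order[t - 2] if t >= 2 else None
        # Shards alternate every step: the layer loaded into shard t % 2 is
        # computed there in step t + 1 and its gradients leave it in step t + 2.
        steps.append(Step(load, backward, store, t % 2, (t - 1) % 2))
    return steps


for step in pipelined_schedule(3):
    print(step)
```

In every step the transfers and the backward pass touch different shards, which is what would allow them to run concurrently. Whether the copies really overlap with compute depends on how asynchronous CPU <-> GPU transfer behaves, which this sketch does not address.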

This needs a clear understanding of how asynchronous CPU <-> GPU transfer works. We already know how transfer between GPUs works.

Labels: enhancement (New feature or request)