Update deepspeed activation checkpointing docs #17621

Open
@avivbrokman

Description

📚 Documentation

In your documentation, you refer to the function `deepspeed.checkpointing.checkpoint`, but it looks like it no longer exists. Could you update that section?
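
For reference, this is roughly the pattern I understand the current docs to describe: checkpointing a submodule rather than the whole model. This is only a minimal sketch with made-up layers and shapes, and whether `deepspeed.checkpointing.checkpoint` is still the supported entry point is exactly what I'm asking about:

```python
import torch
import deepspeed
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.block = torch.nn.Sequential(
            torch.nn.Linear(32, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 32),
        )
        self.head = torch.nn.Linear(32, 2)

    def forward(self, x):
        # Checkpoint a submodule, not the whole model: activations of
        # `self.block` are discarded in the forward pass and recomputed
        # during backward to save memory.
        x = deepspeed.checkpointing.checkpoint(self.block, x)
        return self.head(x)
```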

While we're at it, could you provide a more common use case as an example? The guide warns against wrapping an entire model, but fine-tuning a pretrained language model from transformers is probably the most common use case. What if someone is just using GPT-2 or T5 with no additional layers on top? What should get wrapped then? One possibility I can think of is sketched below.
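
For the GPT-2 case, one workaround (an assumption on my part, not something the guide states) is to skip manual wrapping entirely and use the checkpointing built into transformers, which recomputes activations per transformer block:

```python
import torch
import pytorch_lightning as pl
from transformers import GPT2LMHeadModel


class LitGPT2(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = GPT2LMHeadModel.from_pretrained("gpt2")
        # Checkpoints each transformer block internally, which sidesteps
        # the question of which submodule to wrap manually.
        self.model.gradient_checkpointing_enable()

    def training_step(self, batch, batch_idx):
        out = self.model(input_ids=batch["input_ids"], labels=batch["input_ids"])
        self.log("train_loss", out.loss)
        return out.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=5e-5)
```

If something like this is the recommended route when training with the DeepSpeed strategy, it would be great if the docs said so explicitly.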

cc @Borda @awaelchli
