Deepspeed activation Partitioning #18732

Open
@LogicBaron

📚 Documentation

Hello,

The documentation for partition_activations currently reads:

partition_activations (bool) – Enables partition activation when used with ZeRO stage 3 and model parallelism. Still requires you to wrap your forward functions in deepspeed.checkpointing.checkpoint. See deepspeed tutorial.
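For context, this flag appears to correspond to the activation_checkpointing section of a DeepSpeed config. Below is a minimal sketch of such a config, built as a plain Python dict. The key names are the ones I understand DeepSpeed's config schema to use; the values are illustrative only, and how the strategy forwards its arguments into this section is an assumption on my part.

```python
import json

# Sketch of a DeepSpeed-style config with activation partitioning enabled.
# Key names follow DeepSpeed's "activation_checkpointing" config section;
# the values are illustrative, not recommendations.
ds_config = {
    "zero_optimization": {"stage": 3},
    "activation_checkpointing": {
        "partition_activations": True,
        "cpu_checkpointing": False,
        "contiguous_memory_optimization": False,
        "synchronize_checkpoint_boundary": False,
    },
}

# Dump as JSON, the format DeepSpeed config files normally use.
print(json.dumps(ds_config, indent=2))
```

Note that nothing in this section mentions the ZeRO stage; the two settings are configured independently.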

After running into issues with activation partitioning, I looked into it and found that DeepSpeed activation partitioning has little to do with ZeRO stage 3; what appears to be crucial is that model parallelism and the mpu object are set up.

Also, it is explicitly stated that pipeline parallelism, one of the model parallelism methods DeepSpeed provides, cannot be combined with ZeRO-2 or ZeRO-3 in the first place.
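To make the two conditions above concrete, here is a small sketch of the kind of check I have in mind. It is pure Python and does not call DeepSpeed; ParallelSetup and check_setup are hypothetical names invented for illustration, and the rules encode only what this issue claims, not confirmed library behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParallelSetup:
    """Hypothetical summary of a training setup (not a DeepSpeed type)."""
    zero_stage: int
    has_mpu: bool                 # whether an mpu object was provided
    model_parallel_size: int = 1  # size of the model-parallel group
    pipeline_parallel: bool = False
    partition_activations: bool = False


def check_setup(setup: ParallelSetup) -> List[str]:
    """Return a list of problems, based on the conditions described in this issue."""
    problems = []
    # Claim 1: partitioning depends on model parallelism and an mpu, not on the ZeRO stage.
    if setup.partition_activations and (not setup.has_mpu or setup.model_parallel_size <= 1):
        problems.append("partition_activations needs an mpu with model-parallel size > 1")
    # Claim 2: pipeline parallelism cannot be combined with ZeRO-2 or ZeRO-3.
    if setup.pipeline_parallel and setup.zero_stage >= 2:
        problems.append("pipeline parallelism cannot be combined with ZeRO stage 2 or 3")
    return problems


# ZeRO-3 alone, without model parallelism, gets flagged.
zero3_only = ParallelSetup(zero_stage=3, has_mpu=False, partition_activations=True)
print(check_setup(zero3_only))

# Model parallelism with an mpu passes, regardless of the ZeRO stage.
mp_with_mpu = ParallelSetup(zero_stage=1, has_mpu=True, model_parallel_size=2,
                            partition_activations=True)
print(check_setup(mp_with_mpu))
```

If the documentation spelled out conditions like these, the zero3 + mp wording would be much less confusing.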

Additionally, the GitHub issue referenced in the official documentation does use ZeRO stage 3 and activation partitioning together, but that pairing has no particular significance there.

I therefore think the documentation should state the conditions for using activation partitioning more clearly, rather than only saying that it should be used with ZeRO-3 + MP.

cc @Borda @awaelchli
