Description
📚 Documentation
Hello,
partition_activations (bool) – Enables partition activation when used with ZeRO stage 3 and model parallelism. Still requires you to wrap your forward functions in deepspeed.checkpointing.checkpoint. See deepspeed tutorial.
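For comparison, here is a sketch of a raw DeepSpeed config written as a Python dict. My understanding is that this is roughly what a stage-3 strategy with partition_activations=True amounts to; the exact set of keys the strategy emits is an assumption here. The point is that `partition_activations` lives under `activation_checkpointing`, a section that says nothing about the ZeRO stage.

```python
import json

# Illustrative DeepSpeed config (values are examples, not recommendations).
# ZeRO settings and activation-checkpointing settings are separate sections.
ds_config = {
    "zero_optimization": {"stage": 3},
    "activation_checkpointing": {
        "partition_activations": True,
        "cpu_checkpointing": False,
        "contiguous_memory_optimization": False,
        "synchronize_checkpoint_boundary": False,
    },
}

print(json.dumps(ds_config, indent=2))
```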
After running into problems with activation partitioning and looking into it, I found that DeepSpeed's activation partitioning has little to do with ZeRO stage 3. What appears to matter is that model parallelism is set up and that an mpu object is passed to DeepSpeed.
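A simplified model of why the mpu matters: as far as I can tell, DeepSpeed's activation checkpointing asks the mpu for the model-parallel rank and world size (via `get_model_parallel_rank()` and `get_model_parallel_world_size()`) and splits each checkpointed activation across those ranks; with no mpu it falls back to rank 0 and world size 1, so nothing is actually split. `ToyMPU` and `activation_slice` below are hypothetical names for illustration, not DeepSpeed code, and the real implementation handles more details (process groups, CPU offload, non-divisible sizes).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyMPU:
    """Hypothetical stand-in for a model-parallel unit (mpu).

    Only the two queries used below are modeled; a real mpu also
    provides get_model_parallel_group().
    """

    rank: int
    world_size: int

    def get_model_parallel_rank(self) -> int:
        return self.rank

    def get_model_parallel_world_size(self) -> int:
        return self.world_size


def activation_slice(numel: int, mpu: Optional[ToyMPU], zero_stage: int) -> Tuple[int, int]:
    """Return (start, length) of the flattened activation kept on this rank.

    zero_stage is accepted only to make the point that it does not
    enter the computation at all.
    """
    del zero_stage  # deliberately unused
    if mpu is None:
        # No mpu: behave as a single model-parallel rank, i.e. no partitioning.
        mp_rank, mp_size = 0, 1
    else:
        mp_rank = mpu.get_model_parallel_rank()
        mp_size = mpu.get_model_parallel_world_size()
    length = numel // mp_size
    return mp_rank * length, length


numel = 4096 * 1024
print(activation_slice(numel, None, zero_stage=3))  # (0, 4194304)
print(activation_slice(numel, ToyMPU(rank=1, world_size=4), zero_stage=0))  # (1048576, 1048576)
```

Under this model, ZeRO stage 3 without an mpu keeps the full activation on every rank, while an mpu with four model-parallel ranks splits it four ways even at stage 0.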
DeepSpeed also states explicitly that pipeline parallelism, the model-parallel method DeepSpeed itself provides, cannot be combined with ZeRO stages 2 and 3 in the first place.
In addition, the GitHub issue referenced in the official documentation does use ZeRO stage 3 together with activation partitioning, but nothing in that setup depends on the two being combined.
I therefore think the documentation should state the actual requirements for activation partitioning more clearly, rather than simply saying it should be used with ZeRO stage 3 plus model parallelism.
cc @Borda @awaelchli