How to run such a memory-intensive model #174

May I ask how you managed to run this memory-intensive model? Did you use 3D parallelism?
When I tried DeepSpeed's pipeline parallelism, I found that the model's own **Batch** class does not fit the input and output format that a DeepSpeed pipeline model expects. According to DeepSpeed's official documentation, the inputs and outputs of each layer in a pipeline model must be either a single torch.Tensor or a tuple of tensors.
What approach would you recommend in this case?
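
To make the mismatch concrete, here is a minimal sketch of one possible workaround: wrapping each layer so that it accepts and returns a plain tuple of tensors and rebuilds the batch object internally. This is only an illustration under assumptions. The `Batch` dataclass below is a hypothetical stand-in with made-up fields (`coords`, `features`), not the model's real class, and `ToyLayer` and `TupleIO` are names invented for the example. It uses plain PyTorch and has not been tested with DeepSpeed itself.

```python
from dataclasses import dataclass, fields

import torch
from torch import nn


@dataclass
class Batch:
    # Hypothetical stand-in: the real Batch class has different fields.
    coords: torch.Tensor
    features: torch.Tensor


def batch_to_tuple(batch):
    # Flatten the dataclass into a tuple of tensors, in field order.
    return tuple(getattr(batch, f.name) for f in fields(batch))


def tuple_to_batch(tensors):
    # Rebuild the dataclass from a tuple in the same field order.
    return Batch(*tensors)


class ToyLayer(nn.Module):
    # Hypothetical layer that maps Batch -> Batch, like the model's layers.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, batch):
        return Batch(coords=batch.coords, features=self.proj(batch.features))


class TupleIO(nn.Module):
    # Adapter: exposes a Batch -> Batch layer as tuple -> tuple.
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, tensors):
        return batch_to_tuple(self.layer(tuple_to_batch(tensors)))


# Every layer boundary now carries only a tuple of tensors.
layers = [TupleIO(ToyLayer(8)) for _ in range(3)]
x = (torch.randn(4, 3), torch.randn(4, 8))
for layer in layers:
    x = layer(x)
```

In principle a list of layers wrapped this way could be passed to `deepspeed.pipe.PipelineModule`, but this only works if every field of the real batch object is a tensor (or can be encoded as one), which may not hold here.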
