
How can I train step3 in DeepSpeed-Chat by pipeline parallelism? #427

Open
@GongCQ

Description


If I set "CUDA_VISIBLE_DEVICES=0,1,2,3" and then run the step-3 script "run_6.7B.sh", the model is trained with data parallelism only, with a data-parallel world size of 4. There is no other parallelism (such as pipeline parallelism or tensor parallelism).

As a result, I cannot train a model that is larger than the memory of a single GPU, even though I have many GPUs, because without pipeline or tensor parallelism a complete copy of the model has to be loaded onto each GPU.
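A minimal way to see this, as a sketch: the hypothetical snippet below (not part of run_6.7B.sh) could be dropped into the step-3 training script to print the process layout.

```python
# Hypothetical sanity check (not in the DeepSpeed-Chat scripts): print the
# distributed layout to confirm all visible GPUs become data-parallel ranks.
import deepspeed
import torch.distributed as dist

deepspeed.init_distributed()  # no-op if the launcher already initialized it
print(f"rank {dist.get_rank()} of world size {dist.get_world_size()}")
# With CUDA_VISIBLE_DEVICES=0,1,2,3 this reports a world size of 4, and each
# rank builds and holds a full replica of the model.
```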

So, in DeepSpeed-Chat step 3, what should I do if I want to train a big model that is too large to load onto one GPU? Do I have to rewrite the model class so that it inherits from deepspeed.pipe.PipelineModule?
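For context, this is roughly what such a rewrite looks like. The toy model and sizes below are hypothetical placeholders, not DeepSpeed-Chat's actual actor model; it is only a sketch of the deepspeed.pipe.PipelineModule API, which takes a flat list of layers and splits it across pipeline stages so that no single GPU holds the whole model.

```python
# A hypothetical toy model expressed as a pipeline. LayerSpec delays layer
# construction until partitioning, so each of the num_stages GPUs
# materializes only its own slice of the layer list.
import torch
import torch.nn as nn
from deepspeed.pipe import PipelineModule, LayerSpec

class Block(nn.Module):  # stand-in for a transformer block
    def __init__(self, hidden):
        super().__init__()
        self.ff = nn.Linear(hidden, hidden)

    def forward(self, x):
        return torch.relu(self.ff(x))

hidden, n_blocks = 1024, 24
layers = (
    [LayerSpec(nn.Linear, hidden, hidden)]              # input projection
    + [LayerSpec(Block, hidden) for _ in range(n_blocks)]
    + [LayerSpec(nn.Linear, hidden, 2)]                 # output head
)

model = PipelineModule(layers=layers,
                       num_stages=4,
                       loss_fn=nn.CrossEntropyLoss())
```

Note that a PipelineModule is then handed to deepspeed.initialize() and trained with engine.train_batch() rather than a manual forward/backward loop, so the step-3 training loop would need changes as well, not just the model class.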

Labels: deepspeed chat, question
