Hi,
I would like to test a program for distributed LLM training on an 8x MI250 node, and I want to use model parallelism to distribute the model parameters across GPUs. Is there a framework I should use to achieve that? I have used DeepSpeed (https://github.com/microsoft/DeepSpeed), but in my runs its ZeRO stage-3 (which also partitions parameters) actually increased per-GPU memory consumption compared with ZeRO stage-2, which only partitions optimizer states and gradients. Are there any resources, recommendations, or examples specifically for AMD GPUs? A sketch of the kind of setup I am testing is below.
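
For reference, here is a minimal sketch of the DeepSpeed ZeRO stage-3 setup I am experimenting with. The model, batch size, learning rate, and launch command are illustrative placeholders, not my exact configuration:

```python
import torch
import deepspeed

# Illustrative ZeRO stage-3 config: optimizer states, gradients, AND
# parameters are all partitioned across the 8 GPUs (stage-2 would
# partition only optimizer states and gradients).
ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

model = torch.nn.Transformer()  # placeholder for the actual LLM

# deepspeed.initialize wraps the model in a ZeRO-aware engine;
# launched with e.g. `deepspeed --num_gpus=8 train.py`.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

With this kind of config, I expected stage-3 to lower per-GPU memory relative to stage-2, but I observed the opposite on this hardware.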
Thank you,
Zifan