Get help for distributed model training on MI250 #30

@OswaldHe

Description

Hi,

I would like to test a program for distributed LLM training on an 8x MI250 system, and I want to use model parallelism to distribute the parameters across GPUs. Is there a framework I should use to achieve that? I have tried DeepSpeed (https://github.com/microsoft/DeepSpeed), but its ZeRO stage 3 actually increases memory consumption on every GPU compared with ZeRO stage 2, which only partitions the optimizer states and gradients. Are there any resources, recommendations, or examples specifically for AMD GPUs?
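
For reference, here is a minimal sketch of the kind of DeepSpeed setup I am comparing; the model, batch size, and tuning values are placeholders rather than my exact configuration:

```python
# Minimal sketch: selecting the ZeRO stage in a DeepSpeed config.
# Stage 2 partitions optimizer states and gradients; stage 3 additionally
# partitions the model parameters themselves across GPUs.
# Meant to be launched with the `deepspeed` launcher across the 8 GPUs.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,            # set to 2 to keep a full parameter copy on every GPU
        "overlap_comm": True,  # overlap all-gather/reduce-scatter with compute
        "stage3_max_live_parameters": 1e8,  # cap on gathered params resident at once (stage 3 only)
    },
}

model = torch.nn.Linear(4096, 4096)  # placeholder model

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```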

Thank you,
Zifan
