Multi-node model training #30

Open
@pipijiev12

Description

Is multi-machine training suitable for training large models across multiple nodes? Secondly, can a large model be split into blocks, with each block assigned to a different node for training? For example, training the ChatGLM3 large model on a single node requires four graphics cards with 48 GB of video memory each. Can I use multi-machine training to split the model across two nodes with four graphics cards that have 24 GB of video memory each?
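To make the numbers in the question concrete, here is a minimal sketch of the memory arithmetic. It assumes the second setup means four 24 GB cards on each of the two nodes (the wording could also mean four cards in total), and the helper name `total_gpu_memory_gb` is hypothetical, made up only for this illustration.

```python
def total_gpu_memory_gb(nodes: int, gpus_per_node: int, gb_per_gpu: int) -> int:
    """Aggregate video memory across all GPUs, in GB (hypothetical helper)."""
    return nodes * gpus_per_node * gb_per_gpu


# Setup that is known to work: one node, four cards, 48 GB each.
single_node_gb = total_gpu_memory_gb(nodes=1, gpus_per_node=4, gb_per_gpu=48)

# Setup being asked about, ASSUMING four 24 GB cards on each of two nodes.
two_node_gb = total_gpu_memory_gb(nodes=2, gpus_per_node=4, gb_per_gpu=24)

print(f"single node: {single_node_gb} GB, two nodes: {two_node_gb} GB")
print("same aggregate memory:", single_node_gb == two_node_gb)
```

Under that assumption the aggregate memory is the same in both setups. Equal totals alone would not guarantee a fit, though: each card's share of the model and its training state would still have to fit in 24 GB, and cross-node communication adds overhead of its own.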
