Description
Is multi-node training suitable for large models? And can a large model be partitioned into blocks that are distributed across several nodes for training? For example, training the ChatGLM3 model requires four 48 GB GPUs on a single node. Could I instead use multi-node training to split the model across two nodes, each with four 24 GB GPUs?
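For context on whether 2 × 4 × 24 GB can replace 1 × 4 × 48 GB: frameworks such as DeepSpeed ZeRO-3 partition model states (parameters, gradients, optimizer states) evenly across all GPUs in the job, regardless of node boundaries. A rough back-of-the-envelope sketch, using the well-known 16-bytes-per-parameter estimate for mixed-precision Adam from the ZeRO paper (the 6.2B parameter count for ChatGLM3-6B is approximate, and activations/buffers are extra):

```python
def zero3_state_gb(n_params: float, world_size: int,
                   bytes_per_param: int = 16) -> float:
    """Per-GPU memory (GB) for model states under ZeRO-3 partitioning.

    bytes_per_param = 16 assumes mixed-precision Adam:
    fp16 params (2) + fp16 grads (2) + fp32 optimizer states (12).
    Activations, gradients in flight, and framework buffers are NOT included.
    """
    return n_params * bytes_per_param / world_size / 1024**3

# ChatGLM3-6B, ~6.2e9 parameters (approximate)
print(round(zero3_state_gb(6.2e9, 4), 1))  # 4 GPUs on one node  -> ~23.1 GB each
print(round(zero3_state_gb(6.2e9, 8), 1))  # 8 GPUs on two nodes -> ~11.5 GB each
```

So in principle, yes: spreading the partition over eight 24 GB GPUs leaves each card holding roughly half the model state of the four-GPU case, leaving headroom for activations. The practical caveat is inter-node bandwidth, since ZeRO-3 gathers parameters over the network every forward/backward pass.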