Description
Is multi-node training suitable for large models? And can a large model be partitioned into blocks that are distributed across several nodes for training? For example, training the ChatGLM3 model requires four 48 GB GPUs on a single node. Could I instead use multi-node training to split the model across two nodes, each with four 24 GB GPUs?
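For context on whether 2 × 4 × 24 GB can replace 1 × 4 × 48 GB: frameworks such as DeepSpeed ZeRO-3 partition model states (parameters, gradients, optimizer states) evenly across all GPUs in the job, regardless of node boundaries. A rough back-of-the-envelope sketch, using the well-known 16-bytes-per-parameter estimate for mixed-precision Adam from the ZeRO paper (the 6.2B parameter count for ChatGLM3-6B is approximate, and activations/buffers are extra):

```python
def zero3_state_gb(n_params: float, world_size: int,
                   bytes_per_param: int = 16) -> float:
    """Per-GPU memory (GB) for model states under ZeRO-3 partitioning.

    bytes_per_param = 16 assumes mixed-precision Adam:
    fp16 params (2) + fp16 grads (2) + fp32 optimizer states (12).
    Activations, gradients in flight, and framework buffers are NOT included.
    """
    return n_params * bytes_per_param / world_size / 1024**3

# ChatGLM3-6B, ~6.2e9 parameters (approximate)
print(round(zero3_state_gb(6.2e9, 4), 1))  # 4 GPUs on one node  -> ~23.1 GB each
print(round(zero3_state_gb(6.2e9, 8), 1))  # 8 GPUs on two nodes -> ~11.5 GB each
```

So in principle, yes: spreading the partition over eight 24 GB GPUs leaves each card holding roughly half the model state of the four-GPU case, leaving headroom for activations. The practical caveat is inter-node bandwidth, since ZeRO-3 gathers parameters over the network every forward/backward pass.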