Open
Description
Based on the example, if we want to reassign the 12 experts of layer 1 to 8 GPUs across 2 nodes, how should the experts be reassigned so that the load on each GPU is balanced?
There are three steps to assign experts:
Step 1: inter-node balance: Divide the experts into 4 groups (0-2, 3-5, 6-8, 9-11) and assign the 4 groups to the 2 nodes so that the load is balanced across nodes. This can be seen as a knapsack-style problem and solved with a greedy algorithm.
Step 2: expert balance: Within each node, replicate the hot experts (experts 4 and 5 in one node; experts 1 and 10 in the other).
Step 3: intra-node balance: Pack the replicated experts onto the individual GPUs within each node so that the GPUs are load-balanced. This can also be seen as a knapsack-style problem.
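The three steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the per-expert loads are made-up numbers, the function names (`balanced_packing`, `replicate_hot`, `hierarchical_balance`) are hypothetical, each GPU is assumed to hold 2 expert replicas (16 slots over 8 GPUs), and "hot" is assumed to mean the highest load per existing replica.

```python
def balanced_packing(weights, num_packs):
    """Greedy packing: take items heaviest first and put each one into the
    currently lightest pack that still has a free slot. Every pack ends up
    with exactly len(weights) // num_packs items."""
    assert len(weights) % num_packs == 0
    capacity = len(weights) // num_packs
    packs = [[] for _ in range(num_packs)]
    pack_load = [0.0] * num_packs
    for item in sorted(range(len(weights)), key=lambda i: -weights[i]):
        open_packs = [p for p in range(num_packs) if len(packs[p]) < capacity]
        target = min(open_packs, key=lambda p: pack_load[p])
        packs[target].append(item)
        pack_load[target] += weights[item]
    return packs


def replicate_hot(loads, num_slots):
    """Give every expert one replica, then hand each remaining slot to the
    expert with the highest load per replica (an assumed heuristic)."""
    counts = {e: 1 for e in loads}
    for _ in range(num_slots - len(loads)):
        hottest = max(loads, key=lambda e: loads[e] / counts[e])
        counts[hottest] += 1
    return counts


def hierarchical_balance(expert_load, num_groups, num_nodes,
                         gpus_per_node, slots_per_gpu):
    group_size = len(expert_load) // num_groups
    groups = [list(range(g * group_size, (g + 1) * group_size))
              for g in range(num_groups)]

    # Step 1: pack expert groups onto nodes by total group load.
    group_load = [sum(expert_load[e] for e in grp) for grp in groups]
    node_groups = balanced_packing(group_load, num_nodes)

    placement = []  # placement[node][gpu] -> list of expert ids
    for group_ids in node_groups:
        experts = [e for g in group_ids for e in groups[g]]

        # Step 2: replicate the hot experts inside this node.
        counts = replicate_hot({e: expert_load[e] for e in experts},
                               gpus_per_node * slots_per_gpu)
        replicas = [(e, expert_load[e] / counts[e])
                    for e in experts for _ in range(counts[e])]

        # Step 3: pack the replicas onto the GPUs of this node.
        gpu_slots = balanced_packing([w for _, w in replicas], gpus_per_node)
        placement.append([[replicas[i][0] for i in slot] for slot in gpu_slots])
    return placement


# Hypothetical loads for experts 0..11 (not taken from the original example).
expert_load = [20, 90, 30, 25, 100, 95, 15, 20, 10, 35, 85, 25]
placement = hierarchical_balance(expert_load, num_groups=4, num_nodes=2,
                                 gpus_per_node=4, slots_per_gpu=2)
for node, gpus in enumerate(placement):
    for gpu, experts in enumerate(gpus):
        print(f"node {node} gpu {gpu}: experts {experts}")
```

With these made-up loads, groups 3-5 and 6-8 land on one node and groups 0-2 and 9-11 on the other, and the replicated experts are 4, 5 and 1, 10. The greedy packing is a heuristic, so it gives a reasonably balanced assignment rather than a provably optimal one.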
The entire process can be called Hierarchical Load Balancing, as follows: