The comment in the code gives a correct example:
GPUs=16, DP=PP=TP=CP=2
8 data_parallel groups:
[g0, g4], [g1, g5], [g2, g6], [g3, g7], [g8, g12], [g9, g13], [g10, g14], [g11, g15]
8 tensor model-parallel groups:
[g0, g1], [g2, g3], [g4, g5], [g6, g7], [g8, g9], [g10, g11], [g12, g13], [g14, g15]
8 context-parallel groups:
[g0, g2], [g1, g3], [g4, g6], [g5, g7], [g8, g10], [g9, g11], [g12, g14], [g13, g15]
8 pipeline model-parallel groups:
[g0, g8], [g1, g9], [g2, g10], [g3, g11], [g4, g12], [g5, g13], [g6, g14], [g7, g15]
But after executing `fairscale.nn.model_parallel.initialize()`, I actually got:
> initializing model parallel with size 2
> initializing context parallel with size 2
> initializing pipeline with size 2
> initializing ddp with size 2
data groups: [0, 8]
data groups: [1, 9]
data groups: [2, 10]
data groups: [3, 11]
data groups: [4, 12]
data groups: [5, 13]
data groups: [6, 14]
data groups: [7, 15]
model groups: [0, 1]
model groups: [2, 3]
model groups: [4, 5]
model groups: [6, 7]
model groups: [8, 9]
model groups: [10, 11]
model groups: [12, 13]
model groups: [14, 15]
pipeline groups: [0, 4]
pipeline groups: [1, 5]
pipeline groups: [2, 6]
pipeline groups: [3, 7]
pipeline groups: [8, 12]
pipeline groups: [9, 13]
pipeline groups: [10, 14]
pipeline groups: [11, 15]
context groups: [0, 2]
context groups: [1, 3]
context groups: [4, 6]
context groups: [5, 7]
context groups: [8, 10]
context groups: [9, 11]
context groups: [12, 14]
context groups: [13, 15]
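I traced this to the following line:
`groups = torch.LongTensor(range(world_size)).reshape(data_parallel_size, pipeline_length, context_parallel_size, model_parallel_size)`
Are `data_parallel_size` and `pipeline_length` in the wrong order here? With this order the data-parallel groups have a stride of 8 and the pipeline groups a stride of 4, which is the opposite of what the comment describes.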
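To make the suspected mismatch easier to check, here is a minimal sketch in plain Python (no torch) that mimics a row-major reshape of the ranks and lists the groups along one axis. The helper name `groups_along` is hypothetical and is not part of fairscale; it only assumes that each group consists of the ranks that differ along a single axis of the reshaped grid.

```python
from itertools import product


def groups_along(axis, shape):
    """Return the rank groups that vary only along `axis` of a row-major rank grid.

    Hypothetical helper for illustration; not a fairscale API.
    """
    # Row-major strides: the last dimension is contiguous (stride 1).
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]

    other_axes = [i for i in range(len(shape)) if i != axis]
    groups = []
    # Fix every other axis, then walk along `axis` to collect one group.
    for idx in product(*(range(shape[i]) for i in other_axes)):
        base = sum(j * strides[i] for i, j in zip(other_axes, idx))
        groups.append([base + k * strides[axis] for k in range(shape[axis])])
    return sorted(groups)


shape = (2, 2, 2, 2)  # GPUs=16, DP=PP=TP=CP=2

# Order used in the quoted line: (data, pipeline, context, model).
current_data = groups_along(0, shape)
current_pipeline = groups_along(1, shape)

# Order with the first two axes swapped: (pipeline, data, context, model).
swapped_pipeline = groups_along(0, shape)
swapped_data = groups_along(1, shape)

print("current data groups:    ", current_data)
print("current pipeline groups:", current_pipeline)
print("swapped data groups:    ", swapped_data)
print("swapped pipeline groups:", swapped_pipeline)
```

Under the quoted order, the data groups come out as `[0, 8], [1, 9], ...` and the pipeline groups as `[0, 4], [1, 5], ...`, which matches the log above. With the first two axes swapped, the data groups become `[0, 4], ...` and the pipeline groups `[0, 8], ...`, which matches the example in the comment.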