Hi, the group division in initialize() in fairscale/nn/model_parallel/initialize.py may be incorrect

The annotation gives a correct example:
GPUs=16, DP=PP=TP=CP=2
```
  8 data_parallel groups:
      [g0, g4], [g1, g5], [g2, g6], [g3, g7], [g8, g12], [g9, g13], [g10, g14], [g11, g15]
  8 tensor model-parallel groups:
      [g0, g1], [g2, g3], [g4, g5], [g6, g7], [g8, g9], [g10, g11], [g12, g13], [g14, g15]
  8 context-parallel groups:
      [g0, g2], [g1, g3], [g4, g6], [g5, g7], [g8, g10], [g9, g11], [g12, g14], [g13, g15]
  8 pipeline model-parallel groups:
      [g0, g8], [g1, g9], [g2, g10], [g3, g11], [g4, g12], [g5, g13], [g6, g14], [g7, g15]
```
But after executing `fairscale.nn.model_parallel.initialize()`, I actually got:
     
```
> initializing model parallel with size 2
> initializing context parallel with size 2
> initializing pipeline with size 2
> initializing ddp with size 2
data groups: [0, 8]
data groups: [1, 9]
data groups: [2, 10]
data groups: [3, 11]
data groups: [4, 12]
data groups: [5, 13]
data groups: [6, 14]
data groups: [7, 15]
model groups: [0, 1]
model groups: [2, 3]
model groups: [4, 5]
model groups: [6, 7]
model groups: [8, 9]
model groups: [10, 11]
model groups: [12, 13]
model groups: [14, 15]
pipeline groups: [0, 4]
pipeline groups: [1, 5]
pipeline groups: [2, 6]
pipeline groups: [3, 7]
pipeline groups: [8, 12]
pipeline groups: [9, 13]
pipeline groups: [10, 14]
pipeline groups: [11, 15]
context groups: [0, 2]
context groups: [1, 3]
context groups: [4, 6]
context groups: [5, 7]
context groups: [8, 10]
context groups: [9, 11]
context groups: [12, 14]
context groups: [13, 15]
```

I found that:
```
groups = torch.LongTensor(range(world_size)).reshape(data_parallel_size, pipeline_length, context_parallel_size, model_parallel_size)
```
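Are `data_parallel_size` and `pipeline_length` in the wrong order here? With this reshape, the data-parallel ranks end up 8 apart and the pipeline ranks 4 apart, which is the reverse of what the annotation describes.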
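
To make the comparison concrete, here is a minimal sketch in plain Python (no torch needed) that mimics the row-major reshape and lists the groups along each axis. `groups_along` is a hypothetical helper, not part of fairscale, and it assumes each group is formed by varying one axis of the reshaped rank tensor while holding the others fixed, which is consistent with the log above.

```python
from itertools import product


def groups_along(shape, axis):
    """Rank groups obtained by varying `axis` of a row-major reshape of range(prod(shape)).

    Hypothetical helper for illustration only; not a fairscale API.
    """
    # Row-major strides: the last axis is contiguous.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]

    # Iterate over every index combination of the other axes, pinning `axis` to 0.
    other = [range(n) if i != axis else [0] for i, n in enumerate(shape)]
    result = []
    for idx in product(*other):
        base = sum(i * s for i, s in zip(idx, strides))
        result.append([base + k * strides[axis] for k in range(shape[axis])])
    return result


dp = pp = cp = mp = 2

# Order used by the reshape quoted above: (data, pipeline, context, model).
current = (dp, pp, cp, mp)
print("current data groups:    ", groups_along(current, 0))
print("current pipeline groups:", groups_along(current, 1))

# Assumed alternative with the first two dimensions swapped: (pipeline, data, context, model).
swapped = (pp, dp, cp, mp)
print("swapped data groups:    ", groups_along(swapped, 1))
print("swapped pipeline groups:", groups_along(swapped, 0))
```

With the current order the data groups come out as `[0, 8], [1, 9], ...` and the pipeline groups as `[0, 4], [1, 5], ...`, matching the log. With the first two dimensions swapped, the data groups become `[0, 4], [1, 5], ...` and the pipeline groups `[0, 8], [1, 9], ...`, matching the annotation. Whether the reshape or the annotation reflects the intended layout is something the maintainers would need to confirm.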
     
     

Hi, Groups division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py #1189
