deepspeed/runtime/pipe/module.py (+34 −31)

@@ -83,6 +83,40 @@ def __init__(self,
 class PipelineModule(nn.Module):
+    """Modules to be parallelized with pipeline parallelism.
+
+    The key constraint that enables pipeline parallelism is the
+    representation of the forward pass as a sequence of layers
+    and the enforcement of a simple interface between them. The
+    forward pass is implicitly defined by the module ``layers``. The key
+    assumption is that the output of each layer can be directly fed as
+    input to the next, like a ``torch.nn.Sequential``. The forward pass is
+    implicitly:
+
+    .. code-block:: python
+
+        def forward(self, inputs):
+            x = inputs
+            for layer in self.layers:
+                x = layer(x)
+            return x
+
+    .. note::
+        Pipeline parallelism is not compatible with ZeRO-2 and ZeRO-3.
+
+    Args:
+        layers (Iterable): A sequence of layers defining pipeline structure. Can be a ``torch.nn.Sequential`` module.
+        num_stages (int, optional): The degree of pipeline parallelism. If not specified, ``topology`` must be provided.
+        topology (``deepspeed.runtime.pipe.ProcessTopology``, optional): Defines the axes of parallelism for training. Must be provided if ``num_stages`` is ``None``.
+        loss_fn (callable, optional): Loss is computed as ``loss = loss_fn(outputs, label)``.
+        seed_layers (bool, optional): Use a different seed for each layer. Defaults to False.
+        seed_fn (callable, optional): The custom seed generating function. Defaults to random seed generator.
+        base_seed (int, optional): The starting seed. Defaults to 1234.
+        partition_method (str, optional): The method used to partition layers across pipeline stages. Defaults to 'parameters'.
+        activation_checkpoint_interval (int, optional): The granularity of activation checkpointing, in number of layers. 0 disables activation checkpointing.
+        activation_checkpoint_func (callable, optional): The function to use for activation checkpointing. Defaults to ``deepspeed.checkpointing.checkpoint``.
+        checkpointable_layers (list, optional): Layer class names that are eligible for activation checkpointing; other layers may not be checkpointed. Defaults to None, which applies no additional filtering.
"""Modules to be parallelized with pipeline parallelism.
99
-
100
-
The key constraint that enables pipeline parallelism is the
101
-
representation of the forward pass as a sequence of layers
102
-
and the enforcement of a simple interface between them. The
103
-
forward pass is implicitly defined by the module ``layers``. The key
104
-
assumption is that the output of each layer can be directly fed as
105
-
input to the next, like a ``torch.nn.Sequence``. The forward pass is
106
-
implicitly:
107
-
108
-
.. code-block:: python
109
-
110
-
def forward(self, inputs):
111
-
x = inputs
112
-
for layer in self.layers:
113
-
x = layer(x)
114
-
return x
115
-
116
-
.. note::
117
-
Pipeline parallelism is not compatible with ZeRO-2 and ZeRO-3.
118
-
119
-
Args:
120
-
layers (Iterable): A sequence of layers defining pipeline structure. Can be a ``torch.nn.Sequential`` module.
121
-
num_stages (int, optional): The degree of pipeline parallelism. If not specified, ``topology`` must be provided.
122
-
topology (``deepspeed.runtime.pipe.ProcessTopology``, optional): Defines the axes of parallelism axes for training. Must be provided if ``num_stages`` is ``None``.
123
-
loss_fn (callable, optional): Loss is computed ``loss = loss_fn(outputs, label)``
124
-
base_seed (int, optional): [description]. Defaults to 1234.
125
-
partition_method (str, optional): [description]. Defaults to 'parameters'.
126
-
activation_checkpoint_interval (int, optional): The granularity activation checkpointing in terms of number of layers. 0 disables activation checkpointing.
127
-
activation_checkpoint_func (callable, optional): The function to use for activation checkpointing. Defaults to ``deepspeed.checkpointing.checkpoint``.
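
For readers of the diff, a minimal usage sketch of how the documented arguments fit together. It assumes a distributed environment has already been initialized (e.g. via ``deepspeed.init_distributed()``) with at least two ranks; the layer sizes and the two-stage split are illustrative, not part of the diff:

.. code-block:: python

    import torch.nn as nn
    from deepspeed.pipe import PipelineModule

    # Layers expressed as a flat sequence: each layer's output feeds the
    # next, matching the implicit forward pass described in the docstring.
    net = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.ReLU(),
        nn.Linear(4096, 10),
    )

    model = PipelineModule(
        layers=net,                        # any iterable of layers also works
        num_stages=2,                      # degree of pipeline parallelism
        loss_fn=nn.CrossEntropyLoss(),     # loss = loss_fn(outputs, label)
        partition_method='parameters',     # balance stages by parameter count
        activation_checkpoint_interval=0,  # 0 disables activation checkpointing
    )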
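When ``num_stages`` is left unset, a ``topology`` must be supplied instead. A sketch under the assumption that the ``PipeModelDataParallelTopology`` helper in ``deepspeed.runtime.pipe.topology`` is available; the axis sizes are illustrative and must multiply to the world size:

.. code-block:: python

    from deepspeed.runtime.pipe.topology import PipeModelDataParallelTopology

    # 2 pipeline stages x 1 model-parallel group x 2 data-parallel replicas
    # = 4 ranks total.
    topo = PipeModelDataParallelTopology(num_pp=2, num_mp=1, num_dp=2)

    model = PipelineModule(
        layers=net,        # the nn.Sequential from the sketch above
        topology=topo,     # replaces num_stages
        seed_layers=True,  # give each layer its own RNG seed
        base_seed=1234,    # the documented default starting seed
    )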