
Commit 06e00f6

Fix typos: deepseed -> deepspeed (#2499)
1 parent: f2710bb


3 files changed (+3, -3)


deepspeed/runtime/pipe/module.py (+1, -1)
@@ -119,7 +119,7 @@ def forward(self, inputs):
     Args:
         layers (Iterable): A sequence of layers defining pipeline structure. Can be a ``torch.nn.Sequential`` module.
         num_stages (int, optional): The degree of pipeline parallelism. If not specified, ``topology`` must be provided.
-        topology (``deepseed.runtime.pipe.ProcessTopology``, optional): Defines the axes of parallelism axes for training. Must be provided if ``num_stages`` is ``None``.
+        topology (``deepspeed.runtime.pipe.ProcessTopology``, optional): Defines the axes of parallelism axes for training. Must be provided if ``num_stages`` is ``None``.
         loss_fn (callable, optional): Loss is computed ``loss = loss_fn(outputs, label)``
         base_seed (int, optional): [description]. Defaults to 1234.
         partition_method (str, optional): [description]. Defaults to 'parameters'.
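
The docstring being fixed here belongs to the ``PipelineModule`` constructor. As a hedged illustration of how those arguments fit together (a minimal sketch, assuming a multi-process DeepSpeed launch; the layer sizes, loss function, and stage count are placeholders, not part of this commit):

```python
# Minimal sketch of the PipelineModule API documented above.
# Assumes a launcher (deepspeed/torchrun) with at least 2 processes.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()  # sets up the distributed backend if needed

layers = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Either num_stages or topology must be given; here DeepSpeed derives a
# topology from num_stages, keeping the documented defaults for
# base_seed (1234) and partition_method ('parameters').
model = PipelineModule(
    layers=layers,
    num_stages=2,
    loss_fn=nn.CrossEntropyLoss(),  # loss = loss_fn(outputs, label)
)
```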

deepspeed/runtime/zero/partition_parameters.py (+1, -1)
@@ -605,7 +605,7 @@ def __init__(self,
 
     .. note::
         Initializes ``deepspeed.comm`` if it has not already been done so.
-        See :meth:`deepseed.init_distributed` for more information.
+        See :meth:`deepspeed.init_distributed` for more information.
 
     .. note::
         Can also be used as a decorator:
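
For context, this is the docstring of ``deepspeed.zero.Init``. A minimal sketch of the context-manager form it documents (``MyLargeModel`` is a hypothetical placeholder; the decorator form mentioned in the second note wraps a model-constructing function the same way):

```python
# Hedged sketch: allocate a model under ZeRO parameter partitioning.
import torch.nn as nn
import deepspeed


class MyLargeModel(nn.Module):  # hypothetical placeholder model
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4096, 4096)


# Parameters are partitioned among processes as they are allocated;
# per the note above, deepspeed.comm is initialized here via
# deepspeed.init_distributed if it has not been already.
with deepspeed.zero.Init():
    model = MyLargeModel()
```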

docs/_tutorials/large-models-w-deepspeed.md (+1, -1)
@@ -24,7 +24,7 @@ At a broad level, there are two primary paths to training a large model:
 
 Since, ZeRO is a replacement to data parallelism, it offers a seamless integration that does not require model code refactoring for existing data-parallel models. For majority of cases, ZeRO based technologies offers model scalability, training throughput efficiency without compromising ease of use.
 
-**3D Parallelism based technologies**: 3D Parallelism refers to a combination of three different forms of parallel technologies namely tensor-slicing, pipeline-parallelism, and data parallelism (or ZeRO powered data parallelism). Combing these three forms allows for harnessing the strength of each of these technologies without the drawback of any. 3D Parallelism enables DeepSeed to achieve excellent training throughput efficiency in the scenarios where relying on ZeRO based technologies alone might be insufficient. However, 3D parallelism requires non-trivial model code refactoring, and therefore a careful consideration is important to identify cases where 3D-Parallelism can bring non-trivial throughput benefits.
+**3D Parallelism based technologies**: 3D Parallelism refers to a combination of three different forms of parallel technologies namely tensor-slicing, pipeline-parallelism, and data parallelism (or ZeRO powered data parallelism). Combing these three forms allows for harnessing the strength of each of these technologies without the drawback of any. 3D Parallelism enables DeepSpeed to achieve excellent training throughput efficiency in the scenarios where relying on ZeRO based technologies alone might be insufficient. However, 3D parallelism requires non-trivial model code refactoring, and therefore a careful consideration is important to identify cases where 3D-Parallelism can bring non-trivial throughput benefits.
 
 ## Deciding which technology to use
 
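
To make the trade-off in this hunk concrete, two hedged sketches. First, the ZeRO path: because ZeRO replaces data parallelism, it is switched on through the DeepSpeed config without touching model code (the model and config values below are illustrative placeholders, not part of this commit):

```python
import torch.nn as nn
import deepspeed

model = nn.Linear(1024, 1024)  # stands in for an existing data-parallel model

ds_config = {
    "train_batch_size": 32,             # illustrative value
    "zero_optimization": {"stage": 3},  # partition optimizer state, gradients, parameters
}

# No model refactoring: ZeRO is enabled purely via configuration.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Second, the 3D path: the refactoring centers on declaring the three parallel axes explicitly, e.g. with the topology types referenced in the first hunk of this commit (the 4 x 2 x 2 axis sizes are placeholder assumptions for a 16-process job):

```python
from deepspeed.runtime.pipe.topology import PipeModelDataParallelTopology

# Combines the three axes: pipeline, tensor-slicing ("model"), and data.
topo = PipeModelDataParallelTopology(num_pp=4,  # pipeline stages
                                     num_mp=2,  # tensor-slicing ranks
                                     num_dp=2)  # data-parallel replicas

# topo can then be passed as PipelineModule(layers=..., topology=topo)
# in place of num_stages.
```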
