The release versions are now available; update the instructions to use the minimum required versions instead of the master branch.
Also link the example: deepspeedai/DeepSpeedExamples#964
---------
Signed-off-by: inkcherry <[email protected]>
blogs/huggingface-tp/README.md (+9 -5)
@@ -48,9 +48,15 @@ Figure 2 illustrates the basic flowchart, The division of TP and ZeRO is impleme
# Usage

- Although we evaluated AutoTP training with Llama2 & Llama3 models in this blog, we expect compatibility with other Hugging Face models, especially [those](https://www.deepspeed.ai/tutorials/automatic-tensor-parallelism/) previously validated with AutoTP inference. Please upgrade accelerate and transformers to the master branch. We will add their minimum version once they have release tag.
+ Although we evaluated AutoTP training with Llama2 & Llama3 models in this blog, we expect compatibility with other Hugging Face models, especially [those](https://www.deepspeed.ai/tutorials/automatic-tensor-parallelism/) previously validated with AutoTP inference.
+
+ **Requirements**
+ -`deepspeed >= 0.16.4`
+ -`transformers >= 4.50.1`
+ -`accelerate >= 1.6.0`
+
**Enable TP training**
Similar to ZeRO, AutoTP training is enabled using the [deepspeed configuration file](https://www.deepspeed.ai/docs/config-json/) by specifying ```[tensor_parallel][autotp_size]```.
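
To make that configuration concrete, here is a minimal sketch of a DeepSpeed config enabling AutoTP training. Only the `tensor_parallel.autotp_size` key comes from the text above; the ZeRO stage, batch settings, and the way the dict is handed to the HF `Trainer` are illustrative assumptions about a typical setup, not an authoritative recipe.

```python
# Minimal sketch of a DeepSpeed config dict for AutoTP training.
# Only tensor_parallel.autotp_size is taken from the text above; the ZeRO
# stage and batch settings are illustrative placeholders.
ds_config = {
    "tensor_parallel": {
        "autotp_size": 4          # tensor-parallel degree; world size should be divisible by it
    },
    "zero_optimization": {
        "stage": 1                # assumed: AutoTP training combined with ZeRO sharding
    },
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
}

# The dict (or an equivalent JSON file) can be passed to the Hugging Face
# Trainer via TrainingArguments(deepspeed=ds_config), or to
# deepspeed.initialize(config=ds_config) when not using the Trainer.
```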
@@ -113,12 +119,10 @@ Models saved this way can be directly used for HF format inference without inter
Saving Checkpoints remains compatible with HF transformers. Use [trainer.save_state()](https://huggingface.co/docs/transformers/v4.49.0/en/main_classes/trainer#transformers.Trainer.save_state) or set the save interval for automatic saving, which can be used to resume training.
- We validated AutoTP training using supervised finetune training (SFT) task: [stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca). The original benchmark model used in this project is Llama2-7B.
+ We validated AutoTP training using supervised finetune training (SFT) task: [stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca). The original benchmark model used in this project is Llama2-7B. The example code is also available [here](https://github.com/deepspeedai/DeepSpeedExamples/tree/master/training/tensor_parallel)
**Training Loss curve**
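
As a companion to the checkpointing note in the hunk above, the following is a hedged sketch of how periodic saving and `trainer.save_state()` might be wired up with the HF `Trainer`. The model name, dataset placeholder, output directory, `save_steps` value, and the `ds_config.json` path are assumptions for illustration; only `trainer.save_state()`, interval-based saving, and resuming come from the blog text.

```python
# Sketch: periodic checkpointing with the HF Trainer during AutoTP training.
# Model, dataset, and paths below are placeholders; trainer.save_state() and
# save_steps are standard transformers APIs referenced in the text above.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder model
train_dataset = ...  # a tokenized SFT dataset, e.g. stanford_alpaca (preparation not shown)

args = TrainingArguments(
    output_dir="./autotp_sft",
    deepspeed="ds_config.json",   # config containing tensor_parallel.autotp_size
    save_steps=500,               # automatic checkpoint every 500 steps (placeholder interval)
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()        # or trainer.train(resume_from_checkpoint=True) to resume
trainer.save_state()   # explicitly persist trainer state alongside the checkpoint
```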
@@ -216,7 +220,7 @@ The following loss curves depict SFT training, where gbs is uniformly set to 32,
# Miscellaneous
- If users define their own dataloader, please ensure data consistency within ```deepspeed.utils.get_tensor_model_parallel_group()```. DeepSpeed provides basic validation functions to assist with this.
+ If users define their own dataloader, please ensure data consistency within ```deepspeed.utils.groups.get_tensor_model_parallel_group()```. DeepSpeed provides basic validation functions to assist with this.
Furthermore, if users are not using transformers library, you can replace the ```TensorParallel_Layer``` layer and its subclasses as needed. See ```prepare_tp_model``` function in ```unit/model_parallelism/test_autotp_training.py```. Users can also define different shard and gather for subclasses of ```TensorParallel_Layer.```
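
For the custom-dataloader note in the Miscellaneous hunk above, one possible shape of such a consistency check is sketched below. Only `deepspeed.utils.groups.get_tensor_model_parallel_group()` comes from the text; the checksum-based comparison is an illustrative approach, not DeepSpeed's built-in validation helper.

```python
# Sketch: verify that every rank in a tensor-parallel group received the same
# batch from a custom dataloader. Only get_tensor_model_parallel_group() is
# taken from the text above; the checksum comparison is an illustrative check,
# not DeepSpeed's own validation function.
import torch
import torch.distributed as dist
from deepspeed.utils import groups


def assert_batch_consistent_across_tp(batch: torch.Tensor) -> None:
    """Raise if `batch` (assumed to be on the local GPU) differs across TP ranks."""
    tp_group = groups.get_tensor_model_parallel_group()
    # Reduce a simple checksum across the TP group; if all ranks hold the same
    # data, the min and max of the checksum are identical.
    checksum = batch.detach().double().sum()
    lo, hi = checksum.clone(), checksum.clone()
    dist.all_reduce(lo, op=dist.ReduceOp.MIN, group=tp_group)
    dist.all_reduce(hi, op=dist.ReduceOp.MAX, group=tp_group)
    if not torch.equal(lo, hi):
        raise RuntimeError("Dataloader batches differ across tensor-parallel ranks")
```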