Commit 753e822

Merge remote-tracking branch 'upstream/main' into HEAD

2 parents 761b718 + 23896c3

95 files changed: +1471 -790 lines changed

docs/source/api_ref_modules.rst (+6 -6)

@@ -48,10 +48,10 @@ model specific tokenizers.
     :toctree: generated/
     :nosignatures:
 
-    tokenizers.SentencePieceBaseTokenizer
-    tokenizers.TikTokenBaseTokenizer
-    tokenizers.ModelTokenizer
-    tokenizers.BaseTokenizer
+    transforms.tokenizers.SentencePieceBaseTokenizer
+    transforms.tokenizers.TikTokenBaseTokenizer
+    transforms.tokenizers.ModelTokenizer
+    transforms.tokenizers.BaseTokenizer
 
 Tokenizer Utilities
 -------------------
@@ -61,8 +61,8 @@ These are helper methods that can be used by any tokenizer.
     :toctree: generated/
     :nosignatures:
 
-    tokenizers.tokenize_messages_no_special_tokens
-    tokenizers.parse_hf_tokenizer_json
+    transforms.tokenizers.tokenize_messages_no_special_tokens
+    transforms.tokenizers.parse_hf_tokenizer_json
 
 
 PEFT Components

docs/source/api_ref_rlhf.rst (-1)

@@ -16,4 +16,3 @@ Components and losses for RLHF algorithms like PPO and DPO.
     loss.PPOLoss
     loss.DPOLoss
     loss.RSOLoss
-    loss.SimPOLoss

docs/source/basics/custom_components.rst (+1 -1)

@@ -117,7 +117,7 @@ our models in torchtune - see :func:`~torchtune.models.llama3_2_vision.llama3_2_
     #
     from torchtune.datasets import SFTDataset, PackedDataset
     from torchtune.data import InputOutputToMessages
-    from torchtune.modules.tokenizers import ModelTokenizer
+    from torchtune.modules.transforms.tokenizers import ModelTokenizer
 
     # Example builder function for a custom code instruct dataset not in torchtune, but using
     # different dataset building blocks from torchtune

docs/source/basics/model_transforms.rst (+1 -1)

@@ -101,7 +101,7 @@ The following methods are required on the model transform:
 
 .. code-block:: python
 
-    from torchtune.modules.tokenizers import ModelTokenizer
+    from torchtune.modules.transforms.tokenizers import ModelTokenizer
     from torchtune.modules.transforms import Transform
 
     class MyMultimodalTransform(ModelTokenizer, Transform):

docs/source/basics/tokenizers.rst (+5 -5)

@@ -168,7 +168,7 @@ For example, here we change the ``"<|begin_of_text|>"`` and ``"<|end_of_text|>"`
 Base tokenizers
 ---------------
 
-:class:`~torchtune.modules.tokenizers.BaseTokenizer` are the underlying byte-pair encoding modules that perform the actual raw string to token ID conversion and back.
+:class:`~torchtune.modules.transforms.tokenizers.BaseTokenizer` are the underlying byte-pair encoding modules that perform the actual raw string to token ID conversion and back.
 In torchtune, they are required to implement ``encode`` and ``decode`` methods, which are called by the :ref:`model_tokenizers` to convert
 between raw text and token IDs.
 
@@ -202,13 +202,13 @@ between raw text and token IDs.
         """
         pass
 
-If you load any :ref:`model_tokenizers`, you can see that it calls its underlying :class:`~torchtune.modules.tokenizers.BaseTokenizer`
+If you load any :ref:`model_tokenizers`, you can see that it calls its underlying :class:`~torchtune.modules.transforms.tokenizers.BaseTokenizer`
 to do the actual encoding and decoding.
 
 .. code-block:: python
 
     from torchtune.models.mistral import mistral_tokenizer
-    from torchtune.modules.tokenizers import SentencePieceBaseTokenizer
+    from torchtune.modules.transforms.tokenizers import SentencePieceBaseTokenizer
 
     m_tokenizer = mistral_tokenizer("/tmp/Mistral-7B-v0.1/tokenizer.model")
     # Mistral uses SentencePiece for its underlying BPE
@@ -227,7 +227,7 @@ to do the actual encoding and decoding.
 Model tokenizers
 ----------------
 
-:class:`~torchtune.modules.tokenizers.ModelTokenizer` are specific to a particular model. They are required to implement the ``tokenize_messages`` method,
+:class:`~torchtune.modules.transforms.tokenizers.ModelTokenizer` are specific to a particular model. They are required to implement the ``tokenize_messages`` method,
 which converts a list of Messages into a list of token IDs.
 
 .. code-block:: python
@@ -259,7 +259,7 @@ is because they add all the necessary special tokens or prompt templates require
 .. code-block:: python
 
     from torchtune.models.mistral import mistral_tokenizer
-    from torchtune.modules.tokenizers import SentencePieceBaseTokenizer
+    from torchtune.modules.transforms.tokenizers import SentencePieceBaseTokenizer
    from torchtune.data import Message
 
     m_tokenizer = mistral_tokenizer("/tmp/Mistral-7B-v0.1/tokenizer.model")
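
Note on the doc hunks above: the recurring change in this commit is a module move, with tokenizer building blocks now exposed under torchtune.modules.transforms.tokenizers rather than torchtune.modules.tokenizers. As a minimal, hedged sketch (not part of the commit itself), downstream code that has to run against both layouts could import with a fallback:

    # Sketch only: prefer the new location introduced by this commit and fall
    # back to the pre-move path for older torchtune releases.
    try:
        from torchtune.modules.transforms.tokenizers import ModelTokenizer
    except ImportError:
        from torchtune.modules.tokenizers import ModelTokenizer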

docs/source/recipes/dpo.rst (-2)

@@ -56,8 +56,6 @@ To use any of these, simply use the ``loss`` config entry or flag through the :r
     loss=torchtune.modules.loss.RSOLoss \
     gamma=0.5
 
-.. todo (@SalmanMohammadi) point to an example repo for SimPO
-
 For a deeper understanding of the different levers you can pull when using this recipe,
 see our documentation for the different PEFT training paradigms we support:

docs/source/tutorials/e2e_flow.rst (+8 -6)

@@ -275,18 +275,20 @@ Let's first copy over the config to our local working directory so we can make c
 
     $ tune cp generation ./custom_generation_config.yaml
     Copied file to custom_generation_config.yaml
+    $ mkdir /tmp/torchtune/llama3_2_3B/lora_single_device/out
 
 Let's modify ``custom_generation_config.yaml`` to include the following changes. Again, you only need
 to replace two fields: ``output_dir`` and ``checkpoint_files``
 
 .. code-block:: yaml
 
-    output_dir: /tmp/torchtune/llama3_2_3B/lora_single_device/epoch_0
+    checkpoint_dir: /tmp/torchtune/llama3_2_3B/lora_single_device/epoch_0
+    output_dir: /tmp/torchtune/llama3_2_3B/lora_single_device/out
 
     # Tokenizer
     tokenizer:
      _component_: torchtune.models.llama3.llama3_tokenizer
-     path: ${output_dir}/original/tokenizer.model
+     path: ${checkpoint_dir}/original/tokenizer.model
      prompt_template: null
 
    model:
@@ -295,7 +297,7 @@ Let's modify ``custom_generation_config.yaml`` to include the following changes.
 
    checkpointer:
      _component_: torchtune.training.FullModelHFCheckpointer
-     checkpoint_dir: ${output_dir}
+     checkpoint_dir: ${checkpoint_dir}
      checkpoint_files: [
        ft-model-00001-of-00002.safetensors,
        ft-model-00002-of-00002.safetensors,
@@ -312,8 +314,8 @@ Let's modify ``custom_generation_config.yaml`` to include the following changes.
 
    # Generation arguments; defaults taken from gpt-fast
    prompt:
-    system: null
-    user: "Tell me a joke. "
+      system: null
+      user: "Tell me a joke. "
    max_new_tokens: 300
    temperature: 0.6 # 0.8 and 0.6 are popular values to try
    top_k: 300
@@ -330,7 +332,7 @@ these parameters.
 
 .. code-block:: text
 
-    $ tune run generate --config ./custom_generation_config.yaml prompt="tell me a joke. "
+    $ tune run generate --config ./custom_generation_config.yaml prompt.user="Tell me a joke. "
     Tell me a joke. Here's a joke for you:
 
     What do you call a fake noodle?
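
Aside on the last hunk: torchtune's config system is built on OmegaConf, so once ``prompt`` becomes a nested mapping the CLI override has to target the nested key with dot notation (``prompt.user=...``) rather than replacing the whole ``prompt`` node. A hedged, standalone illustration of that behavior using OmegaConf directly (not the recipe's actual parsing code):

    # Illustration only: dot-list overrides map onto nested YAML keys.
    from omegaconf import OmegaConf

    base = OmegaConf.create({"prompt": {"system": None, "user": "placeholder"}})
    override = OmegaConf.from_dotlist(["prompt.user=Tell me a joke."])
    merged = OmegaConf.merge(base, override)
    print(merged.prompt.user)  # -> Tell me a joke.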

recipes/configs/generation.yaml (+6 -1)

@@ -1,4 +1,9 @@
-# Config for running the InferenceRecipe in generate.py to generate output from an LLM
+# Config for running the InferenceRecipe in generate.py to generate output
+# from Llama2 7B model
+#
+# This config assumes that you've run the following command before launching
+# this run:
+# tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf --ignore-patterns "*.safetensors" --hf-token <HF_TOKEN>
 #
 # To launch, run the following command from root torchtune directory:
 # tune run generate --config generation
recipes/configs/llama3/70B_generation_distributed.yaml (new file, +50)

@@ -0,0 +1,50 @@
+# Config for running the InferenceRecipe in dev/generate_v2.py to generate output
+# using a Llama3 70B Instruct model
+#
+# This config assumes that you've run the following command before launching:
+# tune download meta-llama/Meta-Llama-3-70B-Instruct --output-dir /tmp/Meta-Llama-3-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
+#
+# To launch, run the following command from root torchtune directory:
+# tune run --nproc_per_node 8 dev/generate_v2_distributed --config llama3/70B_generation_distributed
+
+output_dir: ./
+
+# Model arguments
+model:
+  _component_: torchtune.models.llama3.llama3_70b
+
+parallelize_plan:
+  _component_: torchtune.models.llama3.base_llama_tp_plan
+
+# Transform arguments
+tokenizer:
+  _component_: torchtune.models.llama3.llama3_tokenizer
+  path: /tmp/Meta-Llama-3-70B-Instruct/original/tokenizer.model
+  prompt_template: null
+  max_seq_len: 8192
+
+# Checkpointer
+checkpointer:
+  _component_: torchtune.training.FullModelHFCheckpointer
+  checkpoint_dir: /tmp/Meta-Llama-3-70B-Instruct
+  checkpoint_files:
+    filename_format: model-{}-of-{}.safetensors
+    max_filename: "00030"
+  recipe_checkpoint: null
+  output_dir: ${output_dir}
+  model_type: LLAMA3
+
+# Device
+device: cuda
+dtype: bf16
+seed: 1234
+log_level: INFO
+
+# Generation arguments
+prompt:
+  system: null
+  user:
+    text: Tell a joke.
+max_new_tokens: 200
+temperature: 0.6 # 0.8 and 0.6 are popular values to try
+top_k: 300
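
The checkpoint_files entry above uses the filename_format/max_filename form instead of listing all thirty shards explicitly. As a rough, hypothetical sketch (not torchtune's actual implementation) of how such a pair expands into the sharded safetensors names the checkpointer loads:

    # Hypothetical expansion of the config's filename_format / max_filename pair.
    fmt = "model-{}-of-{}.safetensors"
    max_filename = "00030"
    width = len(max_filename)
    files = [fmt.format(f"{i:0{width}d}", max_filename) for i in range(1, int(max_filename) + 1)]
    print(files[0], files[-1])
    # model-00001-of-00030.safetensors model-00030-of-00030.safetensors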
recipes/configs/llama3_1/70B_generation_distributed.yaml (new file, +50)

@@ -0,0 +1,50 @@
+# Config for running the InferenceRecipe in dev/generate_v2.py to generate output
+# using a Llama3.1 70B Instruct model
+#
+# This config assumes that you've run the following command before launching:
+# tune download meta-llama/Meta-Llama-3.1-70B-Instruct --output-dir /tmp/Meta-Llama-3.1-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
+#
+# To launch, run the following command from root torchtune directory:
+# tune run --nproc_per_node 8 dev/generate_v2_distributed --config llama3_1/70B_generation_distributed
+
+output_dir: ./
+
+# Model arguments
+model:
+  _component_: torchtune.models.llama3_1.llama3_1_70b
+
+parallelize_plan:
+  _component_: torchtune.models.llama3.base_llama_tp_plan
+
+# Transform arguments
+tokenizer:
+  _component_: torchtune.models.llama3.llama3_tokenizer
+  path: /tmp/Meta-Llama-3.1-70B-Instruct/original/tokenizer.model
+  prompt_template: null
+  max_seq_len: 8192
+
+# Checkpointer
+checkpointer:
+  _component_: torchtune.training.FullModelHFCheckpointer
+  checkpoint_dir: /tmp/Meta-Llama-3.1-70B-Instruct/
+  checkpoint_files:
+    filename_format: model-{}-of-{}.safetensors
+    max_filename: "00030"
+  recipe_checkpoint: null
+  output_dir: ${output_dir}
+  model_type: LLAMA3
+
+# Device
+device: cuda
+dtype: bf16
+seed: 1234
+log_level: INFO
+
+# Generation arguments
+prompt:
+  system: null
+  user:
+    text: Tell a joke.
+max_new_tokens: 200
+temperature: 0.6 # 0.8 and 0.6 are popular values to try
+top_k: 300

recipes/configs/llama3_2_vision/11B_generation_v2.yaml (+1 -1)

@@ -7,7 +7,7 @@
 # To launch, run the following command from root torchtune directory:
 # tune run dev/generate_v2 --config llama3_2_vision/generation_v2
 
-output_dir: ./ # Not needed
+output_dir: ./
 
 # Model arguments
 model:
recipes/configs/llama3_3/70B_generation_distributed.yaml (new file, +50)

@@ -0,0 +1,50 @@
+# Config for running the InferenceRecipe in dev/generate_v2.py to generate output
+# using a Llama3.1 70B Instruct model
+#
+# This config assumes that you've run the following command before launching:
+# tune download meta-llama/Llama-3.3-70B-Instruct --ignore-patterns "original/consolidated*" --hf-token <HF_TOKEN>
+#
+# To launch, run the following command from root torchtune directory:
+# tune run --nproc_per_node 8 dev/generate_v2_distributed --config llama3_3/70B_generation_distributed
+
+output_dir: ./
+
+# Model arguments
+model:
+  _component_: torchtune.models.llama3_3.llama3_3_70b
+
+parallelize_plan:
+  _component_: torchtune.models.llama3.base_llama_tp_plan
+
+# Transform arguments
+tokenizer:
+  _component_: torchtune.models.llama3.llama3_tokenizer
+  path: /tmp/Llama-3.3-70B-Instruct/original/tokenizer.model
+  prompt_template: null
+  max_seq_len: 8192
+
+# Checkpointer
+checkpointer:
+  _component_: torchtune.training.FullModelHFCheckpointer
+  checkpoint_dir: /tmp/Llama-3.3-70B-Instruct/
+  checkpoint_files:
+    filename_format: model-{}-of-{}.safetensors
+    max_filename: "00030"
+  recipe_checkpoint: null
+  output_dir: ${output_dir}
+  model_type: LLAMA3
+
+# Device
+device: cuda
+dtype: bf16
+seed: 1234
+log_level: INFO
+
+# Generation arguments
+prompt:
+  system: null
+  user:
+    text: Tell a joke.
+max_new_tokens: 200
+temperature: 0.6 # 0.8 and 0.6 are popular values to try
+top_k: 300

recipes/dev/early_exit_finetune_distributed.py (+3 -1)

@@ -653,7 +653,7 @@ def _setup_data(
                 for single_cfg_dataset in cfg_dataset
             ]
             ds = ConcatDataset(datasets=datasets)
-            packed = False
+            packed = getattr(ds, "packed", False)
         else:
             ds = config.instantiate(cfg_dataset, self._tokenizer)
             packed = cfg_dataset.get("packed", False)
@@ -870,6 +870,7 @@ def train(self) -> None:
                     and curr_epoch == 0
                     and self.profiler_profile_memory
                     and idx == self.profiler_wait_steps + self.profiler_warmup_steps
+                    and self._device.type == "cuda"
                 ):
                     torch.cuda.memory._record_memory_history()
 
@@ -1019,6 +1020,7 @@ def train(self) -> None:
                     == self.profiler_wait_steps
                     + self.profiler_warmup_steps
                     + self.profiler_active_steps
+                    and self._device.type == "cuda"
                 ):
                     torch.cuda.memory._record_memory_history(enabled=None)
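
A note on the first hunk: with getattr, a concatenated dataset only reports packed=True when it actually exposes a ``packed`` attribute, instead of the recipe hard-coding False. A tiny, hypothetical illustration of the getattr pattern (both classes are made up for this example, not torchtune classes):

    class ToyPlainDataset:      # no "packed" attribute
        pass

    class ToyPackedDataset:     # exposes "packed"
        packed = True

    for ds in (ToyPlainDataset(), ToyPackedDataset()):
        print(type(ds).__name__, getattr(ds, "packed", False))
    # ToyPlainDataset False
    # ToyPackedDataset True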

recipes/dev/generate_v2.py (+11 -7)

@@ -39,18 +39,22 @@ def __call__(self, prompt: Dict[str, Any]) -> List[Message]:
 
         # Iterate through roles and add content
         for role, content in prompt.items():
-            if isinstance(content, str):
+            if content is None:
+                continue
+            elif isinstance(content, str):
                 new_content = [{"type": "text", "content": content}]
-            else:
-                assert (
-                    "image" in content.keys()
-                ), "Multiple entries per role expect an image key"
+            elif "image" in content.keys():
                 image_loc = content["image"]
                 image = load_image(image_loc)
                 new_content = [
                     {"type": "image", "content": image},
                     {"type": "text", "content": content["text"]},
                 ]
+            else:
+                assert (
+                    "text" in content.keys()
+                ), "Multiple entries per role expect at least a text key"
+                new_content = [{"type": "text", "content": content["text"]}]
             messages.append(Message(role=role, content=new_content))
 
         # Finally, add an empty assistant message to kick-start generation
@@ -109,12 +113,12 @@ def log_metrics(self, total_time: int, tokens_per_second: float) -> None:
             f"Time for inference: {total_time:.02f} sec total, {tokens_per_second:.02f} tokens/sec"
         )
         self._logger.info(
-            f"Bandwidth achieved: {model_size * tokens_per_second / 1e9:.02f} GB/s"
+            f"Bandwidth achieved: {model_size * tokens_per_second / (1024**3):.02f} GiB/s"
         )
         if self._device.type != "cpu":
             torch_device = utils.get_torch_device_namespace()
             self._logger.info(
-                f"Max memory allocated: {torch_device.max_memory_allocated() / 1e9:.02f} GB"
+                f"Max memory allocated: {torch_device.max_memory_allocated() / (1024**3):.02f} GiB"
             )
 
     @torch.inference_mode()
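
One more note on the log_metrics hunk: the recipe now reports binary units (GiB, 1024**3 bytes) instead of decimal gigabytes (1e9 bytes), matching the MiB/GiB convention tools such as nvidia-smi use. A quick arithmetic sketch of how much the two units differ (8e9 is an arbitrary example value, not a measurement from the recipe):

    bytes_per_sec = 8e9                                  # arbitrary example value
    print(f"{bytes_per_sec / 1e9:.02f} GB/s")            # 8.00 GB/s (decimal)
    print(f"{bytes_per_sec / (1024 ** 3):.02f} GiB/s")   # 7.45 GiB/s (binary)
    # The binary figure is roughly 7% smaller, which is why the unit label matters.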
