
Commit b5b3f79

Merge remote-tracking branch 'upstream/main' into insop/kld
2 parents: d5576e2 + be4ff50

File tree: 258 files changed (+2197, -1270 lines)


.github/workflows/gpu_test.yaml (+1 -1)

@@ -46,7 +46,7 @@ jobs:
   run: python -m pip install --upgrade pip
 - name: Install torch nightly
   if: ${{ matrix.torch-version == 'nightly' }}
-  run: python -m pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu121
+  run: python -m pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu126
 - name: Install torch stable
   if: ${{ matrix.torch-version == 'stable' }}
   run: python -m pip install torch torchvision torchao

.pre-commit-config.yaml (+5 -5)

@@ -5,7 +5,7 @@ default_language_version:

 repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: 6306a48f7dae5861702d573c9c247e4e9498e867
+  rev: v5.0.0
   hooks:
   - id: trailing-whitespace
   - id: check-ast
@@ -18,7 +18,7 @@ repos:
   exclude: '^(.*\.svg)$'

 - repo: https://github.com/Lucas-C/pre-commit-hooks
-  rev: v1.5.4
+  rev: v1.5.5
   hooks:
   - id: insert-license
     files: \.py$|\.sh$
@@ -27,7 +27,7 @@ repos:
     - docs/license_header.txt

 - repo: https://github.com/pycqa/flake8
-  rev: 34cbf8ef3950f43d09b85e2e45c15ae5717dc37b
+  rev: 7.1.1
   hooks:
   - id: flake8
     additional_dependencies:
@@ -37,15 +37,15 @@ repos:
     args: ['--config=.flake8']

 - repo: https://github.com/omnilib/ufmt
-  rev: v2.3.0
+  rev: v2.8.0
   hooks:
   - id: ufmt
     additional_dependencies:
     - black == 22.12.0
     - usort == 1.0.5

 - repo: https://github.com/jsh9/pydoclint
-  rev: 94efc5f989adbea30f3534b476b2931a02c1af90
+  rev: 0.5.12
   hooks:
   - id: pydoclint
     args: [--config=pyproject.toml]
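Note: these ``rev`` pins move from raw commit SHAs to tagged releases. If needed, they can be refreshed later with ``pre-commit autoupdate`` and verified locally with ``pre-commit run --all-files``.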

README.md (+1 -1)

@@ -170,7 +170,7 @@ pip install torchtune

 ```bash
 # Install PyTorch, torchvision, torchao nightlies
-pip install --pre --upgrade torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu121 # full options are cpu/cu118/cu121/cu124
+pip install --pre --upgrade torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu126 # full options are cpu/cu118/cu121/cu124/cu126
 pip install --pre --upgrade torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu
 ```

docs/source/api_ref_modules.rst (+6 -6)

@@ -48,10 +48,10 @@ model specific tokenizers.
    :toctree: generated/
    :nosignatures:

-   tokenizers.SentencePieceBaseTokenizer
-   tokenizers.TikTokenBaseTokenizer
-   tokenizers.ModelTokenizer
-   tokenizers.BaseTokenizer
+   transforms.tokenizers.SentencePieceBaseTokenizer
+   transforms.tokenizers.TikTokenBaseTokenizer
+   transforms.tokenizers.ModelTokenizer
+   transforms.tokenizers.BaseTokenizer

 Tokenizer Utilities
 -------------------
@@ -61,8 +61,8 @@ These are helper methods that can be used by any tokenizer.
    :toctree: generated/
    :nosignatures:

-   tokenizers.tokenize_messages_no_special_tokens
-   tokenizers.parse_hf_tokenizer_json
+   transforms.tokenizers.tokenize_messages_no_special_tokens
+   transforms.tokenizers.parse_hf_tokenizer_json


 PEFT Components

docs/source/api_ref_rlhf.rst (-1)

@@ -16,4 +16,3 @@ Components and losses for RLHF algorithms like PPO and DPO.
    loss.PPOLoss
    loss.DPOLoss
    loss.RSOLoss
-   loss.SimPOLoss

docs/source/basics/custom_components.rst (+1 -1)

@@ -117,7 +117,7 @@ our models in torchtune - see :func:`~torchtune.models.llama3_2_vision.llama3_2_
 #
 from torchtune.datasets import SFTDataset, PackedDataset
 from torchtune.data import InputOutputToMessages
-from torchtune.modules.tokenizers import ModelTokenizer
+from torchtune.modules.transforms.tokenizers import ModelTokenizer

 # Example builder function for a custom code instruct dataset not in torchtune, but using
 # different dataset building blocks from torchtune

docs/source/basics/message_transforms.rst (+1)

@@ -95,6 +95,7 @@ Example message transforms
 --------------------------
 - Instruct
   - :class:`~torchtune.data.InputOutputToMessages`
+  - :class:`~torchtune.data.AlpacaToMessages`
 - Chat
   - :class:`~torchtune.data.ShareGPTToMessages`
   - :class:`~torchtune.data.OpenAIToMessages`

docs/source/basics/model_transforms.rst (+1 -1)

@@ -101,7 +101,7 @@ The following methods are required on the model transform:

 .. code-block:: python

-   from torchtune.modules.tokenizers import ModelTokenizer
+   from torchtune.modules.transforms.tokenizers import ModelTokenizer
    from torchtune.modules.transforms import Transform

    class MyMultimodalTransform(ModelTokenizer, Transform):

docs/source/basics/tokenizers.rst (+5 -5)

@@ -168,7 +168,7 @@ For example, here we change the ``"<|begin_of_text|>"`` and ``"<|end_of_text|>"``
 Base tokenizers
 ---------------

-:class:`~torchtune.modules.tokenizers.BaseTokenizer` are the underlying byte-pair encoding modules that perform the actual raw string to token ID conversion and back.
+:class:`~torchtune.modules.transforms.tokenizers.BaseTokenizer` are the underlying byte-pair encoding modules that perform the actual raw string to token ID conversion and back.
 In torchtune, they are required to implement ``encode`` and ``decode`` methods, which are called by the :ref:`model_tokenizers` to convert
 between raw text and token IDs.

@@ -202,13 +202,13 @@ between raw text and token IDs.
        """
        pass

-If you load any :ref:`model_tokenizers`, you can see that it calls its underlying :class:`~torchtune.modules.tokenizers.BaseTokenizer`
+If you load any :ref:`model_tokenizers`, you can see that it calls its underlying :class:`~torchtune.modules.transforms.tokenizers.BaseTokenizer`
 to do the actual encoding and decoding.

 .. code-block:: python

    from torchtune.models.mistral import mistral_tokenizer
-   from torchtune.modules.tokenizers import SentencePieceBaseTokenizer
+   from torchtune.modules.transforms.tokenizers import SentencePieceBaseTokenizer

    m_tokenizer = mistral_tokenizer("/tmp/Mistral-7B-v0.1/tokenizer.model")
    # Mistral uses SentencePiece for its underlying BPE
@@ -227,7 +227,7 @@ to do the actual encoding and decoding.
 Model tokenizers
 ----------------

-:class:`~torchtune.modules.tokenizers.ModelTokenizer` are specific to a particular model. They are required to implement the ``tokenize_messages`` method,
+:class:`~torchtune.modules.transforms.tokenizers.ModelTokenizer` are specific to a particular model. They are required to implement the ``tokenize_messages`` method,
 which converts a list of Messages into a list of token IDs.

 .. code-block:: python
@@ -259,7 +259,7 @@ is because they add all the necessary special tokens or prompt templates require
 .. code-block:: python

    from torchtune.models.mistral import mistral_tokenizer
-   from torchtune.modules.tokenizers import SentencePieceBaseTokenizer
+   from torchtune.modules.transforms.tokenizers import SentencePieceBaseTokenizer
    from torchtune.data import Message

    m_tokenizer = mistral_tokenizer("/tmp/Mistral-7B-v0.1/tokenizer.model")
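Note: downstream code that imported from the old ``torchtune.modules.tokenizers`` path needs to switch to ``torchtune.modules.transforms.tokenizers``. A minimal sketch of the updated import, assuming a locally downloaded Mistral tokenizer file at the illustrative path used in the docs above:

    # New import location after this change (old: torchtune.modules.tokenizers)
    from torchtune.modules.transforms.tokenizers import SentencePieceBaseTokenizer

    # Round-trip a string through the raw SentencePiece BPE model
    bpe = SentencePieceBaseTokenizer("/tmp/Mistral-7B-v0.1/tokenizer.model")
    token_ids = bpe.encode("Hello world")   # list of token IDs
    text = bpe.decode(token_ids)            # back to a string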

docs/source/install.rst (+2 -2)

@@ -19,7 +19,7 @@ nightly versions with the following commands:
    pip install torch torchvision torchao

    # Or nightly install for latest features
-   pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu121 # full options are cpu/cu118/cu121/cu124
+   pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu126 # full options are cpu/cu118/cu121/cu124/cu126


 Install via PyPI
@@ -88,4 +88,4 @@ to the package *without* installing via ``git clone``, you can install with the
 If you already have PyTorch installed, torchtune will default to using that version. However, if you want to
 use the nightly version of PyTorch, you can append the ``--force-reinstall`` option to the above command. If you
 opt for this install method, you will likely need to change the "cpu" suffix in the index url to match your CUDA
-version. For example, if you are running CUDA 12, your index url would be "https://download.pytorch.org/whl/nightly/cu121".
+version. For example, if you are running CUDA 12, your index url would be "https://download.pytorch.org/whl/nightly/cu126".

docs/source/recipes/dpo.rst (-2)

@@ -56,8 +56,6 @@ To use any of these, simply use the ``loss`` config entry or flag through the :r
    loss=torchtune.modules.loss.RSOLoss \
    gamma=0.5

-.. todo (@SalmanMohammadi) point to an example repo for SimPO
-
 For a deeper understanding of the different levers you can pull when using this recipe,
 see our documentation for the different PEFT training paradigms we support:

docs/source/tutorials/e2e_flow.rst (+8 -6)

@@ -275,18 +275,20 @@ Let's first copy over the config to our local working directory so we can make c

    $ tune cp generation ./custom_generation_config.yaml
    Copied file to custom_generation_config.yaml
+   $ mkdir /tmp/torchtune/llama3_2_3B/lora_single_device/out

 Let's modify ``custom_generation_config.yaml`` to include the following changes. Again, you only need
 to replace two fields: ``output_dir`` and ``checkpoint_files``

 .. code-block:: yaml

-   output_dir: /tmp/torchtune/llama3_2_3B/lora_single_device/epoch_0
+   checkpoint_dir: /tmp/torchtune/llama3_2_3B/lora_single_device/epoch_0
+   output_dir: /tmp/torchtune/llama3_2_3B/lora_single_device/out

    # Tokenizer
    tokenizer:
      _component_: torchtune.models.llama3.llama3_tokenizer
-     path: ${output_dir}/original/tokenizer.model
+     path: ${checkpoint_dir}/original/tokenizer.model
      prompt_template: null

    model:
@@ -295,7 +297,7 @@ Let's modify ``custom_generation_config.yaml`` to include the following changes.

    checkpointer:
      _component_: torchtune.training.FullModelHFCheckpointer
-     checkpoint_dir: ${output_dir}
+     checkpoint_dir: ${checkpoint_dir}
      checkpoint_files: [
        ft-model-00001-of-00002.safetensors,
        ft-model-00002-of-00002.safetensors,
@@ -312,8 +314,8 @@ Let's modify ``custom_generation_config.yaml`` to include the following changes.

    # Generation arguments; defaults taken from gpt-fast
    prompt:
-   system: null
-   user: "Tell me a joke. "
+     system: null
+     user: "Tell me a joke. "
    max_new_tokens: 300
    temperature: 0.6 # 0.8 and 0.6 are popular values to try
    top_k: 300
@@ -330,7 +332,7 @@ these parameters.

 .. code-block:: text

-   $ tune run generate --config ./custom_generation_config.yaml prompt="tell me a joke. "
+   $ tune run generate --config ./custom_generation_config.yaml prompt.user="Tell me a joke. "
    Tell me a joke. Here's a joke for you:

    What do you call a fake noodle?
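Note: because ``prompt`` is now a nested field with ``system`` and ``user`` keys, command-line overrides use the dotted path (``prompt.user="..."``) rather than ``prompt="..."``, as reflected in the updated ``tune run generate`` invocation above and in the llama3 tutorial below.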

docs/source/tutorials/llama3.rst (+1 -1)

@@ -230,7 +230,7 @@ Running generation with our LoRA-finetuned model, we see the following output:
 .. code-block:: bash

    tune run generate --config ./custom_generation_config.yaml \
-   prompt="Hello, my name is"
+   prompt.user="Hello, my name is"

    [generate.py:122] Hello, my name is Sarah and I am a busy working mum of two young children, living in the North East of England.
    ...

pyproject.toml (+1 -1)

@@ -87,7 +87,7 @@ target-version = ["py38"]
 [tool.pydoclint]
 style = 'google'
 check-return-types = 'False'
-exclude = 'tests/torchtune/models/(\w+)/scripts/'
+exclude = 'tests/torchtune/models/(\w+)/scripts/|recipes/|torchtune/modules/_export'

 [tool.pytest.ini_options]
 addopts = ["--showlocals", "--import-mode=prepend", "--without-integration", "--without-slow-integration"]

recipes/configs/code_llama2/7B_full_low_memory.yaml (+1)

@@ -64,6 +64,7 @@ optimizer:
 optimizer_in_bwd: True # True saves memory. Requires gradient_accumulation_steps=1
 loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env
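Note: ``clip_grad_norm: null`` leaves gradient clipping disabled, matching the previous behaviour; the same default is added to the other recipe configs below. To turn clipping on, replace ``null`` with a max-norm value (for example ``clip_grad_norm: 1.0``, an illustrative setting), either in the config file or as a ``tune run`` command-line override.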

recipes/configs/code_llama2/7B_lora_single_device.yaml (+1)

@@ -72,6 +72,7 @@ lr_scheduler:
   num_warmup_steps: 100
 loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/code_llama2/7B_qlora_single_device.yaml (+1)

@@ -71,6 +71,7 @@ lr_scheduler:
   num_warmup_steps: 100
 loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/code_llama2/evaluation.yaml (+2)

@@ -3,6 +3,8 @@
 # To launch, run the following command:
 # tune run eleuther_eval --config code_llama2/evaluation

+output_dir: ./ # Not needed
+
 # Model arguments
 model:
   _component_: torchtune.models.code_llama2.code_llama2_7b

recipes/configs/gemma/2B_full.yaml (+1)

@@ -57,6 +57,7 @@ loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
 max_steps_per_epoch: null
 gradient_accumulation_steps: 1 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory
 optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1

recipes/configs/gemma/2B_lora.yaml (+1)

@@ -69,6 +69,7 @@ batch_size: 4
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 1 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma/2B_lora_single_device.yaml (+1)

@@ -68,6 +68,7 @@ batch_size: 4
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 8 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma/2B_qlora_single_device.yaml (+1)

@@ -68,6 +68,7 @@ batch_size: 4
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 8 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma/7B_full.yaml (+1)

@@ -59,6 +59,7 @@ loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
 max_steps_per_epoch: null
 gradient_accumulation_steps: 1 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory
 optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1

recipes/configs/gemma/7B_lora.yaml (+1)

@@ -71,6 +71,7 @@ batch_size: 4
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 1 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma/7B_lora_single_device.yaml (+1)

@@ -70,6 +70,7 @@ batch_size: 8
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 8 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma/7B_qlora_single_device.yaml (+1)

@@ -70,6 +70,7 @@ batch_size: 4
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 8 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma2/27B_full.yaml (+1)

@@ -56,6 +56,7 @@ loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
 max_steps_per_epoch: null
 gradient_accumulation_steps: 1 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory
 optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1

recipes/configs/gemma2/27B_lora.yaml (+1)

@@ -68,6 +68,7 @@ batch_size: 4
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 1 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma2/27B_lora_single_device.yaml (+1)

@@ -67,6 +67,7 @@ batch_size: 2
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 8 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma2/27B_qlora_single_device.yaml (+1)

@@ -67,6 +67,7 @@ batch_size: 4
 epochs: 1
 max_steps_per_epoch: null
 gradient_accumulation_steps: 8 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env

recipes/configs/gemma2/2B_full.yaml (+1)

@@ -58,6 +58,7 @@ loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
 max_steps_per_epoch: null
 gradient_accumulation_steps: 1 # Use to increase effective batch size
+clip_grad_norm: null
 compile: False # torch.compile the model + loss, True increases speed + decreases memory
 optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1
