--- a/docs/source/basics/tokenizers.rst
+++ b/docs/source/basics/tokenizers.rst
@@ -168,7 +168,7 @@ For example, here we change the ``"<|begin_of_text|>"`` and ``"<|end_of_text|>"`
 Base tokenizers
 ---------------
 
-:class:`~torchtune.modules.tokenizers.BaseTokenizer` are the underlying byte-pair encoding modules that perform the actual raw string to token ID conversion and back.
+:class:`~torchtune.modules.transforms.tokenizers.BaseTokenizer` are the underlying byte-pair encoding modules that perform the actual raw string to token ID conversion and back.
 In torchtune, they are required to implement ``encode`` and ``decode`` methods, which are called by the :ref:`model_tokenizers` to convert
 between raw text and token IDs.
 
@@ -202,13 +202,13 @@ between raw text and token IDs.
             """
             pass
 
-If you load any :ref:`model_tokenizers`, you can see that it calls its underlying :class:`~torchtune.modules.tokenizers.BaseTokenizer`
+If you load any :ref:`model_tokenizers`, you can see that it calls its underlying :class:`~torchtune.modules.transforms.tokenizers.BaseTokenizer`
 to do the actual encoding and decoding.
 
 .. code-block:: python
 
     from torchtune.models.mistral import mistral_tokenizer
-    from torchtune.modules.tokenizers import SentencePieceBaseTokenizer
+    from torchtune.modules.transforms.tokenizers import SentencePieceBaseTokenizer
     # Mistral uses SentencePiece for its underlying BPE
@@ -227,7 +227,7 @@ to do the actual encoding and decoding.
 Model tokenizers
 ----------------
 
-:class:`~torchtune.modules.tokenizers.ModelTokenizer` are specific to a particular model. They are required to implement the ``tokenize_messages`` method,
+:class:`~torchtune.modules.transforms.tokenizers.ModelTokenizer` are specific to a particular model. They are required to implement the ``tokenize_messages`` method,
 which converts a list of Messages into a list of token IDs.
 
 .. code-block:: python
@@ -259,7 +259,7 @@ is because they add all the necessary special tokens or prompt templates require
 .. code-block:: python
 
     from torchtune.models.mistral import mistral_tokenizer
-    from torchtune.modules.tokenizers import SentencePieceBaseTokenizer
+    from torchtune.modules.transforms.tokenizers import SentencePieceBaseTokenizer
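This diff updates every documented import path for the tokenizer classes from ``torchtune.modules.tokenizers`` to ``torchtune.modules.transforms.tokenizers``. Whether the old path keeps working in a given torchtune release is not stated here. If downstream code must run against installs that provide either path, one option is a small fallback import. The sketch below is an assumption, not part of torchtune: ``import_from_first`` is a hypothetical helper that uses only the standard-library ``importlib.import_module`` and returns the attribute from the first module path that imports cleanly and defines it.

```python
import importlib
from typing import Any, Sequence


def import_from_first(module_names: Sequence[str], attr: str) -> Any:
    """Return ``attr`` from the first module in ``module_names`` that provides it.

    Hypothetical helper (not a torchtune API): tries each dotted module path
    in order, skipping paths that fail to import or lack the attribute.
    """
    errors = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:  # includes ModuleNotFoundError
            errors.append(f"{name}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{name}: no attribute {attr!r}")
    raise ImportError(f"could not import {attr!r}; tried " + "; ".join(errors))


# Assumed usage (requires torchtune to be installed): prefer the new path
# from this diff and fall back to the old one.
# SentencePieceBaseTokenizer = import_from_first(
#     ["torchtune.modules.transforms.tokenizers", "torchtune.modules.tokenizers"],
#     "SentencePieceBaseTokenizer",
# )
```

Listing the new path first means the fallback is only consulted when the new module is absent, so code written this way picks up the updated location as soon as it exists.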