40 changes: 20 additions & 20 deletions docs/docs/extraction/chunking.md
@@ -46,15 +46,11 @@ If you want chunks smaller than `page`, use token-based splitting as described i

The `split` task uses a tokenizer to count the number of tokens in the document,
and splits the document based on the desired maximum chunk size and chunk overlap.
We recommend that you use the `meta-llama/Llama-3.2-1B` tokenizer,
because it's the same tokenizer as the llama-3.2 embedding model that we use for embedding.
However, you can use any tokenizer from any HuggingFace model that includes a tokenizer file.

Use the `split` method to chunk large documents as shown in the following code.

!!! note
We recommend the default tokenizer for token-based splitting. For more information, refer to [Llama tokenizer (default)](#llama-tokenizer-default).
You can also use any tokenizer from any HuggingFace model that includes a tokenizer file.

The default tokenizer (`meta-llama/Llama-3.2-1B`) requires a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens). You must set `"hf_access_token": "hf_***"` to authenticate.
Use the `split` method to chunk large documents as shown in the following code.

```python
ingestor = ingestor.split(
@@ -76,6 +72,23 @@ ingestor = ingestor.split(
)
```
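Independent of the NV Ingest API, the underlying token-window logic can be sketched in plain Python. This is an illustration only, not the library's implementation; the function name and exact behavior are assumptions based on the chunk-size and chunk-overlap description above.

```python
def split_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token sequence into overlapping chunks.

    Consecutive chunks share `chunk_overlap` tokens, so the window
    advances by `chunk_size - chunk_overlap` tokens each step.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the window has reached the end of the sequence.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

In the real `split` task the tokens come from the configured tokenizer rather than a plain list, but the windowing idea is the same.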

### Llama tokenizer (default) {#llama-tokenizer-default}
> **Collaborator:** was this "{#llama-tokenizer-default}" added intentionally?
>
> **Collaborator (author):** yes
>
> **Collaborator:** If you intend that to be a section anchor, I don't think it will work across both GitHub and docs.nvidia.com

The default tokenizer for token-based splitting is **`meta-llama/Llama-3.2-1B`**. It matches the tokenizer used by the Llama 3.2 embedding model, which helps keep chunk boundaries aligned with the embedding model.

!!! note

This tokenizer is gated on Hugging Face and requires an access token. For more information, refer to [User access tokens](https://huggingface.co/docs/hub/en/security-tokens). You must set `hf_access_token` in your `split` params (for example, `"hf_***"`) to authenticate.
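One way to avoid hard-coding the token is to read it from the environment before building the `split` params. This is a hypothetical wiring sketch; only the environment lookup and the params dictionary are shown, and the `HF_ACCESS_TOKEN` variable name follows the environment-variable table referenced below.

```python
import os

# Hypothetical wiring: read the Hugging Face token from the environment
# instead of embedding it in source code.
hf_token = os.environ.get("HF_ACCESS_TOKEN", "")

# Dictionary intended for the split task's params.
split_params = {"hf_access_token": hf_token}
```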

By default, the NV Ingest container includes this tokenizer pre-downloaded at build time, so it does not need to be fetched at runtime. If you build the container yourself and want to pre-download it, do the following:

- Review the [license agreement](https://huggingface.co/meta-llama/Llama-3.2-1B).
- [Request access](https://huggingface.co/meta-llama/Llama-3.2-1B).
- Set the `DOWNLOAD_LLAMA_TOKENIZER` environment variable to `True`.
- Set the `HF_ACCESS_TOKEN` environment variable to your Hugging Face access token.

For details on how to set environment variables, refer to [Environment Variables](environment-config.md).
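For example, the two variables from the steps above could be set in a shell session before the build. The values here are illustrative only; substitute your real Hugging Face token.

```shell
# Illustrative values; replace the token placeholder with your own.
export DOWNLOAD_LLAMA_TOKENIZER=True
export HF_ACCESS_TOKEN=hf_your_token_here
```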

### Split Parameters

The following table contains the `split` parameters.
@@ -91,19 +104,6 @@ The following table contains the `split` parameters.



### Pre-download the Tokenizer

By default, the NV Ingest container comes with the `meta-llama/Llama-3.2-1B` tokenizer pre-downloaded
so that it doesn't have to download a tokenizer at runtime.
If you are building the container yourself and want to pre-download this model, do the following:

- Review the [license agreement](https://huggingface.co/meta-llama/Llama-3.2-1B).
- [Request access](https://huggingface.co/meta-llama/Llama-3.2-1B).
- Set the `DOWNLOAD_LLAMA_TOKENIZER` environment variable to `True`
- Set the `HF_ACCESS_TOKEN` environment variable to your HuggingFace access token.



## Related Topics

- [Use the Python API](nv-ingest-python-api.md)
4 changes: 2 additions & 2 deletions docs/docs/extraction/environment-config.md
@@ -12,8 +12,8 @@ You can specify these in your .env file or directly in your environment.

| Name | Example | Description |
|----------------------------------|--------------------------------|-----------------------------------------------------------------------|
| `DOWNLOAD_LLAMA_TOKENIZER` | - | The Llama tokenizer is now pre-downloaded at build time. For details, refer to [Token-Based Splitting](chunking.md#token-based-splitting). |
| `HF_ACCESS_TOKEN` | - | A token to access HuggingFace models. For details, refer to [Token-Based Splitting](chunking.md#token-based-splitting). |
| `DOWNLOAD_LLAMA_TOKENIZER` | - | Pre-download the default tokenizer at build time. For details, refer to [Llama tokenizer](chunking.md#llama-tokenizer-default). |
| `HF_ACCESS_TOKEN` | - | A token to access HuggingFace models. For details, refer to [Llama tokenizer](chunking.md#llama-tokenizer-default). |
| `INGEST_LOG_LEVEL` | - `DEBUG` <br/> - `INFO` <br/> - `WARNING` <br/> - `ERROR` <br/> - `CRITICAL` <br/> | The log level for the ingest service, which controls the verbosity of the logging output. |
| `MESSAGE_CLIENT_HOST` | - `redis` <br/> - `localhost` <br/> - `192.168.1.10` <br/> | Specifies the hostname or IP address of the message broker used for communication between services. |
| `MESSAGE_CLIENT_PORT` | - `7670` <br/> - `6379` <br/> | Specifies the port number on which the message broker is listening. |
1 change: 1 addition & 0 deletions docs/docs/extraction/nv-ingest-python-api.md
@@ -534,6 +534,7 @@ For more information on environment variables, refer to [Environment Variables](
## Extract Audio

Use the following code to extract mp3 audio content.
The example uses the default tokenizer for token-based splitting; see [Llama tokenizer (default)](chunking.md#llama-tokenizer-default).

```python
from nv_ingest_client.client import Ingestor
2 changes: 1 addition & 1 deletion docs/docs/extraction/releasenotes-nv-ingest.md
@@ -26,7 +26,7 @@ This release contains the following key changes:
- Added VLM caption prompt customization parameters, including reasoning control. For details, refer to [Caption Images and Control Reasoning](nv-ingest-python-api.md#caption-images-and-control-reasoning).
- Added support for the [nemotron-parse](https://build.nvidia.com/nvidia/nemotron-parse/modelcard) model which replaces the [nemoretriever-parse](https://build.nvidia.com/nvidia/nemoretriever-parse/modelcard) model. For details, refer to [Advanced Visual Parsing](nemoretriever-parse.md).
- Support is now deprecated for [paddleocr](https://build.nvidia.com/baidu/paddleocr/modelcard).
- The `meta-llama/Llama-3.2-1B` tokenizer is now pre-downloaded so that you can run token-based splitting without making a network request. For details, refer to [Split Documents](chunking.md).
- The default tokenizer for token-based splitting is now pre-downloaded at build time so you can run splitting without a network request. For details, refer to [Llama tokenizer (default)](chunking.md#llama-tokenizer-default).
- For scanned PDFs, added specialized extraction strategies. For details, refer to [PDF Extraction Strategies](nv-ingest-python-api.md#pdf-extraction-strategies).
- Added support for [LanceDB](https://lancedb.com/). For details, refer to [Upload to a Custom Data Store](data-store.md).
- The V2 API is now available and is the default processing pipeline. The response format remains backwards-compatible. You can enable the v2 API by using `message_client_kwargs={"api_version": "v2"}`. For details, refer to [API Reference](api-docs).