diff --git a/docs/docs/extraction/chunking.md b/docs/docs/extraction/chunking.md
index 0bddfa4a2..d331e3e7b 100644
--- a/docs/docs/extraction/chunking.md
+++ b/docs/docs/extraction/chunking.md
@@ -46,15 +46,11 @@ If you want chunks smaller than `page`, use token-based splitting as described i
The `split` task uses a tokenizer to count the number of tokens in the document,
and splits the document based on the desired maximum chunk size and chunk overlap.
-We recommend that you use the `meta-llama/Llama-3.2-1B` tokenizer,
-because it's the same tokenizer as the llama-3.2 embedding model that we use for embedding.
-However, you can use any tokenizer from any HuggingFace model that includes a tokenizer file.
-Use the `split` method to chunk large documents as shown in the following code.
-
-!!! note
+We recommend that you use the default tokenizer for token-based splitting. For more information, refer to [Llama tokenizer (default)](#llama-tokenizer-default).
+You can also use any tokenizer from any HuggingFace model that includes a tokenizer file.
- The default tokenizer (`meta-llama/Llama-3.2-1B`) requires a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens). You must set `hf_access_token": "hf_***` to authenticate.
+Use the `split` method to chunk large documents as shown in the following code.
```python
ingestor = ingestor.split(
@@ -76,6 +72,23 @@ ingestor = ingestor.split(
)
```
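The `chunk_size` and `chunk_overlap` semantics can be sketched in plain Python. This is a minimal illustration using integer token IDs; `split_tokens` is a hypothetical helper for explanation only, not the nv-ingest implementation:

```python
def split_tokens(tokens, chunk_size, chunk_overlap):
    """Slide a fixed-size window over the token sequence.

    Consecutive chunks share `chunk_overlap` tokens, so the window
    advances by `chunk_size - chunk_overlap` tokens each step.
    """
    step = chunk_size - chunk_overlap
    stop = max(len(tokens) - chunk_overlap, 1)
    return [tokens[i:i + chunk_size] for i in range(0, stop, step)]

# 10 tokens, chunks of 4 with an overlap of 2 -> each pair of
# neighboring chunks shares its last/first 2 tokens.
chunks = split_tokens(list(range(10)), chunk_size=4, chunk_overlap=2)
print(chunks)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

The same windowing applies whatever tokenizer produces the tokens; only the token boundaries change.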
+### Llama tokenizer (default) {#llama-tokenizer-default}
+
+The default tokenizer for token-based splitting is **`meta-llama/Llama-3.2-1B`**. It is the same tokenizer that the Llama 3.2 embedding model uses, which keeps chunk boundaries aligned with the embedding model's token counts.
+
+!!! note
+
+ This tokenizer is gated on Hugging Face and requires an access token. For more information, refer to [User access tokens](https://huggingface.co/docs/hub/en/security-tokens). You must set `hf_access_token` in your `split` params (for example, `"hf_***"`) to authenticate.
+
+By default, the NV Ingest container includes this tokenizer pre-downloaded at build time, so it does not need to be fetched at runtime. If you build the container yourself and want to pre-download it, do the following:
+
+- Review the [license agreement](https://huggingface.co/meta-llama/Llama-3.2-1B).
+- [Request access](https://huggingface.co/meta-llama/Llama-3.2-1B).
+- Set the `DOWNLOAD_LLAMA_TOKENIZER` environment variable to `True`.
+- Set the `HF_ACCESS_TOKEN` environment variable to your HuggingFace access token.
+
+For details on how to set environment variables, refer to [Environment Variables](environment-config.md).
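As a sketch, the two build-time settings above might be exported in your shell before building the container. The token value shown is a placeholder; substitute your own Hugging Face access token:

```shell
# Build-time settings for pre-downloading the default tokenizer.
# The token value is a placeholder; use your own Hugging Face token.
export DOWNLOAD_LLAMA_TOKENIZER=True
export HF_ACCESS_TOKEN=hf_***
```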
+
### Split Parameters
The following table contains the `split` parameters.
@@ -91,19 +104,6 @@ The following table contains the `split` parameters.
-### Pre-download the Tokenizer
-
-By default, the NV Ingest container comes with the `meta-llama/Llama-3.2-1B` tokenizer pre-downloaded
-so that it doesn't have to download a tokenizer at runtime.
-If you are building the container yourself and want to pre-download this model, do the following:
-
-- Review the [license agreement](https://huggingface.co/meta-llama/Llama-3.2-1B).
-- [Request access](https://huggingface.co/meta-llama/Llama-3.2-1B).
-- Set the `DOWNLOAD_LLAMA_TOKENIZER` environment variable to `True`
-- Set the `HF_ACCESS_TOKEN` environment variable to your HuggingFace access token.
-
-
-
## Related Topics
- [Use the Python API](nv-ingest-python-api.md)
diff --git a/docs/docs/extraction/environment-config.md b/docs/docs/extraction/environment-config.md
index 6843d9b9c..150f6f174 100644
--- a/docs/docs/extraction/environment-config.md
+++ b/docs/docs/extraction/environment-config.md
@@ -12,8 +12,8 @@ You can specify these in your .env file or directly in your environment.
| Name | Example | Description |
|----------------------------------|--------------------------------|-----------------------------------------------------------------------|
-| `DOWNLOAD_LLAMA_TOKENIZER` | - | The Llama tokenizer is now pre-downloaded at build time. For details, refer to [Token-Based Splitting](chunking.md#token-based-splitting). |
-| `HF_ACCESS_TOKEN` | - | A token to access HuggingFace models. For details, refer to [Token-Based Splitting](chunking.md#token-based-splitting). |
+| `DOWNLOAD_LLAMA_TOKENIZER` | - | Pre-download the default tokenizer at build time. For details, refer to [Llama tokenizer](chunking.md#llama-tokenizer-default). |
+| `HF_ACCESS_TOKEN` | - | A token to access HuggingFace models. For details, refer to [Llama tokenizer](chunking.md#llama-tokenizer-default). |
| `INGEST_LOG_LEVEL`               | - `DEBUG`<br>- `INFO`<br>- `WARNING`<br>- `ERROR`<br>- `CRITICAL` | The log level for the ingest service, which controls the verbosity of the logging output. |
| `MESSAGE_CLIENT_HOST`            | - `redis`<br>- `localhost`<br>- `192.168.1.10` | Specifies the hostname or IP address of the message broker used for communication between services. |
| `MESSAGE_CLIENT_PORT`            | - `7670`<br>- `6379` | Specifies the port number on which the message broker is listening. |
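Taken together, the variables above might appear in a `.env` file like the following. All values are illustrative only; adjust them for your deployment:

```shell
# Illustrative .env file; values are examples, not defaults.
DOWNLOAD_LLAMA_TOKENIZER=True
HF_ACCESS_TOKEN=hf_***
INGEST_LOG_LEVEL=INFO
MESSAGE_CLIENT_HOST=redis
MESSAGE_CLIENT_PORT=7670
```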
diff --git a/docs/docs/extraction/nv-ingest-python-api.md b/docs/docs/extraction/nv-ingest-python-api.md
index 541130ef2..8a6720f85 100644
--- a/docs/docs/extraction/nv-ingest-python-api.md
+++ b/docs/docs/extraction/nv-ingest-python-api.md
@@ -534,6 +534,7 @@ For more information on environment variables, refer to [Environment Variables](
## Extract Audio
Use the following code to extract mp3 audio content.
+The example uses the default tokenizer for token-based splitting. For details, refer to [Llama tokenizer (default)](chunking.md#llama-tokenizer-default).
```python
from nv_ingest_client.client import Ingestor
diff --git a/docs/docs/extraction/releasenotes-nv-ingest.md b/docs/docs/extraction/releasenotes-nv-ingest.md
index ebb57f19e..5c4a060b6 100644
--- a/docs/docs/extraction/releasenotes-nv-ingest.md
+++ b/docs/docs/extraction/releasenotes-nv-ingest.md
@@ -26,7 +26,7 @@ This release contains the following key changes:
- Added VLM caption prompt customization parameters, including reasoning control. For details, refer to [Caption Images and Control Reasoning](nv-ingest-python-api.md#caption-images-and-control-reasoning).
- Added support for the [nemotron-parse](https://build.nvidia.com/nvidia/nemotron-parse/modelcard) model which replaces the [nemoretriever-parse](https://build.nvidia.com/nvidia/nemoretriever-parse/modelcard) model. For details, refer to [Advanced Visual Parsing](nemoretriever-parse.md).
- Support for [paddleocr](https://build.nvidia.com/baidu/paddleocr/modelcard) is now deprecated.
-- The `meta-llama/Llama-3.2-1B` tokenizer is now pre-downloaded so that you can run token-based splitting without making a network request. For details, refer to [Split Documents](chunking.md).
- The default tokenizer for token-based splitting is now pre-downloaded at build time so that you can run splitting without a network request. For details, refer to [Llama tokenizer (default)](chunking.md#llama-tokenizer-default).
- For scanned PDFs, added specialized extraction strategies. For details, refer to [PDF Extraction Strategies](nv-ingest-python-api.md#pdf-extraction-strategies).
- Added support for [LanceDB](https://lancedb.com/). For details, refer to [Upload to a Custom Data Store](data-store.md).
- The V2 API is now available and is the default processing pipeline. The response format remains backwards-compatible. You can enable the V2 API by using `message_client_kwargs={"api_version": "v2"}`. For details, refer to [API Reference](api-docs).