docs/docs/extraction/content-metadata.md (1 addition & 1 deletion)

@@ -43,7 +43,7 @@ These fields apply to all content types including text, images, and tables.
 | Subtype | The type of the content for structured data types, such as table or chart. | — |
 | Content | Content extracted from the source. | Extracted |
 | Description | A text description of the content object. | Generated |
-| Page \#| The page \# of the content in the source. Prior to 26.1.2, this field was 0-indexed. Beginning with 26.1.2, this field is 1-indexed. | Extracted |
+| Page \#| The page \# of the content in the source. Prior to 26.3.0-RC1, this field was 0-indexed. Beginning with 26.3.0-RC1, this field is 1-indexed. | Extracted |
 | Hierarchy | The location or order of the content within the source. | Extracted |
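The indexing change above means page numbers extracted before this release are off by one relative to newer extractions. A minimal sketch of a migration step for previously stored metadata, assuming records are plain dicts and assuming a `page_number` key (the docs only name the field "Page #", so the key name here is hypothetical):

```python
# Hypothetical migration helper (not part of the library): records extracted
# before 26.3.0-RC1 carry 0-indexed page numbers; this shifts them to 1-indexed
# so they line up with records produced by newer releases.
def reindex_pages(records, zero_indexed=True):
    """Return a copy of records with 1-indexed page numbers."""
    if not zero_indexed:
        return list(records)
    migrated = []
    for record in records:
        updated = dict(record)  # shallow copy; leave the original untouched
        if "page_number" in updated:
            updated["page_number"] = updated["page_number"] + 1
        migrated.append(updated)
    return migrated
```

Run once over a legacy store so old and new records agree on indexing; records without the key pass through unchanged.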
docs/docs/extraction/releasenotes-nv-ingest.md (6 additions & 6 deletions)

@@ -8,19 +8,19 @@ This documentation contains the release notes for [NeMo Retriever Library](overv
 
-## Release 26.01 (26.1.2)
+## Release 26.01 (26.3.0-RC1)
 
 The NeMo Retriever Library 26.01 release adds new hardware and software support, and other improvements.
 
-To upgrade the Helm Charts for this version, refer to [NeMo Retriever Library Helm Charts](https://github.com/NVIDIA/NeMo-Retriever/blob/release/26.1.2/helm/README.md).
+To upgrade the Helm Charts for this version, refer to [NeMo Retriever Library Helm Charts](https://github.com/NVIDIA/NeMo-Retriever/blob/release/26.3.0-RC1/helm/README.md).
 
 ### Highlights
 
 This release contains the following key changes:
 
 - Added functional support for [H200 NVL](https://www.nvidia.com/en-us/data-center/h200/). For details, refer to [Support Matrix](support-matrix.md).
-- All Helm deployments for Kubernetes now use [NVIDIA NIM Operator](https://docs.nvidia.com/nim-operator/latest/index.html). For details, refer to [NeMo Retriever Library Helm Charts](https://github.com/NVIDIA/NeMo-Retriever/blob/release/26.1.2/helm/README.md).
+- All Helm deployments for Kubernetes now use [NVIDIA NIM Operator](https://docs.nvidia.com/nim-operator/latest/index.html). For details, refer to [NeMo Retriever Library Helm Charts](https://github.com/NVIDIA/NeMo-Retriever/blob/release/26.3.0-RC1/helm/README.md).
 - Updated RIVA NIM to version 1.4.0. For details, refer to [Extract Speech](audio.md).
 - Updated VLM NIM to [nemotron-nano-12b-v2-vl](https://build.nvidia.com/nvidia/nemotron-nano-12b-v2-vl/modelcard). For details, refer to [Extract Captions from Images](python-api-reference.md#extract-captions-from-images).
 - Added VLM caption prompt customization parameters, including reasoning control. For details, refer to [Caption Images and Control Reasoning](python-api-reference.md#caption-images-and-control-reasoning).

@@ -33,7 +33,7 @@ This release contains the following key changes:
 - Large PDFs are now automatically split into chunks and processed in parallel, delivering faster ingestion for long documents. For details, refer to [PDF Pre-Splitting](v2-api-guide.md).
 - Issues maintaining extraction quality while processing very large files are now resolved with the V2 API. For details, refer to [V2 API Guide](v2-api-guide.md).
 - Updated the embedding task to support embedding on custom content fields like the results of summarization functions. For details, refer to [Use the Python API](python-api-reference.md).
-- User-defined function summarization is now using `nemotron-mini-4b-instruct` which provides significant speed improvements. For details, refer to [User-defined Functions](user-defined-functions.md) and [NeMo Retriever Library UDF Examples](https://github.com/NVIDIA/NeMo-Retriever/blob/release/26.1.2/examples/udfs/README.md).
+- User-defined function summarization is now using `nemotron-mini-4b-instruct` which provides significant speed improvements. For details, refer to [User-defined Functions](user-defined-functions.md) and [NeMo Retriever Library UDF Examples](https://github.com/NVIDIA/NeMo-Retriever/blob/release/26.3.0-RC1/examples/udfs/README.md).
 - In the `Ingestor.extract` method, the defaults for `extract_text` and `extract_images` are now set to `true` for consistency with `extract_tables` and `extract_charts`. For details, refer to [Use the Python API](python-api-reference.md).
 - The `table-structure` profile is no longer available. The table-structure profile is now part of the default profile. For details, refer to [Profile Information](quickstart-guide.md#profile-information).
 - New documentation [Why Throughput Is Dataset-Dependent](throughput-is-dataset-dependent.md).

@@ -49,8 +49,8 @@ This release contains the following key changes:
 The following are the known issues that are fixed in this version:
 
-- A10G support is restored. To use A10G hardware, use release 26.1.2 or later. For details, refer to [Support Matrix](support-matrix.md).
-- L40S support is restored. To use L40S hardware, use release 26.1.2 or later. For details, refer to [Support Matrix](support-matrix.md).
+- A10G support is restored. To use A10G hardware, use release 26.3.0-RC1 or later. For details, refer to [Support Matrix](support-matrix.md).
+- L40S support is restored. To use L40S hardware, use release 26.3.0-RC1 or later. For details, refer to [Support Matrix](support-matrix.md).
 - The page number field in the content metadata now starts at 1 instead of 0 so each page number is no longer off by one from what you would expect. For details, refer to [Content Metadata](content-metadata.md).
 - Support for batches that include individual files greater than approximately 400MB is restored. This includes audio files and pdfs.
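The pre-splitting highlight above (large PDFs split into chunks and processed in parallel) can be sketched with standard-library tools. This is an illustration of the general pattern only, not the service's actual implementation: `extract_chunk` is a stand-in for the real per-chunk extraction, and the chunk size and worker count are arbitrary assumptions.

```python
# Sketch of chunked, parallel document processing: split the page list into
# fixed-size chunks, fan the chunks out to a thread pool, then re-flatten the
# results in order so the output matches a sequential pass.
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(pages, chunk_size=16):
    """Split a list of pages into fixed-size chunks (last chunk may be short)."""
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]

def extract_chunk(chunk):
    """Stand-in for per-chunk extraction; the real work happens in the service."""
    return [f"extracted:{page}" for page in chunk]

def extract_parallel(pages, chunk_size=16, workers=4):
    chunks = split_into_chunks(pages, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order even though chunks run concurrently.
        results = pool.map(extract_chunk, chunks)
    return [item for chunk_result in results for item in chunk_result]
```

Because `ThreadPoolExecutor.map` yields results in submission order, callers see the same ordering they would get from a single sequential pass over the document.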
examples/building_vdb_operator.ipynb (7 additions & 7 deletions)

@@ -486,7 +486,7 @@
 "        self.write_to_index(records)\n",
 "```\n",
 "\n",
-"This method is called by the NV-Ingest Ingestor class during the ingestion pipeline. For more information on how operators are integrated into NV-Ingest, refer to the [interface implementation](https://github.com/NVIDIA/nv-ingest/blob/release/26.1.2/client/src/nv_ingest_client/client/interface.py#L324).\n",
+"This method is called by the NV-Ingest Ingestor class during the ingestion pipeline. For more information on how operators are integrated into NV-Ingest, refer to the [interface implementation](https://github.com/NVIDIA/nv-ingest/blob/release/26.3.0-RC1/client/src/nv_ingest_client/client/interface.py#L324).\n",
 "\n",
 "The simplicity of this method belies its importance - it ensures that indexes are properly configured before data ingestion begins."

@@ -728,12 +728,12 @@
 "\n",
 "This implementation includes all the features covered in this tutorial:\n",
 "\n",
-"- ✅ Complete OpenSearch integration with k-NN vector search\n",
-"- ✅ Configurable connection parameters and index settings\n",
-"- ✅ Robust data validation and content filtering\n",
-"- ✅ Efficient batch processing and error handling\n",
-"- ✅ NVIDIA embedding model integration for query vectorization\n",
-"- ✅ Optimized response formatting and payload management\n",
+"- \u2705 Complete OpenSearch integration with k-NN vector search\n",
+"- \u2705 Configurable connection parameters and index settings\n",
+"- \u2705 Robust data validation and content filtering\n",
+"- \u2705 Efficient batch processing and error handling\n",
+"- \u2705 NVIDIA embedding model integration for query vectorization\n",
+"- \u2705 Optimized response formatting and payload management\n",
 "\n",
 "### Getting Started with the OpenSearch Operator\n",
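The notebook excerpt above shows a `write_to_index(records)` method that the ingestion pipeline calls on a custom operator. A toy sketch of that shape, with validation and batching as in the feature list: the class name, method names, and record fields here are assumptions for illustration, not the exact NV-Ingest operator interface (refer to the linked `interface.py` for that).

```python
# Toy vector-database operator: validate incoming records, then append the
# surviving batch to an in-memory "index". A real operator would write to a
# backend such as OpenSearch instead of a Python list.
class InMemoryVDBOperator:
    """Minimal operator illustrating validation + batched index writes."""

    def __init__(self, index_name="demo"):
        self.index_name = index_name
        self.index = []

    def _is_valid(self, record):
        # Drop records with empty content or no embedding vector.
        return bool(record.get("content")) and "embedding" in record

    def write_to_index(self, records):
        """Filter invalid records, write the rest, return the count written."""
        batch = [r for r in records if self._is_valid(r)]
        self.index.extend(batch)
        return len(batch)
```

The same two-step shape (filter, then bulk write) is what makes batched backends efficient: one round trip per batch rather than per record.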