NeMo-Retriever/helm/README.md.gotmpl at a520c5df41ba0e0b90153bf5c3f0f299f3c8f4d0 · NVIDIA/NeMo-Retriever · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
# NV-Ingest Helm Charts

This documentation contains documentation for the NV-Ingest Helm charts.

> [!Note]
> NV-Ingest is also known as NVIDIA Ingest and NeMo Retriever extraction.

## Prerequisites

Before you install the Helm charts, be sure you meet the hardware and software prerequisites. Refer to the [supported configurations](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#hardware).

The [Nvidia nim-operator](https://docs.nvidia.com/nim-operator/latest/install.html) must also be installed and configured in your cluster to ensure that
the Nvidia NIMs are properly deployed.


## Initial Environment Setup

1. Create your namespace by running the following code.

```bash
NAMESPACE=nv-ingest
kubectl create namespace ${NAMESPACE}
```

2. Add the Helm repos by running the following code.

```bash
# NVIDIA nemo-microservices NGC repository
helm repo add nemo-microservices https://helm.ngc.nvidia.com/nvidia/nemo-microservices --username='$oauthtoken' --password=<NGC_API_KEY>

# NVIDIA NIM NGC repository
helm repo add nvidia-nim https://helm.ngc.nvidia.com/nim/nvidia --username='$oauthtoken' --password=<NGC_API_KEY>

# NVIDIA NIM Base repository
helm repo add nim https://helm.ngc.nvidia.com/nim --username='$oauthtoken' --password=<NGC_API_KEY>

# NVIDIA NIM baidu NGC repository
helm repo add baidu-nim https://helm.ngc.nvidia.com/nim/baidu --username='$oauthtoken' --password=<YOUR API KEY>
```

## Install or Upgrade the Helm Chart

To install or upgrade the Helm chart, run the following code.

```bash
helm upgrade \
    --install \
    nv-ingest \
    https://helm.ngc.nvidia.com/nvidia/nemo-microservices/charts/nv-ingest-26.03.0-RC2.tgz \
    -n ${NAMESPACE} \
    --username '$oauthtoken' \
    --password "${NGC_API_KEY}" \
    --set ngcImagePullSecret.create=true \
    --set ngcImagePullSecret.password="${NGC_API_KEY}" \
    --set ngcApiSecret.create=true \
    --set ngcApiSecret.password="${NGC_API_KEY}" \
    --set image.repository="nvcr.io/nvidia/nemo-microservices/nv-ingest" \
    --set image.tag="26.03.0-RC2"
```

Optionally you can create your own versions of the `Secrets` if you do not want to use the creation via the helm chart.


```bash

NAMESPACE=nv-ingest
DOCKER_CONFIG='{"auths":{"nvcr.io":{"username":"$oauthtoken", "password":"'${NGC_API_KEY}'" }}}'
echo -n $DOCKER_CONFIG | base64 -w0
NGC_REGISTRY_PASSWORD=$(echo -n $DOCKER_CONFIG | base64 -w0 )

kubectl apply -n ${NAMESPACE} -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ngcImagePullSecret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ${NGC_REGISTRY_PASSWORD}
EOF
kubectl create -n ${NAMESPACE} secret generic ngc-api --from-literal=NGC_API_KEY=${NGC_API_KEY}

```

Alternatively, you can also use an External Secret Store like Vault, the name of the secret name expected for the NGC API is `ngc-api` and the secret name expected for NVCR is `ngcImagePullSecret`.

In this case, make sure to remove the following from your helm command:

```bash
    --set ngcImagePullSecret.create=true \
    --set ngcImagePullSecret.password="${NGC_API_KEY}" \
    --set ngcApiSecret.create=true \
    --set ngcApiSecret.password="${NGC_API_KEY}" \
```

## Usage

Jobs are submitted via the `nv-ingest-cli` command.

#### NV-Ingest CLI Installation

NV-Ingest uses a HTTP/Rest based submission method. By default the Rest service runs on port `7670`.

First, build `nv-ingest-cli` from the source to ensure you have the latest code.
For more information, refer to [NV-Ingest-Client](https://github.com/NVIDIA/nv-ingest/tree/main/client).

```bash
# Just to be cautious we remove any existing installation
pip uninstall nv-ingest-client

pip install nv-ingest-client==26.03.0-RC2
```

#### Rest Endpoint Ingress

It is recommended that the end user provide a mechanism for [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress/) for the NV-Ingest pod.
You can test outside of your Kubernetes cluster by [port-forwarding](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_port-forward/) the NV-Ingest pod to your local environment.

Example:

You can find the name of your NV-Ingest pod you want to forward traffic to by running:

```bash
kubectl get pods -n <namespace> --no-headers -o custom-columns=":metadata.name"
kubectl port-forward -n ${NAMESPACE} service/nv-ingest 7670:7670
```

The output will look similar to the following but with different auto-generated sequences.

```
nv-ingest-674f6b7477-65nvm
nv-ingest-etcd-0
nv-ingest-milvus-standalone-7f8ffbdfbc-jpmlj
nv-ingest-minio-7cbd4f5b9d-99hl4
nv-ingest-opentelemetry-collector-7bb59d57fc-4h59q
nv-ingest-ocr-0
nv-ingest-redis-master-0
nv-ingest-redis-replicas-0
nv-ingest-yolox-0
nv-ingest-zipkin-77b5fc459f-ptsj6
```

```bash
kubectl port-forward -n ${NAMESPACE} nv-ingest-674f6b7477-65nvm 7670:7670
```

#### Enabling GPU time-slicing

##### Configure the NVIDIA Operator

The NVIDIA GPU operator is used to make the Kubernetes cluster aware of the GPUs.

```bash
helm install --wait --generate-name --repo https://helm.ngc.nvidia.com/nvidia \
    --namespace gpu-operator --create-namespace \
    gpu-operator --set driver.enabled=false
```

Once this is installed we can check the number of available GPUs. Your ouput might differ depending on the number
of GPUs on your nodes.

```console
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.name | test("-worker[0-9]*$")) | {name: .metadata.name, "nvidia.com/gpu": .status.allocatable["nvidia.com/gpu"]}'
{
  "name": "one-worker-one-gpu-worker",
  "nvidia.com/gpu": "1"
}
```

##### Enable time-slicing

With the NVIDIA GPU operator properly installed we want to enable time-slicing to allow more than one Pod to use this GPU. We do this by creating a time-slicing config file using [`time-slicing-config.yaml`](time-slicing/time-slicing-config.yaml).

```bash
kubectl apply -n gpu-operator -f time-slicing/time-slicing-config.yaml
```

Then update the cluster policy to use this new config.

```bash
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
    -n gpu-operator --type merge \
    -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'
```


Now if we check the number of GPUs available we should see `16`, but note this is still just GPU `0` which we exposed to the node, just sliced into `16`.

```console
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.name | test("-worker[0-9]*$")) | {name: .metadata.name, "nvidia.com/gpu": .status.allocatable["nvidia.com/gpu"]}'
{
  "name": "one-worker-one-gpu-worker",
  "nvidia.com/gpu": "16"
}
```

#### Enable NVIDIA GPU MIG

[NVIDIA MIG](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html) is a technology that enables a specific GPU to be sliced into individual virtual GPUs.
This approach is considered better for production environments than time-slicing, because it isolates it process to a pre-allocated amount of compute and memory.

Use this section to learn how to enable NVIDIA GPU MIG.

##### Compatible GPUs

For the list of GPUs that are compatible with MIG, refer to [the MIG support matrix](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-gpus).

##### Understanding GPU profiles

You can think of a GPU profile like a traditional virtual computer (vm), where each vm has a predefined set of compute and memory.

Each NVIDIA GPU has different valid profiles. To see the profiles that are available for your GPU, run the following code.

```bash
nvidia-smi mig -lgip
```

The complete matrix of available profiles for your GPU appears. To understand the output, refer to [Supported MIG Profiles](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-mig-profiles).

##### MIG Profile configuration

An example MIG profile for a DGX H100 can be found in mig/nv-ingest-mig-config.yaml. This profile demonstrates mixed MIG modes across different GPUs
on the DGX machine.
You should edit the file for your scenario, and include only the profiles supported by your GPU.

To install the MIG profile on the Kubernetes cluster, use the following code.

```bash
kubectl apply -n gpu-operator -f mig/nv-ingest-mig-config.yaml
```

After the configmap is installed, you can adjust the MIG profile to be `mixed`. We do this because our configuration
file specifies different profiles across the GPUs. This is not required if you are not using a `mixed` MIG mode.

```bash
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
    --type='json' \
    -p='[{"op":"replace", "path":"/spec/mig/strategy", "value":"mixed"}]'
```

Patch the cluster so MIG manager uses the custom config map by using the following code.

```bash
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
    --type='json' \
    -p='[{"op":"replace", "path":"/spec/migManager/config/name", "value":"nv-ingest-mig-config"}]'
```

Label the nodes with which MIG profile you would like for them to use by using the following code.

```bash
kubectl label nodes <node-name> nvidia.com/mig.config=single-gpu-nv-ingest --overwrite
```

Validate that the configuration was applied by running the following code.

```bash
kubectl logs -n gpu-operator -l app=nvidia-mig-manager -c nvidia-mig-manager
```

You can adjust Kubernetes request and limit resources for MIG by using a Helm values file. To use a MIG values file in conjunction with other values files, append `-f mig/nv-ingest-mig-values.yaml` to your Helm command. For an example Helm values file for MIG settings, refer to [mig/nv-ingest-mig-config.yaml](mig/nv-ingest-mig-config.yaml). This file is only an example, and you should adjust the values for your environment and specific needs.

#### Running Jobs

Here is a sample invocation of a PDF extraction task using the port forward above:

```bash
mkdir -p ./processed_docs

nv-ingest-cli \
  --doc /path/to/your/unique.pdf \
  --output_directory ./processed_docs \
  --task='extract:{"document_type": "pdf", "extract_text": true, "extract_images": true, "extract_tables": true, "extract_charts": true}' \
  --client_host=localhost \
  --client_port=7670
```

You can also use NV-Ingest's Python client API to interact with the service running in the cluster. Use the same host and port as in the previous nv-ingest-cli example.

{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesSection" . }}