Query rewriting improves accuracy for multiturn queries by making an additional LLM call to decontextualize the incoming question before sending it to the retrieval pipeline. For example, if the user first asks "Tell me about the H100 GPU." and then follows up with "How much memory does it have?", the follow-up is rewritten into a standalone query such as "How much memory does the NVIDIA H100 GPU have?" before retrieval.
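Conceptually, the extra call is a small chat-completion request that asks an LLM to rewrite the latest turn into a self-contained question. The sketch below is illustrative only: the function name, prompt wording, and payload fields are assumptions, not the blueprint's actual implementation.

```python
# Illustrative sketch of the query-rewriting step. The prompt text and
# payload shape are assumptions, not the blueprint's actual implementation.

def build_rewrite_request(history: str, question: str,
                          model: str = "meta/llama-3.1-8b-instruct") -> dict:
    """Build an OpenAI-style chat-completions payload that asks the LLM to
    decontextualize the latest user question against the conversation history."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rewrite the user's last question so it is fully "
                        "self-contained, using the conversation history."},
            {"role": "user",
             "content": f"History:\n{history}\n\nQuestion: {question}"},
        ],
        "temperature": 0.0,  # deterministic rewrite
    }

# The resulting payload would be POSTed to the rewriter LLM before retrieval.
```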
Once you have followed the steps in the quick start guide to launch the blueprint, developers have two options to enable query rewriting support:
- Deploy the `llama3.1-8b-instruct` model on-prem. You need an H100 or A100 GPU to deploy this model.

  ```bash
  export LLM_8B_MS_GPU_ID=<AVAILABLE_GPU_ID>
  USERID=$(id -u) docker compose -f deploy/compose/nims.yaml --profile llama-8b up -d
  ```
- Make sure the `nim-llm` container is up and in a healthy state before proceeding further.

  ```bash
  docker ps --filter "name=nim-llm-llama-8b" --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
  ```

  Example output:

  ```
  NAMES               STATUS
  nim-llm-llama-8b    Up 38 minutes (healthy)
  ```
- Enable query rewriting. Export the following environment variables and relaunch the rag-server container.

  ```bash
  export APP_QUERYREWRITER_SERVERURL="nim-llm-llama-8b:8000"
  export ENABLE_QUERYREWRITER="True"
  docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
  ```
  Alternatively, you can enable this at runtime during retrieval by setting `enable_query_rewriting: True` as part of the schema of the POST /generate API, without relaunching the containers. Refer to the retrieval notebook.
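As a sketch of that runtime option, a POST /generate request body might look like the following. Only the `enable_query_rewriting` field is taken from this guide; every other field is an assumption about the schema, so confirm the exact fields in the retrieval notebook.

```python
import json

# Hypothetical POST /generate request body. Only enable_query_rewriting is
# documented here -- the other fields are assumed for illustration.
payload = {
    "messages": [
        {"role": "user", "content": "Tell me about the H100 GPU."},
        {"role": "assistant", "content": "The H100 is NVIDIA's Hopper-generation data center GPU."},
        {"role": "user", "content": "How much memory does it have?"},  # multiturn follow-up
    ],
    "use_knowledge_base": True,       # assumed flag
    "enable_query_rewriting": True,   # per-request toggle, no container relaunch needed
}
print(json.dumps(payload, indent=2))
```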
- To point to a cloud-hosted model instead, set the server URL to an empty string.

  ```bash
  export APP_QUERYREWRITER_SERVERURL=""
  ```
- Relaunch the rag-server container with query rewriting enabled.

  ```bash
  export ENABLE_QUERYREWRITER="True"
  docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
  ```
> [!TIP]
> For an externally hosted LLM model, you can change the model name and model endpoint by setting these two environment variables and restarting the RAG services:
>
> ```bash
> export APP_QUERYREWRITER_SERVERURL="<llm_nim_http_endpoint_url>"
> export APP_QUERYREWRITER_MODELNAME="<model_name>"
> ```

This section describes how to enable Query Rewriting when you deploy by using Helm, with an on-prem deployment of the LLM model.
> [!NOTE]
> Only on-prem deployment of the LLM is supported. The model must be deployed separately by using the NIM LLM Helm chart.

To deploy the `llama-3.1-8b-instruct` model in a separate namespace (`query-rewriter`), use the following procedure.
- Export your NGC API key.

  ```bash
  export NGC_API_KEY=<your_ngc_api_key>
  ```
- Create a namespace.

  ```bash
  kubectl create ns query-rewriter
  ```
- Create the required secrets.

  ```bash
  kubectl create secret -n query-rewriter docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=$NGC_API_KEY

  kubectl create secret -n query-rewriter generic ngc-api \
    --from-literal=NGC_API_KEY=$NGC_API_KEY
  ```
- Create a `custom_values.yaml` file with the following content.

  ```yaml
  service:
    name: "nim-llm"
  image:
    repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
    pullPolicy: IfNotPresent
    tag: "1.3.0"
  resources:
    limits:
      nvidia.com/gpu: 1
    requests:
      nvidia.com/gpu: 1
  model:
    ngcAPISecret: ngc-api
    name: "meta/llama-3.1-8b-instruct"
  persistence:
    enabled: true
  imagePullSecrets:
    - name: ngc-secret
  ```
- Install the Helm chart.

  ```bash
  helm upgrade --install nim-llm -n query-rewriter https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.7.0.tgz \
    --username='$oauthtoken' \
    --password=$NGC_API_KEY \
    -f custom_values.yaml
  ```
- Modify the `values.yaml` file in the `envVars` section and set the following values.

  ```yaml
  envVars:
    ##===Query Rewriter Model specific configurations===
    APP_QUERYREWRITER_MODELNAME: "meta/llama-3.1-8b-instruct"
    APP_QUERYREWRITER_SERVERURL: "nim-llm.query-rewriter:8000" # Fully qualified service name
    ENABLE_QUERYREWRITER: "True"
  ```
Follow the steps from the Quick Start Helm Deployment and use the following command to deploy the chart.

```bash
helm install rag -n rag https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-rag-v2.1.0.tgz \
  --username '$oauthtoken' \
  --password "${NGC_API_KEY}" \
  --set imagePullSecret.password=$NGC_API_KEY \
  --set ngcApiSecret.password=$NGC_API_KEY \
  -f rag-server/values.yaml
```

> [!NOTE]
> This setup increases the total GPU requirement to 10xH100.