For this workflow, we will be leveraging NVIDIA ACE - a suite of technologies for bringing digital humans to life with generative AI.
This guide provides step-by-step instructions to set up and deploy the Digital Human Pipeline workflow using the NVIDIA ACE repository.
- Prerequisites
- Setup for Deployment
- Deploy Infrastructure and Application
- Verify the Deployment and UI
Refer to the Prerequisites section to confirm the prerequisites are set up correctly. Refer to the AWS Setup Guide as well.
We will be leveraging the one-click AWS deployment script, which automates and abstracts away the complexities of AWS instance provisioning, setup, and deployment of our application.
Ensure you have access to an Ubuntu 20.04 or 22.04 based machine, either a VM or a workstation, with sudo privileges for the user running the automated deployment scripts.
On your local Ubuntu-based system, clone the ACE repository and navigate to the one-click scripts for AWS.
git clone https://github.com/NVIDIA/ACE.git
cd ACE/workflows/tokkio/scripts/one-click/aws
This directory contains the deployment Terraform scripts and the secrets.sh file used to complete the deployment.
ls -l
deploy-spec
deploy-template.yml
secrets.sh
tokkio-deploy
Modify the secrets.sh file with the AWS access keys and other necessary secrets generated in the previous step.
export _aws_access_key_id='<your_access_key_id>'
export _aws_secret_access_key='<your_secret_access_key>'
export _ngc_api_key='<your_ngc_api_key>'
export _ssh_public_key='<your_ssh_public_key>'
The _openai_api_key is not needed for this Tokkio-RAG workflow, but it is needed for the Tokkio-QSR workflow.
The _coturn_password field is optional if you are using a reverse proxy as the TURN server.
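For example, if you created the SSH key pair in the Prerequisites section, you can populate the SSH field directly from the public key file. This is a sketch; adjust the path to wherever your key actually lives:

```bash
# Sketch: fill _ssh_public_key from an existing key pair.
# The ./ssh/id_rsa location follows the Prerequisites section; yours may differ.
export _ssh_public_key="$(cat ./ssh/id_rsa.pub)"
```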
The deploy-template.yml file is used to compile the infrastructure specification needed to set up the project/environment. You can have multiple template files and create multiple unique environments per template file, each identified by the unique ID project_name.
Update the deploy-template.yml with your specific AWS configurations in the sections below:
- metadata
  - project_name: Unique identifier of the environment. E.g. tokkio-oran-bot
  - template_version: 0.4.0
- backend
  - dynamodb_table: DynamoDB table name that was created in the prerequisite step. E.g. tokkio-table
  - bucket: S3 bucket name that was created in the prerequisite step. E.g. tokkio-bucket
  - region: The region the prerequisites were set up in. E.g. us-west-2
- provider
  - region: Same as specified in backend. E.g. us-west-2
- spec
  - vpc_cidr_block: The private CIDR range in which the base, TURN, and app resources will be created. E.g. 0.0.0.0/24 (see the AWS VPC Calculator)
  - dev_access_ipv4_cidr_blocks: CIDR ranges from which SSH access should be allowed.
  - user_access_ipv4_cidr_blocks: CIDR ranges from which the application UI and API will be allowed access.
  - base_domain: Domain name that should be configured with the DNS hosted zone under which the apps will be registered. E.g. tokkio-oran-aws.nvidia.com
  - api_sub_domain: Subdomain to be used for the API. E.g. tokkio-oran-bot-api
  - ui_sub_domain: Subdomain to be used for the UI. E.g. tokkio-oran-bot-ui
  - turn_server_provider: Choose between rp for a reverse proxy configuration and coturn for a coturn server configuration.
  - app_instance_type: Default options are g5.12xlarge (4x A10 GPUs) and g4dn.12xlarge (4x T4 GPUs).
  - api_settings
    - chart_name: The Helm chart that will be pulled from NGC and installed. For this tokkio-llm workflow, the chart name is ucs-tokkio-audio-video-llm-app.
  - ui_settings
    - application_type: Choose custom for the tokkio-llm workflow. The default application type is qsr.
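Putting these fields together, a filled-in template might look like the sketch below, using the example values from the list above. The nesting here is illustrative; verify the exact schema against the example deploy-template.yml files referenced below.

```yaml
# Illustrative sketch only; cross-check field names and structure against
# the example templates in the ACE repository. CIDR values are placeholders.
metadata:
  project_name: tokkio-oran-bot
  template_version: '0.4.0'
backend:
  dynamodb_table: tokkio-table
  bucket: tokkio-bucket
  region: us-west-2
provider:
  region: us-west-2
spec:
  vpc_cidr_block: 0.0.0.0/24
  dev_access_ipv4_cidr_blocks:
    - 203.0.113.0/24
  user_access_ipv4_cidr_blocks:
    - 203.0.113.0/24
  base_domain: tokkio-oran-aws.nvidia.com
  api_sub_domain: tokkio-oran-bot-api
  ui_sub_domain: tokkio-oran-bot-ui
  turn_server_provider: rp
  app_instance_type: g5.12xlarge
  api_settings:
    chart_name: ucs-tokkio-audio-video-llm-app
  ui_settings:
    application_type: custom
```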
For further explanation of all the entries in this YAML file, please refer to the AWS Documentation.
You can find example deploy-template.yml files in the ACE repository.
After the updates to secrets.sh and deploy-template.yml, we can deploy the tokkio-llm app.
bash tokkio-deploy preview
This command installs terraform (used to complete the instance provisioning and other steps on your behalf) and shows a preview of the changes staged to be made.
The output of preview should be in this format:
app_infra = {
"api_endpoint" = "https://<api_sub_domain>.<base_domain>"
"elasticsearch_endpoint" = "https://elastic-<project_name>.<base_domain>"
"grafana_endpoint" = "https://grafana-<project_name>..<base_domain>"
"kibana_endpoint" = "https://kibana-<project_name>..<base_domain>"
"private_ips" = [
(known after apply)
]
"ui_endpoint" = "https://<ui_sub_domain>.<base_domain>"
}
bastion_infra = {
"private_ip" = (known after apply)
"public_ip" = (known after apply)
}
rp_infra = {
"private_ip" = (known after apply)
"public_ip" = (known after apply)
}
Example bash tokkio-deploy preview output:
Plan: 86 to add, 0 to change, 0 to destroy
Changes to Outputs:
app_infra = {
"api_endpoint" = "https://tokkio-oran-bot-api.tokkio-oran-aws.nvidia.com"
"elasticsearch_endpoint" = "https://elastic-tokkio-oran-bot.tokkio-oran-aws.nvidia.com"
"grafana_endpoint" = "https://grafana-ntokkio-oran-bot.tokkio-oran-aws.nvidia.com"
"kibana_endpoint" = "https://kibana-tokkio-oran-bot.tokkio-oran-aws.nvidia.com"
"private_ips" = [
(known after apply)
]
"ui_endpoint" = "https://tokkio-oran-bot-ui.tokkio-oran-aws.nvidia.com"
}
bastion_infra = {
"private_ip" = (known after apply)
"public_ip" = (known after apply)
}
rp_infra = {
"private_ip" = (known after apply)
"public_ip" = (known after apply)
}
Install the changes shown in the preview, based on deploy-template.yml:
bash tokkio-deploy install
On successful deployment of the infrastructure, you will see output in the format below:
Apply complete! Resources: <nn> added, <nn> changed, <nn> destroyed
Outputs:
app_infra = {
"api_endpoint" = "https://<api_sub_domain>.<base_domain>"
"elasticsearch_endpoint" = "https://elastic-<project_name>.<base_domain>"
"grafana_endpoint" = "https://grafana-<project_name>..<base_domain>"
"kibana_endpoint" = "https://kibana-<project_name>..<base_domain>"
"private_ips" = [
"<private_ip_of_app_instace>",
]
"ui_endpoint" = "https://<ui_sub_domain>.<base_domain>"
}
bastion_infra = {
"private_ip" = "<bastion_instance_private_ip>"
"public_ip" = "<bastion_instance_public_ip>"
}
rp_infra = {
"private_ip" = "<rp_instance_private_ip>"
"public_ip" = "<rp_instance_public_ip>"
}
Use an SSH command in the format below to log in to the application instance.
ssh -i <path-to-pem-file> -o StrictHostKeyChecking=no -o ProxyCommand="ssh -i <path-to-pem-file> -W %h:%p -o StrictHostKeyChecking=no ubuntu@<bastion-instance-public-ip>" ubuntu@<app-instance-private-ip>
The pem file referred to here is the private key associated with the public SSH key created in the Prerequisites section: ./ssh/id_rsa
Example SSH login command:
ssh -i ~/.ssh/id_rsa -o StrictHostKeyChecking=no -o ProxyCommand="ssh -i ~/.ssh/id_rsa -W %h:%p -o StrictHostKeyChecking=no [email protected]" [email protected]
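If you will be logging in repeatedly, an equivalent ~/.ssh/config entry avoids retyping the ProxyCommand. This is a sketch using standard OpenSSH options; the host aliases are made up, and the IPs are placeholders for the values from your deployment output:

```bash
# Sketch: append jump-host configuration, then connect with `ssh tokkio-app`.
cat >> ~/.ssh/config <<'EOF'
Host tokkio-bastion
    HostName <bastion-instance-public-ip>
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
    StrictHostKeyChecking no

Host tokkio-app
    HostName <app-instance-private-ip>
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
    StrictHostKeyChecking no
    ProxyJump tokkio-bastion
EOF
```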
Once you are able to log in to the AWS instance, run the command below to see the status of the Kubernetes pods.
ubuntu@ip-10-0-0-135:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
a2f-a2f-deployment-6d9f4d6ddd-n6gc9 1/1 Running 0 71m
ace-agent-chat-controller-deployment-0 1/1 Running 0 71m
ace-agent-chat-engine-deployment-687b4868c-dx7z8 1/1 Running 0 71m
ace-agent-plugin-server-deployment-7f7b7f848f-58l9z 1/1 Running 0 71m
anim-graph-sdr-envoy-sdr-deployment-5c9c8d58c6-dh7qx 3/3 Running 0 71m
chat-controller-sdr-envoy-sdr-deployment-77975fc6bf-tw9b4 3/3 Running 0 71m
ds-sdr-envoy-sdr-deployment-79676f5775-64knd 3/3 Running 0 71m
ds-visionai-ds-visionai-deployment-0 2/2 Running 0 71m
ia-animation-graph-microservice-deployment-0 1/1 Running 0 71m
ia-omniverse-renderer-microservice-deployment-0 1/1 Running 0 71m
ia-omniverse-renderer-microservice-deployment-1 1/1 Running 0 71m
ia-omniverse-renderer-microservice-deployment-2 1/1 Running 0 71m
mongodb-mongodb-bc489b954-jw45w 1/1 Running 0 71m
occupancy-alerts-api-app-cfb94cb7b-lnnff 1/1 Running 0 71m
occupancy-alerts-app-5b97f578d8-wtjws 1/1 Running 0 71m
redis-redis-5cb5cb8596-gncxk 1/1 Running 0 71m
redis-timeseries-redis-timeseries-55d476db56-2shpt 1/1 Running 0 71m
renderer-sdr-envoy-sdr-deployment-5d4d99c778-qm8sz 3/3 Running 0 71m
riva-speech-57dbbc9dbf-dmzpp 1/1 Running 0 71m
tokkio-cart-manager-deployment-55476f746b-7xbrg 1/1 Running 0 71m
tokkio-ingress-mgr-deployment-7cc446758f-bz6kz 3/3 Running 0 71m
tokkio-menu-api-deployment-748c8c6574-z8jdz 1/1 Running 0 71m
tokkio-ui-server-deployment-55fcbdd9f4-qwmtw 1/1 Running 0 71m
tokkio-umim-action-server-deployment-74977db6d6-sp682 1/1 Running 0 71m
triton0-766cdf66b8-6dsmq 1/1 Running 0 71m
vms-vms-bc7455786-6w7cz 1/1 Running 0 71m
It may take up to an hour for the pods to reach the Running state.
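Rather than re-running kubectl get pods by hand while you wait, you can watch the rollout. Both commands below are standard kubectl usage; the timeout value is just an example:

```bash
# Stream pod status changes until interrupted with Ctrl+C.
kubectl get pods -w

# Or block until every pod reports Ready, waiting at most an hour.
kubectl wait --for=condition=Ready pod --all --timeout=3600s
```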
Once all the pods are running, you can access the UI using the URL printed in the ui_endpoint output attribute.
You can display the output of the install command on your local terminal using the command below:
bash tokkio-deploy show-results
app_infra = {
"ui_endpoint" = "https://<ui_sub_domain>.<base_domain>"
}
Access UI at https://<ui_sub_domain>.<base_domain>
You're ready to interact with your Avatar!
This example showcases a LangChain based RAG pipeline that deploys the NIM for LLMs microservice to host a TensorRT optimized LLM and the NeMo Retriever Embedding microservice. Milvus is deployed as the vector database to store embeddings and generate responses to queries.
The knowledge base used in this RAG deployment is based on the O-RAN (Open Radio Access Network) ALLIANCE specifications. These specifications define open, interoperable standards and interfaces that transform traditional radio access networks into more flexible, virtualized, and cloud-native systems. The technical documentation (PDFs) of these specifications can be found here. For this example, you should download these PDFs into the ./data directory. However, this pipeline is flexible and can also be used with data from other domains, or your own data, for RAG.
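For example, once the PDFs are downloaded, they can be staged where the loader scripts described later expect them. This is a sketch; adjust the source path to wherever your browser saved the files:

```bash
# Sketch: create the data directory and stage the downloaded PDFs in it.
mkdir -p ./data
mv ~/Downloads/*.pdf ./data/   # adjust to your actual download location
```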
By default, this RAG deployment uses meta/llama3-8b-instruct without PEFT. If you customized the model with PEFT and LoRA, make sure to update LLM_MODEL_NAME to lora-nemo-fw in the .env file to use the LoRA weights.
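As a sketch, the change amounts to a single line in .env. Only apply it if you actually trained LoRA weights; the sed invocation below assumes .env already defines LLM_MODEL_NAME:

```bash
# Sketch: point the pipeline at the LoRA-customized model.
sed -i 's/^LLM_MODEL_NAME=.*/LLM_MODEL_NAME=lora-nemo-fw/' .env
```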
A minimum of two NVIDIA datacenter GPUs (such as A100, H100, or L40S models) is required.
- One GPU is needed for the inference container.
- One GPU is needed for the embedding container.
- In addition, Milvus requires at least one GPU by default.
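Before deploying, you can confirm how many GPUs the host exposes; this uses standard nvidia-smi query flags:

```bash
# List the index, model, and total memory of each visible GPU.
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```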
- Complete the Common Prerequisites.
- Log in to the NVIDIA container registry using the following command:
docker login nvcr.io
Once prompted, you can use $oauthtoken as the username and your NGC_API_KEY as the password (a non-interactive variant is sketched after these steps). Then, export the NGC_API_KEY:
export NGC_API_KEY=<ngc-api-key>
- Deploy the RAG pipeline:
USERID=$(id -u) docker compose --profile local-nim --profile milvus up -d
Example output:
CONTAINER ID   NAMES                                      STATUS
32515fcb8ad2   rag-playground                             Up 26 minutes
d60e0cee49f7   rag-application-text-chatbot-langchain     Up 27 minutes
02c8062f15da   nemo-retriever-embedding-microservice      Up 27 minutes (healthy)
7bd4d94dc7a7   nemollm-inference-microservice             Up 27 minutes
55135224e8fd   milvus-standalone                          Up 48 minutes (healthy)
5844248a08df   milvus-minio                               Up 48 minutes (healthy)
c42df344bb25   milvus-etcd                                Up 48 minutes (healthy)
- Open your browser and interact with the RAG Playground at http://localhost:3001/orgs/nvidia/models/text-qa-chatbot.
- Check out the API specs at http://localhost:8081/docs.
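As referenced in the login step above, the registry login can also be done non-interactively. This is a sketch using standard docker login flags, and it assumes NGC_API_KEY is already exported:

```bash
# Sketch: non-interactive login; the single quotes keep $oauthtoken literal.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```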
Note: Accessing the UI and endpoints:
The examples in this documentation use localhost:port to access the UI and endpoints. If you are running the application on a remote machine:
- You may need to set up port forwarding to access the services on your local machine. Open a terminal on your local machine and use the following command to set up port forwarding:
ssh -L local_port:localhost:remote_port username@remote_host
- Alternatively, replace localhost with the actual IP address of the remote machine. For example, if the remote machine's IP is 12.34.56.78, use 12.34.56.78:port instead of localhost:port.
Ensure you have the necessary permissions and have configured any relevant firewalls or security groups to allow access to the specified ports.
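For this example's defaults, a single command can forward both the Playground and API ports. This is a sketch; substitute your own username and host:

```bash
# Forward the RAG Playground (3001) and the API (8081) to your local machine.
ssh -L 3001:localhost:3001 -L 8081:localhost:8081 username@remote_host
```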
There are three methods available for adding data to the Retrieval-Augmented Generation (RAG) knowledge base.
- Access the RAG Playground at http://localhost:3001/orgs/nvidia/models/text-qa-chatbot. In the Knowledge Base section, select "Drag and drop a file" to upload your PDF documents.
- Alternatively, use the API by sending requests to the /documents endpoint at http://localhost:8081/documents. Below is an example cURL command for this method:
pathToDocument="/path/to/document.pdf"
curl -X 'POST' \
  'http://localhost:8081/documents' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F "file=@${pathToDocument};type=application/pdf"
To batch upload documents, data_loader.sh is provided. This script is designed to upload multiple PDF files from a specified directory to a server endpoint in one go (a minimal sketch of the underlying loop appears after this list). By default, it assumes that the documents are located in the ./data directory, but you can easily modify the script to use any directory of your choice.
chmod +x data_loader.sh
./data_loader.sh
- A third option is to use the provided data loader in data_loader.ipynb. This method requires installation of the requests library via pip:
pip install requests
This notebook includes examples for uploading both individual documents and entire directories of PDFs.
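The batch upload in data_loader.sh boils down to looping the single-document cURL call over a directory. Below is a minimal sketch of that idea, assuming the default ./data location and the /documents endpoint shown above; the actual script may differ:

```bash
# Minimal sketch of a batch upload; see data_loader.sh for the real script.
for pdf in ./data/*.pdf; do
  curl -X 'POST' 'http://localhost:8081/documents' \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -F "file=@${pdf};type=application/pdf"
done
```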
Now that the knowledge base has been populated, you can engage with the RAG-powered AI assistant in two different ways.
- Utilize the RAG Playground interface:
- Navigate to http://localhost:3001/orgs/nvidia/models/text-qa-chatbot
- Locate the Demo section to engage with the conversational AI
- To enable RAG functionality, ensure the "Use Knowledge Base" option is selected
- To disable RAG, deselect the "Use Knowledge Base" option
- Use the API by sending requests to the /generate endpoint at http://localhost:8081/generate. Below is an example cURL command for this method:
curl -X 'POST' \
  'http://localhost:8081/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "I am going to Paris, what should I see?"
      }
    ],
    "use_knowledge_base": true,
    "temperature": 0.2,
    "top_p": 0.7,
    "max_tokens": 1024,
    "stop": []
  }'
To enable RAG functionality, set "use_knowledge_base": true as shown in the example above. Set it to false to disable RAG.
After launching both the Digital Human Pipeline and RAG Pipeline, proceed to Connecting Your Digital Human Pipeline to Domain Adapted RAG.