- Route: Checks the RAG context for relevance to the query and adds live web search if the context is thin
- Evaluate: Checks responses for relevance and accuracy, flags hallucinations
- Iterate: Goes through multiple evaluation and generation cycles
- Edit Prompts: Customize results through your own prompts
- Change Parameters: Adjust agent behavior through parameters and runtime variables
- Look and Feel: Change the agent and UI by editing the code yourself
- Free Endpoints: use free endpoints on build.nvidia.com
- Self-Hosted: Point to Ollama or NIM on your own GPUs
- Easy Mode: Use the application
- Intermediate Mode: Modify the application
- Advanced Mode: Self-host gpus for inference
You can run Agentic RAG without Workbench, but this README requires NVIDIA AI Workbench installed. See how to install it here.
You need internet because Agentic RAG uses an NVIDIA endpoint for document embedding.
- Get NVIDIA and Tavily API keys:
- Clone this repo with AI Workbench > configure the keys when prompted.
- Click Open Chat > Go to the Document tab in the web app > Click Add to Context.
- Type in your question > Hit enter - answers come from free cloud endpoints.
Click to Expand Easy Mode

Steps | What can go wrong | Screen shot |
---|---|---|
1. Open the Desktop App > Select local. | Probably a Docker Desktop issue (if selected on install). Fix: See troubleshooting here | |
2. Click Clone Project > Paste repository URL > Clone | Incorrect URL. Fix: use the correct URL. | ![]() |
3. Click Resolve Now > Enter NVIDIA and Tavily API keys. | You don't see the banner. Fix: go to Project Container > Variables > Configure for API keys. See docs here | ![]() |
4. Click Open Chat. | Very little can go wrong here | ![]() |
5. Click Documents > Create Context. | Incorrect API key. Fix per Step 3 above. | ![]() |
6. Type question > Hit enter. | Incorrect API key. Fix per Step 3 above. | ![]() |
Use these steps when you want to work with your own documents and your own prompts.
Steps | What can go wrong | Screen shot |
---|---|---|
1. Click Documents > Clear Context. | Very little. | Vector DB reset. |
2. Delete the URLs > Add your own > Click Add to Context. | URLs that can't be resolved. Fix: Enter appropriate URLs | New context. |
3. Type question > Hit enter. | Incorrect API key. Fix: Fix per Step 3 in table above. | Triggers the agent. |
Click to Expand Intermediate Mode

This application is a quick prototype and not a robust piece of software. So there are many opportunities to improve it.
- Fork this project to your own GitHub account. Then clone it in Workbench
- Add VS Code to the project
- Create an
experiment
branch to protect main - Open VS Code from the Desktop App and edit the application code
- Change recursion limit, number of web sites returned by Tavily, whether previous searches are saved
- Add new endpoints from build.nvidia.com
- Change the look and feel of the Gradio app or add new features
- Modify the agent
- Fix any bugs you find
Click to Expand Advanced Mode
Use these details if you want to modify the application, e.g. by configuring prompts, adding your own endpoints, changing the Gradio app or whatever else occurs to you.
- Set up a Linux box with an NVIDIA GPU and Docker.
- Deploy an Ollama container or an NVIDIA NIM on that host.
- Configure the chat app to use the self-hosted endpoint.
This NVIDIA AI Workbench example project is under the Apache 2.0 License
This project may utilize additional third-party open source software projects. Review the license terms of these open source projects before use. Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components. You are responsible for confirming compliance with third-party component license terms and requirements.
❓ Have Questions? |
---|
Please direct any issues, fixes, suggestions, and discussion on this project to the DevZone Members Only Forum thread here |
⬇️ Download AI Workbench | 📖 User Guide |📂 Other Projects | 🚨 User Forum