NVIDIA Stack Agent

Purpose

This project demonstrates GPU-accelerated generative AI and agentic AI concepts in a minimal, reproducible form.

It uses NVIDIA NIM (NVIDIA Inference Microservices) to execute LLM workloads, combined with NeMo Guardrails for safety and controllability. Because computation runs entirely on NVIDIA's cloud GPU infrastructure, the demo requires no local GPU and can be run at zero cost.


Tech Stack

| Component | Technology | Description |
| --- | --- | --- |
| Framework | FastAPI | Lightweight, high-performance web API framework |
| AI API | NVIDIA NIM (meta/llama-3.1-8b-instruct) | Free, serverless LLM inference environment |
| Safety Layer | NeMo Guardrails | Dialogue control and tool access management |
| Agentic Tools | calc / kb | Calculator and FAQ retrieval (RAG-like behavior) |
| Storage | kb.json | Local knowledge base (replaceable with cloud storage) |
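The actual implementations live in app/rails/tools.py. A minimal sketch of what calc and kb might look like (the function bodies, and the keyword-to-answer shape assumed for kb.json, are illustrative rather than the repo's exact code):

import ast
import json
import operator
from pathlib import Path

# Safe arithmetic: walk the AST instead of calling eval() on user input
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calc(expression: str) -> str:
    """Evaluate a basic arithmetic expression safely."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")
    return str(_eval(ast.parse(expression, mode="eval").body))

def kb(query: str) -> str:
    """Naive keyword lookup over the local knowledge base."""
    entries = json.loads(Path("kb.json").read_text(encoding="utf-8"))
    hits = [answer for keyword, answer in entries.items() if keyword in query.lower()]
    return " / ".join(hits) if hits else "No matching entry."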

System Overview

/chat Endpoint

  • Parses user messages and automatically routes them to internal tools:
    • Math expressions → calc
    • Business hours / pricing / contact info → kb
  • Generates concise, safe replies based on Guardrails policies.
  • Works even with OFFLINE_MODE=1 (mock mode without the NIM API).
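The routing itself can be as simple as a couple of pattern checks before deferring to the model. A minimal sketch (the exact heuristics in app/main.py may differ):

import re

# Messages made only of digits, whitespace, and arithmetic symbols go to calc
MATH_RE = re.compile(r"[0-9\s.+\-*/()]+")
KB_KEYWORDS = ("hours", "pricing", "price", "contact")

def route(message: str) -> str:
    """Pick an internal tool for a message, or defer to the LLM."""
    if MATH_RE.fullmatch(message.strip()):
        return "calc"
    if any(word in message.lower() for word in KB_KEYWORDS):
        return "kb"
    return "llm"  # no tool matched: let NIM (or the offline mock) answer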

Example Interactions

| Input | Internal Tool | Example Output |
| --- | --- | --- |
| (1000-250)*0.1 | calc | 75.0 |
| What are your business hours and prices? | kb | Summarized response in Japanese |

Setup & Run

1. Environment Setup

git clone git@github.com:REICHIYAN/nvda_stack_agent.git
cd nvda_stack_agent
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

2. Environment Variables

Create a .env file:

# Option 1: Using real NIM API
NVIDIA_API_KEY=your_api_key_here

# Option 2: Offline mode (mock responses)
OFFLINE_MODE=1
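How the app consumes these variables is an implementation detail of app/main.py, but a minimal sketch, assuming python-dotenv is installed and that NIM is reached through its OpenAI-compatible endpoint, looks like this:

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # read .env into the process environment

OFFLINE_MODE = os.getenv("OFFLINE_MODE") == "1"

def make_client():
    """Return a NIM client, or None when running in offline/mock mode."""
    if OFFLINE_MODE:
        return None
    # NIM exposes an OpenAI-compatible API, so the standard client works
    return OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=os.environ["NVIDIA_API_KEY"],
    )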

3. Launch the Server

uvicorn app.main:app --reload

Expected output:

INFO:     Uvicorn running on http://127.0.0.1:8000

Testing with curl

Example 1: Knowledge Retrieval

curl -s http://127.0.0.1:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"message":"Tell me business hours and pricing"}' | jq

Expected output:

{
  "reply": "(Local response) Tool result: ...",
  "tool_calls": [
    {"name": "kb", "input": "Tell me business hours...", "result": "..."}
  ]
}

Example 2: Calculation

curl -s http://127.0.0.1:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"message":"(1000-250)*0.1"}' | jq

Expected output:

{
  "reply": "(Local response) Tool result: 75.0",
  "tool_calls": [
    {"name": "calc", "input": "(1000-250)*0.1", "result": "75.0"}
  ]
}
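The same calls work from Python if you prefer scripting over curl (a usage sketch assuming the requests package):

import requests

resp = requests.post(
    "http://127.0.0.1:8000/chat",
    json={"message": "(1000-250)*0.1"},
    timeout=30,
)
print(resp.json()["reply"])  # -> "(Local response) Tool result: 75.0"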

Folder Structure

nvda_stack_agent/
├─ app/
│  ├─ main.py              # FastAPI entry point
│  ├─ schemas.py           # Pydantic data models
│  ├─ rails/
│  │  ├─ colang/flows.co   # Guardrails Colang definitions
│  │  └─ tools.py          # calc / kb implementations
│  └─ __init__.py
├─ kb.json                 # Local knowledge base
├─ requirements.txt
├─ README.md
└─ .env (ignored)
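The README does not spell out kb.json's schema. A plausible minimal shape, matching the keyword-lookup sketch above (keys and values are illustrative):

{
  "hours": "Open 9:00-18:00 JST, Monday through Friday.",
  "pricing": "Basic plan: 1,000 JPY/month. Pro plan: 5,000 JPY/month.",
  "contact": "support@example.com"
}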

Extensions (for Interview Discussion)

  • Multi-Cloud / Hybrid: NIM supports both hosted API access and on-prem GPU deployment.
  • AIOps / MLOps: Integrate Triton Inference Server for multi-model orchestration.
  • Storage / Migration: Replace kb.json with S3, GCS, or enterprise-grade storage (see the sketch below).
  • Safety / Compliance: Guardrails enforces output filters, tool whitelisting, and secure execution.
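For the storage swap, a minimal sketch of loading the knowledge base from S3 instead of the local file (the bucket name, key, and boto3 dependency are illustrative assumptions, not part of the repo):

import json

import boto3  # assumed to be added to requirements.txt for this extension

def load_kb_from_s3(bucket: str = "my-agent-kb", key: str = "kb.json") -> dict:
    """Fetch the knowledge base from S3 instead of reading kb.json locally."""
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())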

License

MIT License © 2025 Rei Taguchi
