This project demonstrates GPU-Accelerated Generative AI and Agentic AI concepts in a minimal, reproducible form.
It uses NVIDIA NIM (NVIDIA Inference Microservices) to execute LLM workloads, combined with NeMo Guardrails for safety and controllability.
Because LLM inference runs entirely on NVIDIA's hosted GPU infrastructure, the demo requires no local GPU and can run at zero cost.
| Component | Technology | Description |
|---|---|---|
| Framework | FastAPI | Lightweight, high-performance web API framework |
| AI API | NVIDIA NIM (meta/llama-3.1-8b-instruct) | Free, serverless LLM inference environment |
| Safety Layer | NeMo Guardrails | Dialogue control and tool access management |
| Agentic Tools | calc / kb | Calculator and FAQ retrieval (RAG-like behavior) |
| Storage | kb.json | Local knowledge base (replaceable with cloud storage) |
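For reference, NIM exposes an OpenAI-compatible endpoint, so the model listed above can be called with the standard `openai` client. The snippet below is a minimal sketch under that assumption, not the project's actual client code; the base URL follows NVIDIA's published API catalog and everything else is illustrative.

```python
# Minimal sketch: calling the NIM-hosted Llama 3.1 8B model directly.
# Illustrative only -- the project's own client wiring may differ.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NIM's OpenAI-compatible endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize our business hours."}],
    temperature=0.2,
    max_tokens=256,
)
print(completion.choices[0].message.content)
```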
- Parses user messages and automatically routes them to internal tools (see the sketch below):
  - Math expressions → `calc`
  - Business hours / pricing / contact info → `kb`
- Generates concise, safe replies based on Guardrails policies.
- Works even with `OFFLINE_MODE=1` (mock mode without the NIM API).
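A rough sketch of what this routing could look like; the function names, regex, and keywords here are placeholders, not the actual contents of `app/rails/tools.py`.

```python
# Sketch of the routing idea: pick a tool based on the message shape.
# Placeholder names -- the real implementation lives in app/rails/tools.py.
import re

MATH_PATTERN = re.compile(r"[\d\s\.\+\-\*\/\(\)]+")
KB_KEYWORDS = ("hours", "price", "pricing", "contact")

def route(message: str) -> str:
    """Return the name of the tool that should handle the message."""
    if MATH_PATTERN.fullmatch(message.strip()):
        return "calc"
    if any(keyword in message.lower() for keyword in KB_KEYWORDS):
        return "kb"
    return "llm"  # fall through to the NIM-backed model
```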
| Input | Internal Tool | Example Output |
|---|---|---|
| `(1000-250)*0.1` | `calc` | `75.0` |
| What are your business hours and prices? | `kb` | Summarized response in Japanese |
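As an illustration of the `calc` behavior in the first row, an arithmetic expression can be evaluated safely without `eval` by walking a restricted AST. This is only one possible approach and not necessarily how the project's `tools.py` implements it.

```python
# Safe arithmetic evaluation via a restricted AST walk (illustrative sketch).
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calc(expression: str) -> float:
    """Evaluate a basic arithmetic expression, e.g. '(1000-250)*0.1' -> 75.0."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```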
Clone the repository and set up the environment:

```bash
git clone [email protected]:REICHIYAN/nvda_stack_agent.git
cd nvda_stack_agent
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

Create a `.env` file:

```bash
# Option 1: Using the real NIM API
NVIDIA_API_KEY=your_api_key_here

# Option 2: Offline mode (mock responses)
OFFLINE_MODE=1
```

Start the server:

```bash
uvicorn app.main:app --reload
```

Expected output:

```
INFO:     Uvicorn running on http://127.0.0.1:8000
```
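Inside the app these two settings are presumably read from the environment. A minimal sketch of that pattern, assuming `python-dotenv` is available (it may or may not be listed in `requirements.txt`):

```python
# Sketch: reading the two .env settings above (illustrative, not the app's actual code).
import os
from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # pick up .env from the project root

NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY", "")
OFFLINE_MODE = os.getenv("OFFLINE_MODE", "0") == "1"

if OFFLINE_MODE:
    print("Running with mocked responses; no NIM calls will be made.")
```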
Query the knowledge base:

```bash
curl -s http://127.0.0.1:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"message":"Tell me business hours and pricing"}' | jq
```

Expected output:
```json
{
  "reply": "(Local response) Tool result: ...",
  "tool_calls": [
    {"name": "kb", "input": "Tell me business hours...", "result": "..."}
  ]
}
```

Run a calculation:

```bash
curl -s http://127.0.0.1:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"message":"(1000-250)*0.1"}' | jq
```

Expected output:
```json
{
  "reply": "(Local response) Tool result: 75.0",
  "tool_calls": [
    {"name": "calc", "input": "(1000-250)*0.1", "result": "75.0"}
  ]
}
```
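The same calls can of course be made from Python. A small equivalent of the curl examples using the `requests` library (the exact reply text depends on whether `OFFLINE_MODE` is set):

```python
# Equivalent of the curl examples above, using the requests library.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/chat",
    json={"message": "(1000-250)*0.1"},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(data["reply"])        # e.g. "(Local response) Tool result: 75.0"
print(data["tool_calls"])   # which tool was invoked and with what input
```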
Project structure:

```
nvda_stack_agent/
├─ app/
│  ├─ main.py            # FastAPI entry point
│  ├─ schemas.py         # Pydantic data models
│  ├─ rails/
│  │  ├─ colang/flows.co # Guardrails Colang definitions
│  │  └─ tools.py        # calc / kb implementations
│  └─ __init__.py
├─ kb.json               # Local knowledge base
├─ requirements.txt
├─ README.md
└─ .env (ignored)
```
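`kb.json` is the local knowledge base behind the `kb` tool. Its exact schema is not shown here, but a lookup along the following lines would match the behavior described above; the flat topic-to-text shape and field handling are assumptions, not the real schema.

```python
# Sketch of a kb.json lookup (the topic -> text schema is an assumption).
import json
from pathlib import Path

def kb(query: str, path: str = "kb.json") -> str:
    """Return knowledge-base entries whose topic appears in the query."""
    entries: dict[str, str] = json.loads(Path(path).read_text(encoding="utf-8"))
    hits = [text for topic, text in entries.items() if topic.lower() in query.lower()]
    return "\n".join(hits) if hits else "No matching entry found."
```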
- Multi-Cloud / Hybrid: NIM supports both API and on-prem GPU hosting.
- AIOps / MLOps: Integrate Triton Inference Server for multi-model orchestration.
- Storage / Migration: Replace `kb.json` with S3, GCS, or enterprise-grade storage.
- Safety / Compliance: Guardrails enforces output filters, tool whitelisting, and secure execution.
MIT License © 2025 Rei Taguchi