Skip to content

Commit e1e9052

Browse files
committed
refactor: integrate NeMo Guardrails API and remove restart scripts
- Integrate NVIDIA NeMo Guardrails API with httpx client - Add RAIL_API_KEY and RAIL_API_URL environment variable support - Implement API-based safety checks with pattern matching fallback - Update architecture docs with correct model names - Remove restart_backend.sh and restart_backend_simple.sh scripts - Add scripts/view_logs.sh for log viewing - Update README with guardrails API integration details
1 parent 3d36791 commit e1e9052

10 files changed

Lines changed: 873 additions & 309 deletions

File tree

.env.example

Lines changed: 175 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,41 +1,190 @@
1+
# =============================================================================
2+
# Warehouse Operational Assistant - Environment Configuration
3+
# =============================================================================
4+
#
5+
# Copy this file to .env and update with your actual values:
6+
# cp .env.example .env
7+
# nano .env # or your preferred editor
8+
#
9+
# For Docker Compose deployments, place .env in deploy/compose/ directory
10+
# =============================================================================
11+
12+
# =============================================================================
13+
# ENVIRONMENT
14+
# =============================================================================
15+
# Set to 'production' for production deployments, 'development' for local dev
16+
ENVIRONMENT=development
17+
18+
# =============================================================================
19+
# DATABASE CONFIGURATION (PostgreSQL/TimescaleDB)
20+
# =============================================================================
21+
# Database connection settings
122
POSTGRES_USER=warehouse
2-
POSTGRES_PASSWORD=warehousepw
23+
POSTGRES_PASSWORD=changeme # ⚠️ CHANGE IN PRODUCTION!
324
POSTGRES_DB=warehouse
25+
DB_HOST=localhost
26+
DB_PORT=5435
427

5-
# Database Configuration
6-
PGHOST=127.0.0.1
7-
PGPORT=5435
8-
9-
# Redis Configuration
10-
REDIS_HOST=127.0.0.1
11-
REDIS_PORT=6379
28+
# Alternative database URL format (overrides individual settings above)
29+
# DATABASE_URL=postgresql://warehouse:changeme@localhost:5435/warehouse
1230

13-
# Kafka Configuration
14-
KAFKA_BROKER=kafka:9092
31+
# =============================================================================
32+
# SECURITY
33+
# =============================================================================
34+
# JWT Secret Key - REQUIRED for production, optional for development
35+
# Generate a strong random key: openssl rand -hex 32
36+
# Minimum 32 characters recommended
37+
JWT_SECRET_KEY=your-strong-random-secret-minimum-32-characters-change-this-in-production
1538

16-
# Milvus Configuration
17-
MILVUS_HOST=127.0.0.1
18-
MILVUS_PORT=19530
39+
# Admin user default password (change in production!)
40+
DEFAULT_ADMIN_PASSWORD=changeme
1941

20-
# NVIDIA NIM Configuration
21-
NVIDIA_API_KEY=your_nvidia_ngc_api_key_here
22-
LLM_NIM_URL=https://integrate.api.nvidia.com/v1
23-
EMBEDDING_NIM_URL=https://integrate.api.nvidia.com/v1
42+
# =============================================================================
43+
# REDIS CONFIGURATION
44+
# =============================================================================
45+
REDIS_HOST=localhost
46+
REDIS_PORT=6379
47+
REDIS_PASSWORD= # Leave empty for development
48+
REDIS_DB=0
2449

25-
# Optional: NeMo Guardrails Configuration
26-
RAIL_API_KEY=your_nvidia_ngc_api_key_here
27-
DATABASE_URL=postgresql://warehouse:warehousepw@localhost:5435/warehouse
50+
# =============================================================================
51+
# VECTOR DATABASE (Milvus)
52+
# =============================================================================
53+
MILVUS_HOST=localhost
54+
MILVUS_PORT=19530
55+
MILVUS_USER=root
56+
MILVUS_PASSWORD=Milvus
2857

29-
# GPU Acceleration Configuration
58+
# GPU Acceleration for Milvus
3059
MILVUS_USE_GPU=true
3160
MILVUS_GPU_DEVICE_ID=0
3261
CUDA_VISIBLE_DEVICES=0
3362
MILVUS_INDEX_TYPE=GPU_CAGRA
3463
MILVUS_COLLECTION_NAME=warehouse_docs_gpu
3564

65+
# =============================================================================
66+
# MESSAGE QUEUE (Kafka)
67+
# =============================================================================
68+
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
69+
# Alternative: KAFKA_BROKER=kafka:9092
70+
71+
# =============================================================================
72+
# NVIDIA NIM LLM CONFIGURATION
73+
# =============================================================================
74+
#
75+
# IMPORTANT: Different models use different endpoints!
76+
#
77+
# For the 49B model (llama-3.3-nemotron-super-49b-v1):
78+
# - Use: https://api.brev.dev/v1
79+
# - This is the correct endpoint for the 49B model
80+
#
81+
# For other NVIDIA NIM models:
82+
# - Use: https://integrate.api.nvidia.com/v1
83+
# - This is the standard NVIDIA NIM endpoint
84+
#
85+
# For self-hosted NIM instances:
86+
# - Use your own endpoint URL (e.g., http://localhost:8000/v1 or https://your-nim-instance.com/v1)
87+
# - Ensure your NIM instance is accessible and properly configured
88+
#
89+
# Your NVIDIA API key (same key works for both endpoints)
90+
NVIDIA_API_KEY=your-nvidia-api-key-here
91+
92+
# LLM Service Endpoint
93+
# For 49B model: https://api.brev.dev/v1
94+
# For other NIMs: https://integrate.api.nvidia.com/v1
95+
# For self-hosted: http://your-nim-host:port/v1
96+
LLM_NIM_URL=https://api.brev.dev/v1
97+
98+
# LLM Model Identifier
99+
# Example for 49B model:
100+
LLM_MODEL=nvcf:nvidia/llama-3.3-nemotron-super-49b-v1:dep-36ZiLbQIG2ZzK7gIIC5yh1E6lGk
101+
102+
# LLM Generation Parameters
103+
LLM_TEMPERATURE=0.1
104+
LLM_MAX_TOKENS=2000
105+
LLM_TOP_P=1.0
106+
LLM_FREQUENCY_PENALTY=0.0
107+
LLM_PRESENCE_PENALTY=0.0
108+
LLM_CLIENT_TIMEOUT=120 # Timeout in seconds
109+
110+
# LLM Caching
111+
LLM_CACHE_ENABLED=true
112+
LLM_CACHE_TTL_SECONDS=300 # Cache TTL in seconds (5 minutes)
113+
114+
# =============================================================================
115+
# EMBEDDING SERVICE CONFIGURATION
116+
# =============================================================================
117+
# Embedding service endpoint (typically uses NVIDIA endpoint)
118+
EMBEDDING_NIM_URL=https://integrate.api.nvidia.com/v1
119+
# Embedding API key (usually same as NVIDIA_API_KEY)
120+
# EMBEDDING_API_KEY=your-embedding-api-key # Defaults to NVIDIA_API_KEY if not set
121+
122+
# =============================================================================
123+
# CORS CONFIGURATION
124+
# =============================================================================
125+
# Allowed origins for CORS (comma-separated)
126+
# Add your frontend URLs here
127+
CORS_ORIGINS=http://localhost:3001,http://localhost:3000,http://127.0.0.1:3001,http://127.0.0.1:3000
128+
129+
# =============================================================================
130+
# UPLOAD & REQUEST LIMITS
131+
# =============================================================================
132+
# Maximum request size in bytes (default: 10MB)
133+
MAX_REQUEST_SIZE=10485760
134+
135+
# Maximum upload size in bytes (default: 50MB)
136+
MAX_UPLOAD_SIZE=52428800
137+
138+
# =============================================================================
139+
# NeMo Guardrails Configuration
140+
# =============================================================================
141+
# RAIL_API_KEY=your_nvidia_ngc_api_key_here
142+
143+
# =============================================================================
36144
# Document Extraction Agent - NVIDIA NeMo API Keys
37-
NEMO_RETRIEVER_API_KEY=your_nvidia_ngc_api_key_here
38-
NEMO_OCR_API_KEY=your_nvidia_ngc_api_key_here
39-
NEMO_PARSE_API_KEY=your_nvidia_ngc_api_key_here
40-
LLAMA_NANO_VL_API_KEY=your_nvidia_ngc_api_key_here
41-
LLAMA_70B_API_KEY=your_nvidia_ngc_api_key_here
145+
# =============================================================================
146+
# NEMO_RETRIEVER_API_KEY=your_nvidia_ngc_api_key_here
147+
# NEMO_OCR_API_KEY=your_nvidia_ngc_api_key_here
148+
# NEMO_PARSE_API_KEY=your_nvidia_ngc_api_key_here
149+
# LLAMA_NANO_VL_API_KEY=your_nvidia_ngc_api_key_here
150+
# LLAMA_70B_API_KEY=your_nvidia_ngc_api_key_here
151+
152+
# =============================================================================
153+
# EXTERNAL SERVICE INTEGRATIONS
154+
# =============================================================================
155+
# WMS_API_KEY=your-wms-api-key
156+
# ERP_API_KEY=your-erp-api-key
157+
158+
# =============================================================================
159+
# NOTES FOR DEVELOPERS
160+
# =============================================================================
161+
#
162+
# 1. LLM Endpoint Configuration:
163+
# - The 49B model REQUIRES https://api.brev.dev/v1
164+
# - Other NIM models use https://integrate.api.nvidia.com/v1
165+
# - Both endpoints use the same NVIDIA_API_KEY
166+
# - You can deploy NIMs on your own instances and consume them via endpoint
167+
# (e.g., http://localhost:8000/v1 or https://your-nim-instance.com/v1)
168+
# - For self-hosted NIMs, ensure the endpoint is accessible and properly configured
169+
#
170+
# 2. Security:
171+
# - NEVER commit .env files to version control
172+
# - Change all default passwords in production
173+
# - Use strong, unique JWT_SECRET_KEY in production
174+
# - JWT_SECRET_KEY is REQUIRED in production (app will fail to start without it)
175+
#
176+
# 3. Database:
177+
# - Default port 5435 is used to avoid conflicts with standard PostgreSQL (5432)
178+
# - Ensure Docker containers are running before starting the backend
179+
#
180+
# 4. Testing:
181+
# - View logs in real-time: ./scripts/view_logs.sh
182+
# - Restart backend: ./restart_backend.sh
183+
# - Check health: curl http://localhost:8001/api/v1/health
184+
#
185+
# 5. Getting NVIDIA API Keys:
186+
# - Sign up at: https://build.nvidia.com/
187+
# - Get your API key from the NVIDIA dashboard
188+
# - The same key works for both brev.dev and integrate.api.nvidia.com endpoints
189+
#
190+
# =============================================================================

README.md

Lines changed: 51 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -498,9 +498,10 @@ The system implements **NVIDIA NeMo Guardrails** for content safety, security, a
498498

499499
NeMo Guardrails provides multi-layer protection for the warehouse operational assistant:
500500

501+
- **API Integration** - Uses NVIDIA NeMo Guardrails API for intelligent safety validation
501502
- **Input Safety Validation** - Checks user queries before processing
502503
- **Output Safety Validation** - Validates AI responses before returning to users
503-
- **Pattern-Based Detection** - Identifies violations using keyword and phrase matching
504+
- **Pattern-Based Fallback** - Falls back to keyword/phrase matching if API is unavailable
504505
- **Timeout Protection** - Prevents hanging requests with configurable timeouts
505506
- **Graceful Degradation** - Continues operation even if guardrails fail
506507

@@ -546,7 +547,31 @@ Redirects non-warehouse related queries:
546547

547548
### Configuration
548549

549-
Guardrails configuration is defined in `data/config/guardrails/rails.yaml`:
550+
#### Environment Variables
551+
552+
The guardrails service can be configured via environment variables:
553+
554+
```bash
555+
# NeMo Guardrails API Configuration
556+
# Use RAIL_API_KEY for guardrails-specific key, or it will fall back to NVIDIA_API_KEY
557+
RAIL_API_KEY=your-nvidia-api-key-here
558+
559+
# Guardrails API endpoint (defaults to NVIDIA's cloud endpoint)
560+
RAIL_API_URL=https://integrate.api.nvidia.com/v1
561+
562+
# Timeout for guardrails API calls in seconds (default: 10)
563+
GUARDRAILS_TIMEOUT=10
564+
565+
# Enable/disable API usage (default: true)
566+
# If false, will only use pattern-based matching
567+
GUARDRAILS_USE_API=true
568+
```
569+
570+
**Note:** If `RAIL_API_KEY` is not set, the service will use `NVIDIA_API_KEY` as a fallback. If neither is set, the service will use pattern-based matching only.
571+
572+
#### YAML Configuration
573+
574+
Guardrails configuration is also defined in `data/config/guardrails/rails.yaml`:
550575

551576
```yaml
552577
# Safety and compliance rules
@@ -623,10 +648,12 @@ python tests/unit/test_guardrails.py
623648
The guardrails service (`src/api/services/guardrails/guardrails_service.py`) provides:
624649

625650
- **GuardrailsService** class with async methods
626-
- **Pattern matching** for violation detection
651+
- **API Integration** - Calls NVIDIA NeMo Guardrails API for intelligent validation
652+
- **Pattern-based Fallback** - Falls back to keyword/phrase matching if API unavailable
627653
- **Safety response generation** based on violation types
628654
- **Configuration loading** from YAML files
629655
- **Error handling** with graceful degradation
656+
- **Automatic fallback** - Seamlessly switches to pattern matching on API failures
630657

631658
### Response Format
632659

@@ -666,13 +693,32 @@ Guardrails activity is logged and monitored:
666693
4. **Customization**: Adjust timeout values based on your infrastructure
667694
5. **Response Messages**: Keep safety responses professional and helpful
668695

696+
### API Integration Details
697+
698+
The guardrails service now integrates with the NVIDIA NeMo Guardrails API:
699+
700+
1. **Primary Method**: API-based validation using NVIDIA's guardrails endpoint
701+
- Uses `/chat/completions` endpoint with safety-focused prompts
702+
- Leverages LLM-based violation detection for more intelligent analysis
703+
- Returns structured JSON with violation details and confidence scores
704+
705+
2. **Fallback Method**: Pattern-based matching
706+
- Automatically used if API is unavailable or times out
707+
- Uses keyword/phrase matching for common violation patterns
708+
- Ensures system continues to function even without API access
709+
710+
3. **Hybrid Approach**: Best of both worlds
711+
- API provides intelligent, context-aware validation
712+
- Pattern matching ensures reliability and low latency fallback
713+
- Seamless switching between methods based on availability
714+
669715
### Future Enhancements
670716

671717
Planned improvements:
672-
- Integration with full NeMo Guardrails SDK
673-
- LLM-based violation detection (beyond pattern matching)
718+
- Enhanced API integration with dedicated guardrails endpoints
674719
- Machine learning for adaptive threat detection
675720
- Enhanced monitoring dashboards
721+
- Custom guardrails rules via API configuration
676722

677723
**Related Documentation:**
678724
- Configuration file: `data/config/guardrails/rails.yaml`

docs/SOFTWARE_INVENTORY.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,8 @@
22

33
This document lists all third-party software packages used in this project, including their versions, licenses, authors, and sources.
44

5-
**Generated:** Automatically from dependency files
6-
**Last Updated:** 2025-01-XX
5+
**Generated:** Automatically from dependency files
6+
**Last Updated:** 2025-12-08
77
**Generation Script:** `scripts/tools/generate_software_inventory.py`
88

99
## How to Regenerate
@@ -37,15 +37,16 @@ The script automatically:
3737
| Faker | 19.0.0 | MIT License | https://github.com/joke2k/faker | joke2k <joke2k@gmail.com> | PyPI | pip |
3838
| fastapi | 0.119.0 | MIT License | https://pypi.org/project/fastapi/ | Sebastián Ramírez <tiangolo@gmail.com> | PyPI | pip |
3939
| httpx | 0.27.0 | BSD License | https://pypi.org/project/httpx/ | Tom Christie <tom@tomchristie.com> | PyPI | pip |
40-
| langchain-core | 0.1.0 | MIT | https://github.com/langchain-ai/langchain | N/A | PyPI | pip |
40+
| langchain-core | 0.3.80 | MIT | https://pypi.org/project/langchain-core/ | N/A | PyPI | pip |
4141
| langgraph | 0.2.30 | MIT | https://www.github.com/langchain-ai/langgraph | N/A | PyPI | pip |
4242
| loguru | 0.7.0 | MIT license | https://github.com/Delgan/loguru | Delgan <delgan.py@gmail.com> | PyPI | pip |
4343
| numpy | 1.24.0 | BSD-3-Clause | https://www.numpy.org | Travis E. Oliphant et al. | PyPI | pip |
4444
| paho-mqtt | 1.6.0 | Eclipse Public License v2.0 / Eclipse Distribution License v1.0 | http://eclipse.org/paho | Roger Light <roger@atchoo.org> | PyPI | pip |
4545
| pandas | 1.2.4 | BSD | https://pandas.pydata.org | N/A | PyPI | pip |
4646
| passlib | 1.7.4 | BSD | https://passlib.readthedocs.io | Eli Collins <elic@assurancetechnologies.com> | PyPI | pip |
47-
| pillow | 10.0.0 | HPND | https://python-pillow.org | Jeffrey A. Clark (Alex) <aclark@aclark.net> | PyPI | pip |
47+
| pillow | 10.3.0 | HPND | https://pypi.org/project/Pillow/ | "Jeffrey A. Clark" <aclark@aclark.net> | PyPI | pip |
4848
| prometheus-client | 0.19.0 | Apache Software License 2.0 | https://github.com/prometheus/client_python | Brian Brazil <brian.brazil@robustperception.io> | PyPI | pip |
49+
| psutil | 5.9.0 | BSD | https://github.com/giampaolo/psutil | Giampaolo Rodola <g.rodola@gmail.com> | PyPI | pip |
4950
| psycopg | 3.0 | GNU Lesser General Public License v3 (LGPLv3) | https://psycopg.org/psycopg3/ | Daniele Varrazzo <daniele.varrazzo@gmail.com> | PyPI | pip |
5051
| pydantic | 2.7.0 | MIT License | https://pypi.org/project/pydantic/ | Samuel Colvin <s@muelcolvin.com>, Eric Jolibois <em.jolibois@gmail.com>, Hasan Ramezani <hasan.r67@gmail.com>, Adrian Garcia Badaracco <1755071+adr... | PyPI | pip |
5152
| PyJWT | 2.8.0 | MIT | https://github.com/jpadilla/pyjwt | Jose Padilla <hello@jpadilla.com> | PyPI | pip |
@@ -57,8 +58,8 @@ The script automatically:
5758
| python-multipart | 0.0.20 | Apache Software License | https://pypi.org/project/python-multipart/ | Andrew Dunham <andrew@du.nham.ca>, Marcelo Trylesinski <marcelotryle@gmail.com> | PyPI | pip |
5859
| PyYAML | 6.0 | MIT | https://pyyaml.org/ | Kirill Simonov <xi@resolvent.net> | PyPI | pip |
5960
| redis | 5.0.0 | MIT | https://github.com/redis/redis-py | Redis Inc. <oss@redis.com> | PyPI | pip |
60-
| requests | 2.31.0 | Apache 2.0 | https://requests.readthedocs.io | Kenneth Reitz <me@kennethreitz.org> | PyPI | pip |
61-
| scikit-learn | 1.0 | new BSD | http://scikit-learn.org | N/A | PyPI | pip |
61+
| requests | 2.32.4 | Apache-2.0 | https://requests.readthedocs.io | Kenneth Reitz <me@kennethreitz.org> | PyPI | pip |
62+
| scikit-learn | 1.5.0 | new BSD | https://scikit-learn.org | N/A | PyPI | pip |
6263
| tiktoken | 0.12.0 | MIT License | https://pypi.org/project/tiktoken/ | Shantanu Jain <shantanu@openai.com> | PyPI | pip |
6364
| uvicorn | 0.30.1 | BSD License | https://pypi.org/project/uvicorn/ | Tom Christie <tom@tomchristie.com> | PyPI | pip |
6465
| websockets | 11.0 | BSD-3-Clause | https://pypi.org/project/websockets/ | Aymeric Augustin <aymeric.augustin@m4x.org> | PyPI | pip |
@@ -93,20 +94,19 @@ The script automatically:
9394
| MIT | 14 |
9495
| BSD-3-Clause | 5 |
9596
| MIT License | 4 |
96-
| BSD | 3 |
97+
| BSD | 4 |
9798
| BSD License | 2 |
9899
| Apache License, Version 2.0 | 2 |
99100
| Apache Software License | 2 |
101+
| Apache-2.0 | 2 |
100102
| MIT license | 1 |
101103
| Apache 2 | 1 |
102104
| CC0 (copyright waived) | 1 |
103105
| Apache Software License 2.0 | 1 |
104106
| GNU Lesser General Public License v3 (LGPLv3) | 1 |
105107
| Eclipse Public License v2.0 / Eclipse Distribution License v1.0 | 1 |
106108
| N/A | 1 |
107-
| Apache 2.0 | 1 |
108109
| new BSD | 1 |
109-
| Apache-2.0 | 1 |
110110
| HPND | 1 |
111111
| GNU AFFERO GPL 3.0 | 1 |
112112
| ISC | 1 |

0 commit comments

Comments
 (0)