Commit 63d7887

feat: consolidate guardrails documentation and update to SDK v0.19.0
- Consolidate all guardrails docs into single comprehensive guide
- Delete individual phase documentation files
- Create unified guardrails-implementation.md with complete overview
- Update README.md guardrails section with link to detailed docs
- Update nemoguardrails from v0.17.0 to v0.19.0 in requirements.txt
- Update version references in documentation files

1 parent e1e9052 commit 63d7887

13 files changed

Lines changed: 1875 additions & 287 deletions

README.md

Lines changed: 58 additions & 207 deletions
@@ -496,234 +496,85 @@ The system implements **NVIDIA NeMo Guardrails** for content safety, security, a
496496

497497
### Overview
498498

499-
NeMo Guardrails provides multi-layer protection for the warehouse operational assistant:
500-
501-
- **API Integration** - Uses NVIDIA NeMo Guardrails API for intelligent safety validation
502-
- **Input Safety Validation** - Checks user queries before processing
503-
- **Output Safety Validation** - Validates AI responses before returning to users
504-
- **Pattern-Based Fallback** - Falls back to keyword/phrase matching if API is unavailable
505-
- **Timeout Protection** - Prevents hanging requests with configurable timeouts
506-
- **Graceful Degradation** - Continues operation even if guardrails fail
499+
The guardrails system provides **dual implementation support** with automatic fallback:
500+
501+
- **NeMo Guardrails SDK** (with Colang) - Intelligent, programmable guardrails using NVIDIA's official SDK
502+
- **Already included** in `requirements.txt` (`nemoguardrails>=0.19.0`)
503+
- Installed automatically when you run `pip install -r requirements.txt`
504+
- **Pattern-Based Matching** - Fast, lightweight fallback using keyword/phrase matching
505+
- **Feature Flag Control** - Runtime switching between implementations via `USE_NEMO_GUARDRAILS_SDK`
506+
- **Automatic Fallback** - Seamlessly switches to pattern-based if SDK unavailable
507+
- **Input & Output Validation** - Checks both user queries and AI responses
508+
- **Timeout Protection** - Prevents hanging requests (3s input, 5s output)
509+
- **Comprehensive Monitoring** - Metrics tracking for method usage and performance
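
The feature-flag switch and automatic fallback listed above could be sketched roughly as follows. Only the `USE_NEMO_GUARDRAILS_SDK` variable name comes from the README. The function names, the accepted truthy values, and the method labels are illustrative assumptions, not the project's actual code.

```python
import os
from typing import Mapping, Optional

# Assumption: these spellings count as "enabled"; the real service may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def sdk_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when USE_NEMO_GUARDRAILS_SDK is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("USE_NEMO_GUARDRAILS_SDK", "false").strip().lower() in _TRUTHY


def select_method(sdk_available: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the SDK only when it is both requested and usable.

    Every other combination falls back to pattern matching, which mirrors
    the "Automatic Fallback" bullet. The labels are hypothetical.
    """
    if sdk_requested(env) and sdk_available:
        return "sdk"
    return "pattern_matching"
```

Passing the environment as a mapping keeps the selection logic testable without mutating `os.environ`.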
507510

508511
### Protection Categories
509512

510-
The guardrails system protects against:
511-
512-
#### 1. Jailbreak Attempts
513-
Detects attempts to override system instructions:
514-
- "ignore previous instructions"
515-
- "forget everything"
516-
- "pretend to be"
517-
- "roleplay as"
518-
- "bypass"
519-
- "jailbreak"
520-
521-
#### 2. Safety Violations
522-
Prevents guidance that could endanger workers or equipment:
523-
- Operating equipment without training
524-
- Bypassing safety protocols
525-
- Working without personal protective equipment (PPE)
526-
- Unsafe equipment operation
527-
528-
#### 3. Security Violations
529-
Blocks requests for sensitive security information:
530-
- Security codes and access codes
531-
- Restricted area access
532-
- Alarm codes
533-
- System bypass instructions
534-
535-
#### 4. Compliance Violations
536-
Ensures adherence to regulations and policies:
537-
- Avoiding safety inspections
538-
- Skipping compliance requirements
539-
- Ignoring regulations
540-
- Working around safety rules
541-
542-
#### 5. Off-Topic Queries
543-
Redirects non-warehouse related queries:
544-
- Weather, jokes, cooking recipes
545-
- Sports, politics, entertainment
546-
- General knowledge questions
547-
548-
### Configuration
549-
550-
#### Environment Variables
551-
552-
The guardrails service can be configured via environment variables:
513+
The guardrails system protects against **88 patterns** across 5 categories:
514+
515+
1. **Jailbreak Attempts** (17 patterns) - Prevents instruction override attempts
516+
2. **Safety Violations** (13 patterns) - Blocks unsafe operational guidance
517+
3. **Security Violations** (15 patterns) - Prevents security information requests
518+
4. **Compliance Violations** (12 patterns) - Ensures regulatory adherence
519+
5. **Off-Topic Queries** (13 patterns) - Redirects non-warehouse queries
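
As a rough illustration of how category-based pattern matching of this kind can work, here is a minimal sketch. The phrases are taken from the examples in the removed README text. The full pattern set is not reproduced, and `PATTERNS`, `SafetyResult`, and `check_patterns` are hypothetical names rather than the service's real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# A handful of example phrases per category; the real pattern list is longer.
PATTERNS: Dict[str, List[str]] = {
    "jailbreak": ["ignore previous instructions", "forget everything"],
    "safety": ["operate forklift without training", "bypass safety protocols"],
}


@dataclass
class SafetyResult:
    is_safe: bool
    violations: List[str] = field(default_factory=list)


def check_patterns(message: str) -> SafetyResult:
    """Case-insensitive substring match of the message against each phrase."""
    text = message.lower()
    violations = [
        f"{category}: '{phrase}'"
        for category, phrases in PATTERNS.items()
        for phrase in phrases
        if phrase in text
    ]
    return SafetyResult(is_safe=not violations, violations=violations)
```

Substring matching is fast and has no external dependencies, which is why it suits a fallback path, but it can be sidestepped by rephrasing and may flag harmless text that happens to contain a phrase.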
520+
521+
### Quick Configuration
553522

554523
```bash
555-
# NeMo Guardrails API Configuration
556-
# Use RAIL_API_KEY for guardrails-specific key, or it will fall back to NVIDIA_API_KEY
557-
RAIL_API_KEY=your-nvidia-api-key-here
524+
# Enable SDK implementation (recommended)
525+
USE_NEMO_GUARDRAILS_SDK=true
558526

559-
# Guardrails API endpoint (defaults to NVIDIA's cloud endpoint)
560-
RAIL_API_URL=https://integrate.api.nvidia.com/v1
527+
# NVIDIA API key (required for SDK)
528+
NVIDIA_API_KEY=your-api-key-here
561529

562-
# Timeout for guardrails API calls in seconds (default: 10)
530+
# Optional: Guardrails-specific configuration
531+
RAIL_API_KEY=your-api-key-here # Falls back to NVIDIA_API_KEY if not set
532+
RAIL_API_URL=https://integrate.api.nvidia.com/v1
563533
GUARDRAILS_TIMEOUT=10
564-
565-
# Enable/disable API usage (default: true)
566-
# If false, will only use pattern-based matching
567534
GUARDRAILS_USE_API=true
568535
```
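
The variables above might be resolved in Python along these lines. This is a sketch under assumptions: the variable names and the `RAIL_API_KEY` to `NVIDIA_API_KEY` fallback come from the README, while `resolve_settings`, the returned keys, and the boolean parsing are illustrative.

```python
import os
from typing import Any, Dict, Mapping, Optional

DEFAULT_RAIL_API_URL = "https://integrate.api.nvidia.com/v1"


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect guardrails settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # RAIL_API_KEY wins; an unset or empty value falls back to NVIDIA_API_KEY.
        "api_key": env.get("RAIL_API_KEY") or env.get("NVIDIA_API_KEY"),
        "api_url": env.get("RAIL_API_URL", DEFAULT_RAIL_API_URL),
        "timeout_s": float(env.get("GUARDRAILS_TIMEOUT", "10")),
        # Assumption: only the literal "true" (any case) enables API usage.
        "use_api": env.get("GUARDRAILS_USE_API", "true").strip().lower() == "true",
    }
```

When `api_key` resolves to `None`, a caller would presumably stay on pattern matching, as the earlier README note described.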
569536

570-
**Note:** If `RAIL_API_KEY` is not set, the service will use `NVIDIA_API_KEY` as a fallback. If neither is set, the service will use pattern-based matching only.
571-
572-
#### YAML Configuration
573-
574-
Guardrails configuration is also defined in `data/config/guardrails/rails.yaml`:
575-
576-
```yaml
577-
# Safety and compliance rules
578-
safety_rules:
579-
- name: "jailbreak_detection"
580-
patterns:
581-
- "ignore previous instructions"
582-
- "forget everything"
583-
# ... more patterns
584-
response: "I cannot ignore my instructions..."
585-
586-
- name: "safety_violations"
587-
patterns:
588-
- "operate forklift without training"
589-
- "bypass safety protocols"
590-
# ... more patterns
591-
response: "Safety is our top priority..."
592-
```
593-
594-
**Configuration Features:**
595-
- Pattern-based rule definitions
596-
- Custom response messages for each violation type
597-
- Monitoring and logging configuration
598-
- Conversation limits and constraints
599-
600537
### Integration
601538

602-
Guardrails are integrated into the chat endpoint at two critical points:
603-
604-
1. **Input Safety Check** (before processing):
605-
```python
606-
input_safety = await guardrails_service.check_input_safety(req.message)
607-
if not input_safety.is_safe:
608-
return safety_response
609-
```
610-
611-
2. **Output Safety Check** (after AI response):
612-
```python
613-
output_safety = await guardrails_service.check_output_safety(ai_response)
614-
if not output_safety.is_safe:
615-
return safety_response
616-
```
617-
618-
**Timeout Protection:**
619-
- Input check: 3-second timeout
620-
- Output check: 5-second timeout
621-
- Graceful degradation on timeout
539+
Guardrails are automatically integrated into the chat endpoint:
540+
- **Input Safety Check** - Validates user queries before processing (3s timeout)
541+
- **Output Safety Check** - Validates AI responses before returning (5s timeout)
542+
- **Metrics Tracking** - Logs method used, performance, and safety status
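
The per-check timeouts could be enforced with `asyncio.wait_for`, as in the sketch below. The 3s and 5s values come from the README. `check_with_timeout` and the idea of passing a synchronous fallback are assumptions about one reasonable way to degrade gracefully, not the endpoint's actual code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

INPUT_TIMEOUT_S = 3.0   # from the README: input check timeout
OUTPUT_TIMEOUT_S = 5.0  # from the README: output check timeout

Result = Dict[str, Any]


async def check_with_timeout(
    check: Callable[[str], Awaitable[Result]],
    text: str,
    timeout_s: float,
    fallback: Callable[[str], Result],
) -> Result:
    """Run an async safety check; on timeout, use the cheap fallback instead."""
    try:
        return await asyncio.wait_for(check(text), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Assumption: a timeout degrades to the fallback rather than
        # failing the whole chat request.
        return fallback(text)
```

A caller would use `INPUT_TIMEOUT_S` before processing the user message and `OUTPUT_TIMEOUT_S` before returning the AI response.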
622543

623544
### Testing
624545

625-
Comprehensive test suite available in `tests/unit/test_guardrails.py`:
626-
627546
```bash
628-
# Run guardrails tests
629-
python tests/unit/test_guardrails.py
630-
```
631-
632-
**Test Coverage:**
633-
- 18 test scenarios covering all violation categories
634-
- Legitimate query validation
635-
- Performance testing with concurrent requests
636-
- Response time measurement
637-
638-
**Test Categories:**
639-
- Jailbreak attempts (2 tests)
640-
- Safety violations (3 tests)
641-
- Security violations (3 tests)
642-
- Compliance violations (2 tests)
643-
- Off-topic queries (3 tests)
644-
- Legitimate warehouse queries (4 tests)
645-
646-
### Service Implementation
647-
648-
The guardrails service (`src/api/services/guardrails/guardrails_service.py`) provides:
649-
650-
- **GuardrailsService** class with async methods
651-
- **API Integration** - Calls NVIDIA NeMo Guardrails API for intelligent validation
652-
- **Pattern-based Fallback** - Falls back to keyword/phrase matching if API unavailable
653-
- **Safety response generation** based on violation types
654-
- **Configuration loading** from YAML files
655-
- **Error handling** with graceful degradation
656-
- **Automatic fallback** - Seamlessly switches to pattern matching on API failures
657-
658-
### Response Format
659-
660-
When a violation is detected, the system returns:
661-
662-
```json
663-
{
664-
"reply": "Safety is our top priority. I cannot provide guidance...",
665-
"route": "guardrails",
666-
"intent": "safety_violation",
667-
"context": {
668-
"safety_violations": ["Safety violation: 'operate forklift without training'"]
669-
},
670-
"confidence": 0.9
671-
}
672-
```
673-
674-
### Monitoring
675-
676-
Guardrails activity is logged and monitored:
677-
678-
- **Log Level**: INFO
679-
- **Conversation Logging**: Enabled
680-
- **Rail Hits Logging**: Enabled
681-
- **Metrics Tracked**:
682-
- Conversation length
683-
- Rail hits (violations detected)
684-
- Response time
685-
- Safety violations
686-
- Compliance issues
687-
688-
### Best Practices
547+
# Unit tests
548+
pytest tests/unit/test_guardrails_sdk.py -v
689549

690-
1. **Regular Updates**: Review and update patterns in `rails.yaml` based on new threats
691-
2. **Monitoring**: Monitor guardrails logs for patterns and trends
692-
3. **Testing**: Run test suite after configuration changes
693-
4. **Customization**: Adjust timeout values based on your infrastructure
694-
5. **Response Messages**: Keep safety responses professional and helpful
550+
# Integration tests (compares both implementations)
551+
pytest tests/integration/test_guardrails_comparison.py -v -s
695552

696-
### API Integration Details
697-
698-
The guardrails service now integrates with the NVIDIA NeMo Guardrails API:
699-
700-
1. **Primary Method**: API-based validation using NVIDIA's guardrails endpoint
701-
- Uses `/chat/completions` endpoint with safety-focused prompts
702-
- Leverages LLM-based violation detection for more intelligent analysis
703-
- Returns structured JSON with violation details and confidence scores
704-
705-
2. **Fallback Method**: Pattern-based matching
706-
- Automatically used if API is unavailable or times out
707-
- Uses keyword/phrase matching for common violation patterns
708-
- Ensures system continues to function even without API access
709-
710-
3. **Hybrid Approach**: Best of both worlds
711-
- API provides intelligent, context-aware validation
712-
- Pattern matching ensures reliability and low latency fallback
713-
- Seamless switching between methods based on availability
714-
715-
### Future Enhancements
553+
# Performance benchmarks
554+
pytest tests/integration/test_guardrails_comparison.py::test_performance_benchmark -v -s
555+
```
716556

717-
Planned improvements:
718-
- Enhanced API integration with dedicated guardrails endpoints
719-
- Machine learning for adaptive threat detection
720-
- Enhanced monitoring dashboards
721-
- Custom guardrails rules via API configuration
557+
### Documentation
722558

723-
**Related Documentation:**
724-
- Configuration file: `data/config/guardrails/rails.yaml`
725-
- Service implementation: `src/api/services/guardrails/guardrails_service.py`
726-
- Test suite: `tests/unit/test_guardrails.py`
559+
**📖 For comprehensive documentation, see: [Guardrails Implementation Guide](docs/architecture/guardrails-implementation.md)**
560+
561+
The detailed guide includes:
562+
- Complete architecture overview
563+
- Implementation details (SDK vs Pattern-based)
564+
- All 88 guardrails patterns
565+
- API interface documentation
566+
- Configuration reference
567+
- Monitoring & metrics
568+
- Testing instructions
569+
- Troubleshooting guide
570+
- Future roadmap
571+
572+
**Key Files:**
573+
- Service: `src/api/services/guardrails/guardrails_service.py`
574+
- SDK Wrapper: `src/api/services/guardrails/nemo_sdk_service.py`
575+
- Colang Config: `data/config/guardrails/rails.co`
576+
- NeMo Config: `data/config/guardrails/config.yml`
577+
- Legacy YAML: `data/config/guardrails/rails.yaml`
727578

728579
## Development Guide
729580

data/config/guardrails/config.yml

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
# NeMo Guardrails Configuration
2+
# Warehouse Operational Assistant
3+
# Phase 2: Parallel Implementation
4+
5+
# =============================================================================
6+
# Models Configuration
7+
# =============================================================================
8+
# Note: For Phase 2, we use OpenAI-compatible endpoints (NVIDIA NIM supports this)
9+
# The SDK will use pattern matching via Colang for guardrails validation
10+
models:
11+
- type: main
12+
engine: openai
13+
model: nvidia/llama-3-70b-instruct
14+
parameters:
15+
api_key: ${NVIDIA_API_KEY}
16+
api_base: ${RAIL_API_URL:https://integrate.api.nvidia.com/v1}
17+
temperature: 0.1
18+
max_tokens: 1000
19+
top_p: 0.9
20+
21+
- type: embedding
22+
engine: openai
23+
model: nvidia/nv-embedqa-e5-v5
24+
parameters:
25+
api_key: ${NVIDIA_API_KEY}
26+
api_base: ${RAIL_API_URL:https://integrate.api.nvidia.com/v1}
27+
28+
# =============================================================================
29+
# Rails Configuration
30+
# =============================================================================
31+
rails:
32+
# Input rails - checked before processing user input
33+
input:
34+
flows:
35+
- check jailbreak
36+
- check safety violations
37+
- check security violations
38+
- check compliance violations
39+
- check off-topic queries
40+
41+
# Output rails - checked after AI generates response
42+
# Note: Output validation is handled in the service layer for now
43+
# Can be enhanced with Python actions in the future
44+
# output:
45+
# flows:
46+
# - self check facts
47+
48+
# Topical rails - control conversation topics
49+
config:
50+
topics:
51+
- warehouse operations
52+
- inventory management
53+
- safety compliance
54+
- equipment operations
55+
56+
# =============================================================================
57+
# Instructions
58+
# =============================================================================
59+
instructions:
60+
- type: general
61+
content: |
62+
You are a helpful warehouse operational assistant. You help with inventory management,
63+
operations coordination, and safety compliance. Always be professional, accurate,
64+
and follow safety protocols. Never provide information that could compromise
65+
warehouse security or safety.
66+
67+
- type: safety
68+
content: |
69+
Safety is paramount in warehouse operations. Always prioritize safety protocols
70+
and never suggest actions that could endanger workers or equipment. If asked
71+
about potentially dangerous operations, always recommend consulting with safety
72+
personnel first.
73+
74+
- type: compliance
75+
content: |
76+
Ensure all recommendations comply with warehouse policies, safety regulations,
77+
and industry standards. Never suggest actions that violate compliance requirements.
78+
79+
# =============================================================================
80+
# Limits and Constraints
81+
# =============================================================================
82+
limits:
83+
max_turns: 50
84+
max_tokens_per_turn: 1000
85+
max_tokens_per_conversation: 10000
86+
87+
# =============================================================================
88+
# Monitoring and Logging
89+
# =============================================================================
90+
monitoring:
91+
log_level: INFO
92+
log_conversations: true
93+
log_rail_hits: true
94+
metrics:
95+
- conversation_length
96+
- rail_hits
97+
- response_time
98+
- safety_violations
99+
- compliance_issues
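
The `${NVIDIA_API_KEY}` and `${RAIL_API_URL:https://...}` placeholders in this file imply some form of environment substitution with an optional default. The commit does not show where that expansion happens, so the sketch below only illustrates how such placeholders could be resolved by hand; `expand_placeholders` is a hypothetical helper, not part of NeMo Guardrails.

```python
import re
from typing import Mapping

# Matches ${NAME} or ${NAME:default}; the default may itself contain colons.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand_placeholders(value: str, env: Mapping[str, str]) -> str:
    """Replace each placeholder with the env value, else its default."""

    def _substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # No value and no default: surface the missing variable loudly.
        raise KeyError(f"environment variable {name} is not set")

    return _PLACEHOLDER.sub(_substitute, value)
```

Raising on a missing variable without a default avoids silently sending an empty API key to the endpoint.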
100+
