@@ -496,234 +496,85 @@ The system implements **NVIDIA NeMo Guardrails** for content safety, security, a
496496
497497### Overview
498498
499- NeMo Guardrails provides multi-layer protection for the warehouse operational assistant:
500-
501- - **API Integration** - Uses NVIDIA NeMo Guardrails API for intelligent safety validation
502- - **Input Safety Validation** - Checks user queries before processing
503- - **Output Safety Validation** - Validates AI responses before returning to users
504- - **Pattern-Based Fallback** - Falls back to keyword/phrase matching if API is unavailable
505- - **Timeout Protection** - Prevents hanging requests with configurable timeouts
506- - **Graceful Degradation** - Continues operation even if guardrails fail
499+ The guardrails system provides **dual implementation support** with automatic fallback:
500+
501+ - **NeMo Guardrails SDK** (with Colang) - Intelligent, programmable guardrails using NVIDIA's official SDK
502+   - ✅ **Already included** in `requirements.txt` (`nemoguardrails>=0.19.0`)
503+   - Installed automatically when you run `pip install -r requirements.txt`
504+ - **Pattern-Based Matching** - Fast, lightweight fallback using keyword/phrase matching
505+ - **Feature Flag Control** - Runtime switching between implementations via `USE_NEMO_GUARDRAILS_SDK`
506+ - **Automatic Fallback** - Seamlessly switches to pattern-based matching if the SDK is unavailable
507+ - **Input & Output Validation** - Checks both user queries and AI responses
508+ - **Timeout Protection** - Prevents hanging requests (3s input, 5s output)
509+ - **Comprehensive Monitoring** - Metrics tracking for method usage and performance
507510
508511### Protection Categories
509512
510- The guardrails system protects against:
511-
512- #### 1. Jailbreak Attempts
513- Detects attempts to override system instructions:
514- - "ignore previous instructions"
515- - "forget everything"
516- - "pretend to be"
517- - "roleplay as"
518- - "bypass"
519- - "jailbreak"
520-
521- #### 2. Safety Violations
522- Prevents guidance that could endanger workers or equipment:
523- - Operating equipment without training
524- - Bypassing safety protocols
525- - Working without personal protective equipment (PPE)
526- - Unsafe equipment operation
527-
528- #### 3. Security Violations
529- Blocks requests for sensitive security information:
530- - Security codes and access codes
531- - Restricted area access
532- - Alarm codes
533- - System bypass instructions
534-
535- #### 4. Compliance Violations
536- Ensures adherence to regulations and policies:
537- - Avoiding safety inspections
538- - Skipping compliance requirements
539- - Ignoring regulations
540- - Working around safety rules
541-
542- #### 5. Off-Topic Queries
543- Redirects non-warehouse related queries:
544- - Weather, jokes, cooking recipes
545- - Sports, politics, entertainment
546- - General knowledge questions
547-
548- ### Configuration
549-
550- #### Environment Variables
551-
552- The guardrails service can be configured via environment variables:
513+ The guardrails system protects against **88 patterns** across 5 categories:
514+
515+ 1. **Jailbreak Attempts** (17 patterns) - Prevents instruction override attempts
516+ 2. **Safety Violations** (13 patterns) - Blocks unsafe operational guidance
517+ 3. **Security Violations** (15 patterns) - Prevents security information requests
518+ 4. **Compliance Violations** (12 patterns) - Ensures regulatory adherence
519+ 5. **Off-Topic Queries** (13 patterns) - Redirects non-warehouse queries
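
As a rough illustration of the pattern-based side, the sketch below does case-insensitive phrase matching grouped by category. It is not the project's implementation: `PATTERNS`, `SafetyResult`, and `check_patterns` are hypothetical names, and the phrases are a small illustrative subset taken from the examples in this README, not the full pattern set.

```python
from dataclasses import dataclass, field

# Hypothetical, abbreviated pattern table; the real patterns live in the
# project's guardrails configuration.
PATTERNS = {
    "jailbreak": ["ignore previous instructions", "forget everything", "pretend to be"],
    "safety": ["operate forklift without training", "bypass safety protocols"],
    "off_topic": ["weather", "joke", "recipe"],
}


@dataclass
class SafetyResult:
    is_safe: bool
    violations: list = field(default_factory=list)


def check_patterns(text: str) -> SafetyResult:
    """Flag every configured phrase that appears in the text (case-insensitive)."""
    lowered = text.lower()
    violations = [
        f"{category}: '{phrase}'"
        for category, phrases in PATTERNS.items()
        for phrase in phrases
        if phrase in lowered
    ]
    # The text is considered safe only when no phrase matched.
    return SafetyResult(is_safe=not violations, violations=violations)
```

Plain substring matching is fast and has no external dependencies, which is what makes it usable as a fallback, but it cannot tell a harmful request from a harmless sentence that happens to contain the same phrase.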
520+
521+ ### Quick Configuration
553522
554523``` bash
555- # NeMo Guardrails API Configuration
556- # Use RAIL_API_KEY for guardrails-specific key, or it will fall back to NVIDIA_API_KEY
557- RAIL_API_KEY=your-nvidia-api-key-here
524+ # Enable SDK implementation (recommended)
525+ USE_NEMO_GUARDRAILS_SDK=true
558526
559- # Guardrails API endpoint (defaults to NVIDIA's cloud endpoint)
560- RAIL_API_URL=https://integrate.api.nvidia.com/v1
527+ # NVIDIA API key (required for SDK)
528+ NVIDIA_API_KEY=your-api-key-here
561529
562- # Timeout for guardrails API calls in seconds (default: 10)
530+ # Optional: Guardrails-specific configuration
531+ RAIL_API_KEY=your-api-key-here # Falls back to NVIDIA_API_KEY if not set
532+ RAIL_API_URL=https://integrate.api.nvidia.com/v1
563533GUARDRAILS_TIMEOUT=10
564-
565- # Enable/disable API usage (default: true)
566- # If false, will only use pattern-based matching
567534GUARDRAILS_USE_API=true
568535```
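
The following sketch shows one way these variables could be resolved at startup. It is an assumption-laden illustration, not the service's actual loader: `GuardrailsSettings` and `load_guardrails_settings` are hypothetical names, the SDK flag is assumed to default to off, and disabling the SDK when no API key is present is an assumption based on the "required for SDK" comment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GuardrailsSettings:
    use_sdk: bool
    api_key: Optional[str]
    api_url: str
    timeout_s: float
    use_api: bool


def _flag(value: Optional[str], default: bool) -> bool:
    """Parse a boolean-like environment value, falling back to a default."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_guardrails_settings(env: Mapping[str, str] = os.environ) -> GuardrailsSettings:
    # RAIL_API_KEY takes precedence; NVIDIA_API_KEY is the fallback.
    api_key = env.get("RAIL_API_KEY") or env.get("NVIDIA_API_KEY")
    # Assumption: the SDK path is only usable when a key is available.
    sdk_requested = _flag(env.get("USE_NEMO_GUARDRAILS_SDK"), default=False)
    return GuardrailsSettings(
        use_sdk=sdk_requested and api_key is not None,
        api_key=api_key,
        api_url=env.get("RAIL_API_URL", "https://integrate.api.nvidia.com/v1"),
        timeout_s=float(env.get("GUARDRAILS_TIMEOUT", "10")),
        use_api=_flag(env.get("GUARDRAILS_USE_API"), default=True),
    )
```

Passing the environment as a mapping keeps the function easy to test with a plain dictionary instead of mutating `os.environ`.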
569536
570- **Note:** If `RAIL_API_KEY` is not set, the service will use `NVIDIA_API_KEY` as a fallback. If neither is set, the service will use pattern-based matching only.
571-
572- #### YAML Configuration
573-
574- Guardrails configuration is also defined in `data/config/guardrails/rails.yaml`:
575-
576- ```yaml
577- # Safety and compliance rules
578- safety_rules:
579-   - name: "jailbreak_detection"
580-     patterns:
581-       - "ignore previous instructions"
582-       - "forget everything"
583-       # ... more patterns
584-     response: "I cannot ignore my instructions..."
585-
586-   - name: "safety_violations"
587-     patterns:
588-       - "operate forklift without training"
589-       - "bypass safety protocols"
590-       # ... more patterns
591-     response: "Safety is our top priority..."
592- ```
593-
594- **Configuration Features:**
595- - Pattern-based rule definitions
596- - Custom response messages for each violation type
597- - Monitoring and logging configuration
598- - Conversation limits and constraints
599-
600537### Integration
601538
602- Guardrails are integrated into the chat endpoint at two critical points:
603-
604- 1. **Input Safety Check** (before processing):
605-    ```python
606-    input_safety = await guardrails_service.check_input_safety(req.message)
607-    if not input_safety.is_safe:
608-        return safety_response
609-    ```
610-
611- 2. **Output Safety Check** (after AI response):
612-    ```python
613-    output_safety = await guardrails_service.check_output_safety(ai_response)
614-    if not output_safety.is_safe:
615-        return safety_response
616-    ```
617-
618- **Timeout Protection:**
619- - Input check: 3-second timeout
620- - Output check: 5-second timeout
621- - Graceful degradation on timeout
539+ Guardrails are automatically integrated into the chat endpoint:
540+ - **Input Safety Check** - Validates user queries before processing (3s timeout)
541+ - **Output Safety Check** - Validates AI responses before returning (5s timeout)
542+ - **Metrics Tracking** - Logs method used, performance, and safety status
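
The timeout behaviour described above could be sketched with `asyncio.wait_for` as follows. This is only an illustration: `CheckOutcome` and `check_with_timeout` are hypothetical names, the `check` callable stands in for whichever guardrails method is active, and treating a timed-out check as a pass is an assumption that mirrors the graceful-degradation behaviour mentioned in this README.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable

INPUT_TIMEOUT_S = 3.0   # input check budget stated in this README
OUTPUT_TIMEOUT_S = 5.0  # output check budget stated in this README


@dataclass
class CheckOutcome:
    is_safe: bool
    timed_out: bool
    elapsed_s: float


async def check_with_timeout(
    check: Callable[[str], Awaitable[bool]],
    text: str,
    timeout_s: float,
) -> CheckOutcome:
    """Run an async safety check, giving up after timeout_s seconds."""
    start = time.perf_counter()
    try:
        is_safe = await asyncio.wait_for(check(text), timeout=timeout_s)
        timed_out = False
    except asyncio.TimeoutError:
        # Assumption: degrade gracefully and let the request continue.
        is_safe = True
        timed_out = True
    return CheckOutcome(is_safe, timed_out, time.perf_counter() - start)
```

A chat handler might call `check_with_timeout(check, message, INPUT_TIMEOUT_S)` before processing and the same helper with `OUTPUT_TIMEOUT_S` on the generated reply, recording `timed_out` and `elapsed_s` as metrics.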
622543
623544### Testing
624545
625- Comprehensive test suite available in `tests/unit/test_guardrails.py`:
626-
627546``` bash
628- # Run guardrails tests
629- python tests/unit/test_guardrails.py
630- ```
631-
632- **Test Coverage:**
633- - 18 test scenarios covering all violation categories
634- - Legitimate query validation
635- - Performance testing with concurrent requests
636- - Response time measurement
637-
638- **Test Categories:**
639- - Jailbreak attempts (2 tests)
640- - Safety violations (3 tests)
641- - Security violations (3 tests)
642- - Compliance violations (2 tests)
643- - Off-topic queries (3 tests)
644- - Legitimate warehouse queries (4 tests)
645-
646- ### Service Implementation
647-
648- The guardrails service (`src/api/services/guardrails/guardrails_service.py`) provides:
649-
650- - **GuardrailsService** class with async methods
651- - **API Integration** - Calls NVIDIA NeMo Guardrails API for intelligent validation
652- - **Pattern-based Fallback** - Falls back to keyword/phrase matching if API unavailable
653- - **Safety response generation** based on violation types
654- - **Configuration loading** from YAML files
655- - **Error handling** with graceful degradation
656- - **Automatic fallback** - Seamlessly switches to pattern matching on API failures
657-
658- ### Response Format
659-
660- When a violation is detected, the system returns:
661-
662- ```json
663- {
664-   "reply": "Safety is our top priority. I cannot provide guidance...",
665-   "route": "guardrails",
666-   "intent": "safety_violation",
667-   "context": {
668-     "safety_violations": ["Safety violation: 'operate forklift without training'"]
669-   },
670-   "confidence": 0.9
671- }
672- ```
673-
674- ### Monitoring
675-
676- Guardrails activity is logged and monitored:
677-
678- - **Log Level**: INFO
679- - **Conversation Logging**: Enabled
680- - **Rail Hits Logging**: Enabled
681- - **Metrics Tracked**:
682-   - Conversation length
683-   - Rail hits (violations detected)
684-   - Response time
685-   - Safety violations
686-   - Compliance issues
687-
688- ### Best Practices
547+ # Unit tests
548+ pytest tests/unit/test_guardrails_sdk.py -v
689549
690- 1. **Regular Updates**: Review and update patterns in `rails.yaml` based on new threats
691- 2. **Monitoring**: Monitor guardrails logs for patterns and trends
692- 3. **Testing**: Run test suite after configuration changes
693- 4. **Customization**: Adjust timeout values based on your infrastructure
694- 5. **Response Messages**: Keep safety responses professional and helpful
550+ # Integration tests (compares both implementations)
551+ pytest tests/integration/test_guardrails_comparison.py -v -s
695552
696- ### API Integration Details
697-
698- The guardrails service now integrates with the NVIDIA NeMo Guardrails API:
699-
700- 1. **Primary Method**: API-based validation using NVIDIA's guardrails endpoint
701-    - Uses `/chat/completions` endpoint with safety-focused prompts
702-    - Leverages LLM-based violation detection for more intelligent analysis
703-    - Returns structured JSON with violation details and confidence scores
704-
705- 2. **Fallback Method**: Pattern-based matching
706-    - Automatically used if API is unavailable or times out
707-    - Uses keyword/phrase matching for common violation patterns
708-    - Ensures system continues to function even without API access
709-
710- 3. **Hybrid Approach**: Best of both worlds
711-    - API provides intelligent, context-aware validation
712-    - Pattern matching ensures reliability and low latency fallback
713-    - Seamless switching between methods based on availability
714-
715- ### Future Enhancements
553+ # Performance benchmarks
554+ pytest tests/integration/test_guardrails_comparison.py::test_performance_benchmark -v -s
555+ ```
716556
717- Planned improvements:
718- - Enhanced API integration with dedicated guardrails endpoints
719- - Machine learning for adaptive threat detection
720- - Enhanced monitoring dashboards
721- - Custom guardrails rules via API configuration
557+ ### Documentation
722558
723- **Related Documentation:**
724- - Configuration file: `data/config/guardrails/rails.yaml`
725- - Service implementation: `src/api/services/guardrails/guardrails_service.py`
726- - Test suite: `tests/unit/test_guardrails.py`
559+ **📖 For comprehensive documentation, see: [Guardrails Implementation Guide](docs/architecture/guardrails-implementation.md)**
560+
561+ The detailed guide includes:
562+ - Complete architecture overview
563+ - Implementation details (SDK vs Pattern-based)
564+ - All 88 guardrails patterns
565+ - API interface documentation
566+ - Configuration reference
567+ - Monitoring & metrics
568+ - Testing instructions
569+ - Troubleshooting guide
570+ - Future roadmap
571+
572+ **Key Files:**
573+ - Service: `src/api/services/guardrails/guardrails_service.py`
574+ - SDK Wrapper: `src/api/services/guardrails/nemo_sdk_service.py`
575+ - Colang Config: `data/config/guardrails/rails.co`
576+ - NeMo Config: `data/config/guardrails/config.yml`
577+ - Legacy YAML: `data/config/guardrails/rails.yaml`
727578
728579## Development Guide
729580