Agentic retrieval #18
Merged
Overview
This pull request introduces an agentic retrieval system for extracting and analyzing data policies from documents. It uses a multi-agent AI architecture to extend the document processing capabilities of the RAG system.
New Features
Agent Orchestrator (agent_orchestrator.py) - Coordinates multiple AI agents for complex document analysis
Enhanced Main Handler (enhanced_main_handler.py) - Improved document processing pipeline
Main Handler (main_handler.py) - Core document extraction logic
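To illustrate what "coordinating multiple agents" can look like, here is a minimal sketch. It is not the code in agent_orchestrator.py: the `Orchestrator` class, the `Agent` signature, and the two keyword-based agents are hypothetical stand-ins for LLM-backed agents, used only to show the fan-out-and-collect pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical agent signature: takes document text, returns a list of findings.
Agent = Callable[[str], List[str]]


@dataclass
class Orchestrator:
    """Illustrative coordinator; not the actual agent_orchestrator.py API."""
    agents: Dict[str, Agent] = field(default_factory=dict)

    def register(self, name: str, agent: Agent) -> None:
        self.agents[name] = agent

    def run(self, text: str) -> Dict[str, List[str]]:
        # Run every registered agent on the same text, keyed by agent name.
        return {name: agent(text) for name, agent in self.agents.items()}


def retention_agent(text: str) -> List[str]:
    # Stand-in for an LLM call: keep sentences that mention retention.
    return [s.strip() for s in text.split(".") if "retain" in s.lower()]


def sharing_agent(text: str) -> List[str]:
    # Stand-in for an LLM call: keep sentences that mention sharing.
    return [s.strip() for s in text.split(".") if "share" in s.lower()]


orchestrator = Orchestrator()
orchestrator.register("retention", retention_agent)
orchestrator.register("sharing", sharing_agent)

results = orchestrator.run(
    "We retain logs for 30 days. We do not share data with third parties."
)
print(results)
```

Keeping agents as plain callables means a new analysis task can be added by registering another function, without changing the coordinator.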
AI-Powered Document Analysis
Agent Prompts (agent_prompts.py) - Specialized prompts for different analysis tasks
Text Analyzer (text_analyzer.py) - Advanced text processing capabilities
Test Agents (test_agents.py) - Comprehensive testing framework for AI agents
Data Infrastructure
Database Client (database_client.py) - Database connectivity and operations
Qdrant Client (qdrant_client.py) - Vector database integration for semantic search
Utils (utils.py) - Helper functions and utilities
Document Processing Pipeline
Main Pipeline (main.py) - Orchestrates the entire extraction workflow
Example Usage (example_usage.py) - Demonstrates system capabilities
Requirements (requirements.txt) - Python dependencies
Data Transformation Tools
Matrix Transformation - Tools for converting extracted data into structured matrices
Test Import - Validation utilities for data imports
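As a rough illustration of turning extracted data into a structured matrix, the sketch below builds a document-by-theme presence matrix. The record format, the `to_matrix` function, and the file and theme names are assumptions made for this example; the PR's actual matrix transformation tools may use a different layout.

```python
from typing import List, Tuple


def to_matrix(
    records: List[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Convert (document, theme) pairs into a 0/1 matrix.

    Rows are documents and columns are themes, both sorted alphabetically.
    """
    docs = sorted({doc for doc, _ in records})
    themes = sorted({theme for _, theme in records})
    row = {doc: i for i, doc in enumerate(docs)}
    col = {theme: j for j, theme in enumerate(themes)}

    matrix = [[0] * len(themes) for _ in docs]
    for doc, theme in records:
        matrix[row[doc]][col[theme]] = 1  # mark that the theme appears in the doc
    return docs, themes, matrix


# Hypothetical extraction output: which themes were found in which document.
records = [
    ("policy_a.pdf", "retention"),
    ("policy_a.pdf", "sharing"),
    ("policy_b.pdf", "retention"),
]
docs, themes, matrix = to_matrix(records)
print(docs, themes, matrix)
```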
Technical Improvements
Dependency Management: Updated poetry.lock and pyproject.toml for new AI dependencies
Taxonomy Updates: Enhanced themes taxonomy for better categorization
Template Files: Added sample PDF documents for testing and validation
Documentation
README_MULTI_AGENT.md - Comprehensive guide for the multi-agent system
Code Examples - Practical usage demonstrations
Template Files - Sample documents for testing
Testing
Test Agents: Comprehensive testing framework for AI components
Test Files: Sample PDFs and validation scripts
Import Testing: Data validation and integrity checks
Impact
Enhanced Document Processing: More intelligent and accurate policy extraction
Scalable Architecture: Multi-agent system that can handle complex document analysis
Better Data Quality: Improved accuracy through AI-powered analysis
Extensible: Architecture leaves room for additional AI agent capabilities
Ready for Review
All new files have been added and tested
Dependencies are properly managed
Documentation is comprehensive
Test framework is in place
Next Steps
After merge:
Deploy and test the new agentic system
Monitor performance and accuracy metrics
Gather user feedback for iterative improvements
Plan additional AI agent capabilities