# Context-Aware Research Chatbot

A sophisticated conversational agent that answers domain questions about AI policy using web search, local Retrieval-Augmented Generation (RAG), and mathematical tools, with comprehensive source citations.
## Features

- Multi-Modal Intelligence: Combines web search, a local knowledge base, and mathematical calculations
- Smart Routing: Automatically routes queries to the most appropriate tool based on intent
- Conversational Memory: Maintains context across conversations with session management
- Source Citations: Provides detailed source attributions for all responses
- Comprehensive Evaluation: Built-in evaluation framework for faithfulness and groundedness
- Multiple Interfaces: FastAPI backend, Streamlit UI, and Gradio interface
- Scalable Architecture: Modular design using LangChain components
## Prerequisites

- Python 3.8+
- OpenAI API key
- (Optional) SerpAPI or Tavily API key for web search
- PDF documents for your domain knowledge base
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/context-aware-research-chatbot.git
   cd context-aware-research-chatbot
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your API keys
   ```

4. Initialize the project:

   ```bash
   python main.py setup
   ```

5. Add your PDF documents:

   ```bash
   # Place your PDF files in data/pdfs/
   cp your-documents/*.pdf data/pdfs/
   ```

6. Process documents:

   ```bash
   python main.py process-pdfs
   ```

7. Test the system:

   ```bash
   python main.py test
   ```

## Quick Demo

```bash
streamlit run simple_demo.py --server.port 8501
```

Access at: http://localhost:8501
## Full Stack (API + UI)

```bash
# Terminal 1: Start API
python main.py start-api

# Terminal 2: Start UI
python main.py start-ui
```

## Gradio Interface

```bash
python gradio_ui.py
```

Access at: http://localhost:7860
## Example Queries

Try these questions with your AI policy dataset:
- Policy Questions: "What are the key AI safety guidelines?"
- Regulatory: "How does GDPR apply to AI systems?"
- Ethics: "What are the ethical considerations for AI deployment?"
- Math: "Calculate 15% of 250,000"
- Complex: "How do AI policy frameworks address bias in algorithmic decision-making?"
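The math queries above are handled by the calculator tool. As a rough illustration only (not the actual `tools.py` implementation), arithmetic can be evaluated safely with Python's `ast` module instead of `eval`:

```python
import ast
import operator

# Supported operators; anything else is rejected (illustrative sketch only)
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

# "Calculate 15% of 250,000" -> 0.15 * 250000
print(safe_eval("0.15 * 250000"))  # 37500.0
```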
## Architecture

```
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Streamlit UI  │   │   Gradio UI   │   │    FastAPI    │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            │
            ┌───────────────┴────────────────┐
            │ Context-Aware Research Chatbot │
            │  ┌──────────────────────────┐  │
            │  │       Query Router       │  │
            │  └────────────┬─────────────┘  │
            │  ┌────────────┴─────────────┐  │
            │  │          Tools           │  │
            │  │  ┌─────┐  ┌─────┐        │  │
            │  │  │ RAG │  │ Web │        │  │
            │  │  └─────┘  └─────┘        │  │
            │  │  ┌──────┐                │  │
            │  │  │ Math │                │  │
            │  │  └──────┘                │  │
            │  └──────────────────────────┘  │
            │  ┌──────────────────────────┐  │
            │  │      Memory Manager      │  │
            │  └──────────────────────────┘  │
            └───────────────┬────────────────┘
                            │
            ┌───────────────┴────────────────┐
            │           Data Layer           │
            │  ┌───────┐ ┌────────┐ ┌──────┐ │
            │  │ FAISS │ │ SQLite │ │ PDFs │ │
            │  └───────┘ └────────┘ └──────┘ │
            └────────────────────────────────┘
```
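As a minimal sketch of what the Query Router stage does, here is an illustrative keyword-based router (hypothetical; the real router in `chatbot.py` may use LLM-based intent classification instead):

```python
import re

def route_query(query: str) -> str:
    """Pick a tool name for a query using simple keyword heuristics."""
    q = query.lower()
    # Arithmetic-looking queries (digits plus math vocabulary) go to the math tool
    if re.search(r"\d", q) and re.search(r"calculate|compute|%|\+|\*|/", q):
        return "math"
    # Time-sensitive phrasing suggests a live web search
    if any(w in q for w in ("latest", "recent", "news", "today", "current")):
        return "web_search"
    # Everything else is answered from the local knowledge base
    return "rag"

print(route_query("Calculate 15% of 250,000"))                   # math
print(route_query("What are the latest AI safety guidelines?"))  # web_search
print(route_query("How does GDPR apply to AI systems?"))         # rag
```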
## Project Structure

```
context-aware-research-chatbot/
├── README.md
├── requirements.txt
├── .env.example
├── .gitignore
├── main.py               # Main CLI interface
├── config.py             # Configuration management
├── simple_demo.py        # Simplified Streamlit demo
├── data_processor.py     # PDF processing & vector store
├── tools.py              # Web search, math, RAG tools
├── chatbot.py            # Core chatbot logic
├── database.py           # Database models & management
├── api.py                # FastAPI backend
├── streamlit_ui.py       # Streamlit frontend
├── gradio_ui.py          # Gradio frontend
├── evaluation.py         # Evaluation framework
└── data/                 # Data directory
    ├── pdfs/             # Place your PDF files here
    ├── vector_store/     # Generated vector store
    └── eval_dataset.json # Evaluation dataset
```
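`data_processor.py` splits PDF text into overlapping chunks before embedding it into the FAISS store. A simplified, self-contained sketch of that chunking step (hypothetical helper; the project likely uses a LangChain text splitter driven by the `CHUNK_SIZE`/`CHUNK_OVERLAP` settings in `.env`):

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap by chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached; avoid trailing fragments
    return chunks

doc = "x" * 2500
parts = chunk_text(doc)
print([len(p) for p in parts])  # [1000, 1000, 900]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.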
## Configuration

Key configuration options in `.env`:

```bash
# Required
OPENAI_API_KEY=your_key_here

# Optional - for web search
SERPAPI_API_KEY=your_serpapi_key
TAVILY_API_KEY=your_tavily_key

# Model settings
LLM_MODEL=gpt-3.5-turbo
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# Data settings
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RETRIEVAL=5
```

## Evaluation

Run a comprehensive evaluation:

```bash
python main.py eval
```

The system evaluates responses on:
- Faithfulness: Accuracy to source material
- Relevance: Response relevance to questions
- Tool Routing: Correct tool selection
- Source Quality: Citation accuracy
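As a toy illustration of how a groundedness check can work, one can score the fraction of response words that also appear in the retrieved sources (illustrative only; `evaluation.py` may use LLM-based judges instead):

```python
import re

def groundedness(response: str, sources: list[str]) -> float:
    """Fraction of response words that appear somewhere in the source texts."""
    words = re.findall(r"[a-z]+", response.lower())
    source_vocab = set(re.findall(r"[a-z]+", " ".join(sources).lower()))
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in source_vocab)
    return hits / len(words)

score = groundedness(
    "AI systems must undergo risk assessment",
    ["High-risk AI systems must undergo a conformity and risk assessment."],
)
print(round(score, 2))  # 1.0
```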
## Testing

Run the test suite:

```bash
python main.py test
```

## API Usage

```python
import requests

# Create a session
response = requests.post(
    "http://localhost:8000/sessions",
    json={"user_id": "your_user_id"},
)
session_id = response.json()["session_id"]

# Chat
response = requests.post(
    "http://localhost:8000/chat",
    json={
        "message": "What are the latest AI safety guidelines?",
        "session_id": session_id,
    },
)
result = response.json()
print(f"Response: {result['response']}")
print(f"Tool used: {result['tool_used']}")
print(f"Sources: {result['sources']}")
```

## Extending the System

### Adding New Tools

1. Create a tool class in `tools.py`
2. Update the router logic
3. Integrate it in `chatbot.py`
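As a sketch of step 1, a new tool can follow a simple name/description/run interface. This mirrors the common LangChain tool pattern but is a hypothetical standalone example, not the project's actual base class:

```python
from dataclasses import dataclass

@dataclass
class UnitConversionTool:
    """Hypothetical example tool: converts kilometers to miles."""
    name: str = "unit_converter"
    description: str = "Convert kilometers to miles. Input: a number of kilometers."

    def run(self, query: str) -> str:
        km = float(query)
        return f"{km} km = {km * 0.621371:.2f} miles"

tool = UnitConversionTool()
print(tool.run("10"))  # 10.0 km = 6.21 miles
```

The `name` and `description` fields are what the router (or an LLM agent) uses to decide when to invoke the tool.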
### Custom Evaluation

Add evaluators in `evaluation.py` for domain-specific metrics.

### Monitoring

Track conversation statistics, tool usage patterns, and performance metrics through the built-in monitoring system.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Run the test suite
5. Submit a pull request
## License

This project is licensed under the MIT License.
## Acknowledgments

- Built with LangChain
- UI powered by Streamlit and Gradio
- Vector storage with FAISS
- Backend with FastAPI
Happy Research!