An advanced Retrieval-Augmented Generation (RAG) system that enhances local document querying with real-time web search capabilities. This application leverages a multi-agent team built with CrewAI to provide comprehensive answers by searching both a user-uploaded PDF and the web.
- 📄 PDF Knowledge Base: Upload a PDF to create a dynamic, searchable knowledge base.
- 🌐 Hybrid Search: Combines semantic search on your local document with real-time web search using Exa.
- 🤖 Multi-Agent System: Utilizes a CrewAI team of specialized agents for database search, web search, and answer generation.
- ⚡ Vector Storage: Powered by Qdrant for efficient vector storage and similarity search.
- 💬 Conversational Interface: An intuitive chat interface built with Streamlit.
- 🔬 AI Observability: Integrated with AgentOps for tracing and monitoring agent performance.
The system uses a sequential CrewAI process:
- PDF Processing: A user-uploaded PDF is processed by pdfplumber, converted into embeddings using OpenAI, and stored in a Qdrant vector database.
- DB Search Agent: This agent first queries the Qdrant database to find context relevant to the user's query from the uploaded document.
- Web Search Agent: Next, an agent uses the EXA Search tool to gather up-to-date, relevant information from the web.
- Answer Agent: Finally, a master agent synthesizes the information from both the PDF context and the web search results to generate a comprehensive, well-formatted answer.
┌────────────────┐   ┌───────────────────┐    ┌──────────────────┐
│   PDF Upload   │──▶│ OpenAI Embeddings │───▶│ Qdrant VectorDB  │
└────────────────┘   └───────────────────┘    └──────────────────┘
        │                                              │
        │                                              ▼
┌────────────────┐   ┌───────────────────┐    ┌──────────────────┐
│   User Query   │──▶│      CrewAI       │◀───│ DB Search Agent  │
└────────────────┘   │ (Sequential Flow) │    └──────────────────┘
        │            └───────────────────┘             │
        │                      │                       ▼
        │                      │              ┌──────────────────┐
        │                      └─────────────▶│ Web Search Agent │
        │                                     │    (Exa Tool)    │
        │                                     └──────────────────┘
        │                                              │
        ▼                                              ▼
┌────────────────┐   ┌───────────────────┐    ┌──────────────────┐
│  RAG Response  │◀──│   Answer Agent    │◀───│ Combined Context │
└────────────────┘   └───────────────────┘    └──────────────────┘
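The "Combined Context" step can be pictured as merging the PDF hits and the web hits into a single prompt for the Answer Agent. The following is a minimal sketch of that idea in plain Python, not the project's actual code: the function name `build_answer_prompt`, the prompt wording, and the numbering scheme are all assumptions made for illustration.

```python
def build_answer_prompt(query: str, pdf_hits: list[str], web_hits: list[str]) -> str:
    """Merge PDF and web context into one prompt (hypothetical helper)."""
    # Tag every snippet with its origin so the answer step can tell sources apart.
    tagged = [("pdf", text) for text in pdf_hits] + [("web", text) for text in web_hits]
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(tagged, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

Keeping the source tag next to each snippet is one way to let the final answer distinguish what came from the uploaded document from what came from the web.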
- Python 3.11+
- OpenAI API Key
- Qdrant API Key & URL
- Exa API Key
- AgentOps API Key (Optional, for observability)
- Clone the repository:
    git clone https://github.com/Arindam200/awesome-ai-apps.git
    cd awesome-ai-apps/rag_apps/agentic_rag_with_web_search
- Install dependencies: This project uses uv for package management.
    pip install uv
    uv sync
- Set up environment variables: Create a .env file in the project directory and add your API keys:
    OPENAI_API_KEY="your_openai_api_key"
    QDRANT_API_KEY="your_qdrant_api_key"
    QDRANT_URL="your_qdrant_cluster_url"
    EXA_API_KEY="your_exa_api_key"
    AGENTOPS_API_KEY="your_agentops_api_key"
- Run the application:
    streamlit run main.py
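Before launching the app, a quick check like the one below can confirm that the required keys are set. This is a hedged sketch: the helper `missing_keys` is hypothetical, and it assumes the variables are already present in the process environment (for example exported in the shell or loaded by a dotenv loader). It treats the AgentOps key as optional, as the prerequisites state.

```python
import os

# Key names come from the .env step; AGENTOPS_API_KEY is optional and not checked.
REQUIRED_KEYS = ("OPENAI_API_KEY", "QDRANT_API_KEY", "QDRANT_URL", "EXA_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```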
- Enter API Keys: Fill in your Qdrant and Exa API keys in the sidebar.
- Upload a PDF: Use the file uploader in the sidebar to select a PDF. The application will automatically process it and load it into your Qdrant collection.
- Ask a Question: Once the PDF is loaded, use the chat input to ask a question.
- Get an Answer: The agent crew will start its process. The final, synthesized answer, combining knowledge from the PDF and the web, will be displayed in the chat.
The core logic is defined in crews.py and qdrant_tool.py.
- db_search_agent: Searches the Qdrant vector database.
- search_agent: Searches the web using EXASearchTool.
- answer_agent: Compiles the final response.
- The Crew is configured to run these agents in a sequential process.
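As a rough illustration of how three agents might be wired into a sequential crew, here is a sketch using CrewAI's `Agent`, `Task`, `Crew`, and `Process.sequential`. The agent names match the list above; the roles, goals, backstories, task descriptions, and the `build_crew` helper are assumptions for illustration and are not taken from crews.py.

```python
# Agent names follow the README; every other string here is an illustrative assumption.
AGENT_SPECS = [
    {
        "name": "db_search_agent",
        "role": "Document searcher",
        "goal": "Find passages in the uploaded PDF relevant to the query",
        "backstory": "You look up context in the Qdrant collection.",
        "task": "Search the vector database for: {query}",
        "expected_output": "Relevant passages from the PDF",
    },
    {
        "name": "search_agent",
        "role": "Web researcher",
        "goal": "Find current web information relevant to the query",
        "backstory": "You search the web with the Exa tool.",
        "task": "Search the web for: {query}",
        "expected_output": "Relevant web results with sources",
    },
    {
        "name": "answer_agent",
        "role": "Answer writer",
        "goal": "Combine PDF and web context into one answer",
        "backstory": "You write clear, well-formatted answers.",
        "task": "Answer the question using the gathered context: {query}",
        "expected_output": "A well-formatted final answer",
    },
]


def build_crew(tools_by_agent=None):
    """Build a sequential crew from AGENT_SPECS (hypothetical helper)."""
    # Imported lazily so the specs can be inspected without crewai installed.
    from crewai import Agent, Crew, Process, Task

    tools_by_agent = tools_by_agent or {}
    agents, tasks = [], []
    for spec in AGENT_SPECS:
        agent = Agent(
            role=spec["role"],
            goal=spec["goal"],
            backstory=spec["backstory"],
            tools=tools_by_agent.get(spec["name"], []),
        )
        agents.append(agent)
        tasks.append(
            Task(description=spec["task"], expected_output=spec["expected_output"], agent=agent)
        )
    # Process.sequential runs the tasks in list order.
    return Crew(agents=agents, tasks=tasks, process=Process.sequential)
```

With crewai installed and an LLM key configured, something like `build_crew().kickoff(inputs={"query": "..."})` would fill the `{query}` placeholder and run the three tasks in order.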
- PDF Extraction: Uses pdfplumber to extract text.
- Embeddings: Generates embeddings using OpenAI's text-embedding-3-large model.
- Vector Store: Creates a collection in Qdrant and upserts the document vectors. The collection's vector size is configured for 3072 dimensions, matching the embedding model's output.
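The three steps above could look roughly like the following sketch. It is not the contents of qdrant_tool.py: the character-window chunking, the cosine distance metric, the `text` payload field, and the helper names `chunk_text` and `index_pdf` are all assumptions. Third-party imports are deferred into `index_pdf`, so the chunking helper can be tried on its own.

```python
import uuid

EMBEDDING_MODEL = "text-embedding-3-large"
VECTOR_SIZE = 3072  # default output dimension of text-embedding-3-large


def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping character windows (assumed chunking scheme)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    starts = range(0, max(len(text) - overlap, 1), step)
    return [text[i:i + size] for i in starts if text[i:i + size].strip()]


def index_pdf(pdf_path, collection, qdrant_url, qdrant_api_key):
    """Extract, embed, and upsert one PDF; returns the number of chunks stored."""
    import pdfplumber
    from openai import OpenAI
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer.
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    chunks = chunk_text(text)
    if not chunks:
        return 0

    # Single request for brevity; a large PDF would need batching.
    response = OpenAI().embeddings.create(model=EMBEDDING_MODEL, input=chunks)
    vectors = [item.embedding for item in response.data]

    client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key)
    if not client.collection_exists(collection):
        client.create_collection(
            collection_name=collection,
            # Cosine distance is an assumption; the README only states the size.
            vectors_config=VectorParams(size=VECTOR_SIZE, distance=Distance.COSINE),
        )
    points = [
        PointStruct(id=str(uuid.uuid4()), vector=vector, payload={"text": chunk})
        for chunk, vector in zip(chunks, vectors)
    ]
    client.upsert(collection_name=collection, points=points)
    return len(points)
```

The collection's vector size has to match the embedding model's output, which is why the sketch keeps both values side by side as constants.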
- CrewAI: Multi-agent framework for orchestrating the RAG workflow.
- Streamlit: Web interface for the chat application.
- Qdrant: Vector database for storing and searching PDF embeddings.
- Exa: AI-powered search engine for real-time web queries.
- OpenAI: For generating embeddings and powering the agents.
- AgentOps: For monitoring and tracing the agent execution flow.
- PDFPlumber: For robust text extraction from PDF files.
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
This project is licensed under the MIT License - see the LICENSE file for details.