AI-Powered Deep Research Assistant System
An automated research tool built on Large Language Models (LLMs) that collects, analyzes, and synthesizes information into structured research reports.
AI2Apps-DeepResearch is an intelligent research assistant that combines Large Language Models (LLMs) with web search to automate the entire research process, from question analysis to report generation. The system can:
- 🤖 Intelligent Analysis: Automatically decompose research questions and generate multi-angle search strategies
- 🔍 Deep Search: Execute concurrent web searches to collect relevant information
- 📝 Content Extraction: Clean web content using Trafilatura to extract high-quality text
- 🧠 Smart Synthesis: Analyze and synthesize information through LLM to generate structured reports
- 📚 Citation Management: Automatically track information sources and generate standard references
- 🔄 Recursive Research: Support multi-level question derivation for in-depth exploration of information gaps
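Citation management of this kind is often built on a small registry that gives each unique source URL a stable number the first time it is cited. The sketch below assumes that design. `CitationRegistry`, `cite`, and `references` are hypothetical names, not the project's API.

```python
from typing import Dict


class CitationRegistry:
    """Hypothetical helper that numbers sources in the order they are first cited."""

    def __init__(self) -> None:
        self._numbers: Dict[str, int] = {}  # url -> citation number
        self._titles: Dict[str, str] = {}   # url -> display title

    def cite(self, url: str, title: str = "") -> str:
        """Return an inline marker such as "[1]"; a repeated URL reuses its number."""
        if url not in self._numbers:
            self._numbers[url] = len(self._numbers) + 1
            self._titles[url] = title or url
        return f"[{self._numbers[url]}]"

    def references(self) -> str:
        """Render the reference list in citation order (dicts keep insertion order)."""
        lines = ["## References"]
        for url, number in self._numbers.items():
            lines.append(f"{number}. {self._titles[url]} - {url}")
        return "\n".join(lines)


registry = CitationRegistry()
sentence = (
    f"First finding {registry.cite('https://example.com/a', 'Source A')}, "
    f"second finding {registry.cite('https://example.com/b', 'Source B')}"
    f"{registry.cite('https://example.com/a')}."
)
print(sentence)
print(registry.references())
```

Reusing the first number for a repeated URL keeps the reference list free of duplicates even when several sections cite the same page.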
Multi-Level Search Strategy
- Automatically decompose research questions into multiple search dimensions
- Generate targeted search phrases
- Support customizable search depth and result count
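A minimal sketch of turning decomposed search phrases into search requests, assuming the LLM is prompted to reply with a JSON array of strings (the LLM call itself is stubbed). `build_search_requests` and `max_queries` are hypothetical. The `search_depth` and `max_results` fields are chosen to mirror the `RESEARCH_SEARCH_DEPTH` and `RESEARCH_SEARCH_MAX_RESULTS` settings in the configuration section.

```python
import json
from typing import Dict, List, Union


def build_search_requests(
    llm_output: str,
    search_depth: str = "basic",
    max_results: int = 5,
    max_queries: int = 4,
) -> List[Dict[str, Union[str, int]]]:
    """Turn a model's JSON array of search phrases into per-query search parameters.

    Assumes the LLM was prompted to reply with a JSON array of strings.
    """
    if search_depth not in ("basic", "advanced"):
        raise ValueError(f"unsupported search depth: {search_depth!r}")
    seen = set()
    search_requests = []
    for phrase in json.loads(llm_output):
        query = " ".join(str(phrase).split())  # collapse stray whitespace
        if not query or query.lower() in seen:
            continue  # skip empty and duplicate phrases
        seen.add(query.lower())
        search_requests.append(
            {"query": query, "search_depth": search_depth, "max_results": max_results}
        )
        if len(search_requests) == max_queries:
            break  # cap the number of searches per question
    return search_requests


# Stubbed model reply; a real run would obtain this from the LLM.
fake_llm_output = json.dumps([
    "AI medical imaging diagnosis",
    "AI drug discovery clinical trials",
    "ai  medical imaging   diagnosis",
    "regulation of AI in healthcare",
])
for request in build_search_requests(fake_llm_output, max_results=3):
    print(request)
```

Deduplicating case- and whitespace-insensitively keeps near-identical phrases from spending the search budget twice.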
Concurrent Content Processing
- Multi-threaded concurrent search and content extraction
- Intelligent caching mechanism to avoid redundant processing
- Automatic retry mechanism for improved stability
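The three points above (thread-based concurrency, caching, retries) can fit together with the standard library alone. The sketch below is one way to do it. `CachedFetcher` and the injected `fetch` callable are hypothetical, and the exponential backoff policy is an assumption rather than the project's documented behavior.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, List, Optional


class CachedFetcher:
    """Hypothetical sketch: concurrent fetching with an in-memory cache and retries."""

    def __init__(self, fetch: Callable[[str], str], retries: int = 3, backoff: float = 0.1):
        if retries < 1:
            raise ValueError("retries must be at least 1")
        self._fetch = fetch  # injected: e.g. page download plus text extraction
        self._retries = retries
        self._backoff = backoff
        self._cache: Dict[str, str] = {}
        self._lock = Lock()  # guards the cache across worker threads

    def _fetch_with_retry(self, url: str) -> str:
        last_error: Optional[Exception] = None
        for attempt in range(self._retries):
            try:
                return self._fetch(url)
            except Exception as error:  # a real client would retry only transient errors
                last_error = error
                if attempt + 1 < self._retries:
                    time.sleep(self._backoff * 2 ** attempt)  # exponential backoff
        raise last_error  # retries >= 1, so last_error is always set here

    def get(self, url: str) -> str:
        with self._lock:
            if url in self._cache:
                return self._cache[url]  # cache hit: skip redundant processing
        text = self._fetch_with_retry(url)
        with self._lock:
            self._cache[url] = text
        return text

    def get_many(self, urls: List[str], max_workers: int = 4) -> Dict[str, str]:
        unique = list(dict.fromkeys(urls))  # de-duplicate while keeping order
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return dict(zip(unique, pool.map(self.get, unique)))


# Demo with a fake fetcher that fails once for one URL.
attempts: Dict[str, int] = {}
attempts_lock = Lock()


def flaky_fetch(url: str) -> str:
    with attempts_lock:
        attempts[url] = attempts.get(url, 0) + 1
        attempt = attempts[url]
    if url.endswith("/flaky") and attempt == 1:
        raise ConnectionError("transient failure")
    return f"content of {url}"


fetcher = CachedFetcher(flaky_fetch, backoff=0.01)
pages = fetcher.get_many(["https://a.example", "https://b.example/flaky", "https://a.example"])
fetcher.get("https://a.example")  # served from the cache; flaky_fetch is not called again
print(pages)
print(attempts)
```

URLs are de-duplicated before dispatch so that two worker threads never miss the cache for the same URL at the same time.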
Intelligent Content Filtering
- LLM-based content relevance evaluation
- Automatic filtering of irrelevant information
- Extract key entities and factual background
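The relevance filter can be sketched as a threshold over a pluggable scoring function. In the project the score comes from an LLM. Here a simple keyword-overlap scorer stands in so the example runs offline. `filter_relevant`, `keyword_overlap_score`, the stopword list, and the 0.5 threshold are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set

STOPWORDS = {"what", "are", "the", "of", "in", "and", "a", "an", "is", "for", "to"}


@dataclass
class Document:
    url: str
    text: str


def filter_relevant(
    question: str,
    docs: List[Document],
    score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Document]:
    """Keep documents whose relevance score for the question reaches the threshold.

    In the described system the scorer is an LLM judgment; here it is injected.
    """
    return [doc for doc in docs if score(question, doc.text) >= threshold]


def keyword_overlap_score(question: str, text: str) -> float:
    """Stand-in scorer (not an LLM): fraction of question keywords present in the text."""

    def keywords(s: str) -> Set[str]:
        return {w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOPWORDS}

    wanted = keywords(question)
    return len(wanted & keywords(text)) / len(wanted) if wanted else 0.0


question = "What are the current applications of AI in healthcare?"
docs = [
    Document("https://a.example", "AI applications in healthcare include imaging and triage."),
    Document("https://b.example", "Current AI models in healthcare face new regulation."),
    Document("https://c.example", "A recipe for sourdough bread."),
]
for doc in filter_relevant(question, docs, keyword_overlap_score):
    print(doc.url)
```

Keeping the scorer injectable means an LLM-backed scorer could replace the keyword stand-in without changing the filtering logic.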
Derived Question Processing
- Automatically identify information gaps
- Generate and research derived questions
- Support configurable recursion depth
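Recursion depth can be bounded by passing the current depth down the call tree. The sketch below assumes `max_depth` plays the role of `RESEARCH_MAX_DERIVATION_DEPTH`. `answer` and `derive` are hypothetical stand-ins for LLM calls, and the `seen` set (to skip repeated questions) is an added assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Set


def research(
    question: str,
    answer: Callable[[str], str],
    derive: Callable[[str, str], List[str]],
    max_depth: int = 1,
    depth: int = 0,
    seen: Optional[Set[str]] = None,
) -> Dict[str, Any]:
    """Answer a question, then research its derived questions until max_depth is reached.

    `answer` and `derive` stand in for LLM calls; their signatures are assumptions.
    """
    seen = set() if seen is None else seen
    seen.add(question)
    node: Dict[str, Any] = {"question": question, "answer": answer(question), "children": []}
    if depth < max_depth:
        for follow_up in derive(question, node["answer"]):
            if follow_up not in seen:  # never research the same gap twice
                node["children"].append(
                    research(follow_up, answer, derive, max_depth, depth + 1, seen)
                )
    return node


def count_nodes(node: Dict[str, Any]) -> int:
    return 1 + sum(count_nodes(child) for child in node["children"])


# Stubs: every answer exposes two follow-up questions.
def fake_answer(q: str) -> str:
    return f"answer to {q}"


def fake_derive(q: str, a: str) -> List[str]:
    return [f"{q} / why?", f"{q} / how?"]


tree = research("Q", fake_answer, fake_derive, max_depth=1)
print([child["question"] for child in tree["children"]])
print(count_nodes(research("Q", fake_answer, fake_derive, max_depth=2)))
```

With `max_depth=0` only the original question is answered, which corresponds to disabling derivation entirely.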
Report Generation
- Structured reports in Markdown format
- Streaming output with real-time progress
- Automatically generate reference lists
- High Performance: Concurrent searching and content extraction shorten overall research time
- Extensible: Supports recursive research on derived questions with configurable depth
- Reliable: Built-in error handling and retry mechanisms
- User-Friendly: Command-line interface that accepts the research question as an argument or through interactive input
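The report-generation features listed above (Markdown output, streaming, reference lists) can be sketched as a generator that yields the report piece by piece and appends the reference list last. `stream_report` is hypothetical. In the actual system the section bodies would presumably arrive as a streamed LLM response rather than as finished strings.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_report(
    title: str,
    sections: Iterable[Tuple[str, str]],
    references: List[Tuple[str, str]],
) -> Iterator[str]:
    """Yield a Markdown report chunk by chunk so the caller can show progress.

    sections: (heading, body) pairs; references: (title, url) pairs in citation order.
    """
    yield f"# {title}\n\n"
    for heading, body in sections:
        yield f"## {heading}\n\n{body}\n\n"
    if references:
        yield "## References\n\n"
        for number, (ref_title, url) in enumerate(references, start=1):
            yield f"{number}. [{ref_title}]({url})\n"


chunks: List[str] = []
for chunk in stream_report(
    "AI in Healthcare",
    [("Applications", "Summary of applications [1]."), ("Trends", "Summary of trends [2].")],
    [("Source A", "https://example.com/a"), ("Source B", "https://example.com/b")],
):
    print(chunk, end="", flush=True)  # show each chunk as soon as it is produced
    chunks.append(chunk)
report = "".join(chunks)
```

Joining the chunks at the end gives the complete Markdown text, which could then be written to the configured report path.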
- Python 3.11+
- Conda (recommended) or Python virtual environment
- Clone the repository
git clone https://github.com/Lyt060814/AI2Apps-DeepResearch.git
cd AI2Apps-DeepResearch
- Create a virtual environment (Conda recommended)
conda create -n deep-research python=3.11
conda activate deep-research
- Install dependencies
pip install -r requirements.txt
Or use the Tsinghua mirror for faster installation (for users in China):
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
- Configure environment variables
Copy .env.example to .env and fill in your API keys:
cp .env.example .env
Edit the .env file:
OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1 # Optional: custom OpenAI API endpoint
- OpenAI API: Visit OpenAI Platform to get your API key
- Tavily API: Visit Tavily to register and get your API key
Method 1: Command-line argument
python run.py -q "Your research question"
Example:
python run.py -q "What are the current applications and development trends of AI in healthcare?"
Method 2: Interactive input
python run.py
Then enter your research question when prompted.
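Both invocation styles can be handled with the standard `argparse` module by falling back to a prompt when `-q` is absent. This is a hypothetical sketch, not the contents of `run.py`. The `--question` long form and the `read_question` helper are assumptions.

```python
import argparse
from typing import Callable, Optional, Sequence


def read_question(
    argv: Optional[Sequence[str]] = None,
    ask: Callable[[str], str] = input,
) -> str:
    """Return the research question from -q, or prompt for it interactively."""
    parser = argparse.ArgumentParser(description="Run a deep research task.")
    parser.add_argument("-q", "--question", help="research question to investigate")
    args = parser.parse_args(argv)
    # Fall back to an interactive prompt when -q was not given.
    question = (args.question or ask("Enter your research question: ")).strip()
    if not question:
        parser.error("a non-empty research question is required")  # exits with status 2
    return question
```

Calling `read_question()` with no arguments parses `sys.argv`, so the same function would serve both `python run.py -q "..."` and a plain `python run.py`.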
Configure research parameters
Add configuration to your .env file to customize research behavior:
# LLM Configuration
RESEARCH_LLM_MODEL=gpt-4.1-mini # LLM model to use
RESEARCH_LLM_TEMPERATURE=0 # Temperature parameter (0-1)
# Search Configuration
RESEARCH_SEARCH_MAX_RESULTS=5 # Maximum results per search
RESEARCH_SEARCH_DEPTH=basic # Search depth: basic/advanced
# Research Configuration
RESEARCH_MAX_DERIVATION_DEPTH=1 # Maximum depth for derived questions
RESEARCH_REPORT_PATH=/tmp/result.md # Report output path
Use DeepSearchAgent (Simplified Version)
Edit run.py and replace DeepResearchAgent with DeepSearchAgent:
from src.aalgorithm.agents import DeepSearchAgent
agent = DeepSearchAgent()
report = agent.run(question)
AI2Apps-DeepResearch/
├── run.py # Main entry point
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── .gitignore # Git ignore file
├── setup_agent.js # Project installation config (optional)
├── scripts/ # Scripts directory
│ └── trafilatura_api.py # Trafilatura API server
└── src/ # Source code directory
└── aalgorithm/ # Core algorithm package
├── __init__.py # Package initialization
├── utils.py # Utility functions
├── llm.py # LLM provider
├── search.py # Search provider
├── content.py # Content cleaning and document management
└── agents/ # Agents directory
├── __init__.py # Agents package initialization
├── deepsearch.py # Simplified search agent
└── deepresearch.py # Full-featured research agent
| Variable | Description | Default | Required |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key | - | ✅ |
| TAVILY_API_KEY | Tavily search API key | - | ✅ |
| OPENAI_BASE_URL | OpenAI API base URL | https://api.openai.com/v1 | ❌ |
| RESEARCH_LLM_MODEL | LLM model name | gpt-4.1-mini | ❌ |
| RESEARCH_LLM_TEMPERATURE | Generation temperature | 0 | ❌ |
| RESEARCH_SEARCH_MAX_RESULTS | Max results per search | 5 | ❌ |
| RESEARCH_SEARCH_DEPTH | Search depth | basic | ❌ |
| RESEARCH_MAX_DERIVATION_DEPTH | Max derivation depth | 1 | ❌ |
| RESEARCH_REPORT_PATH | Report output path | /tmp/result.md | ❌ |
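The variables in the table can be collected into a typed settings object that fails fast on missing keys. This sketch reads only from a mapping such as `os.environ` (loading the `.env` file itself is left out). `ResearchSettings` and `load_settings` are hypothetical names. The defaults are copied from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResearchSettings:
    """Hypothetical typed view of the environment variables in the table."""

    openai_api_key: str
    tavily_api_key: str
    openai_base_url: str = "https://api.openai.com/v1"
    llm_model: str = "gpt-4.1-mini"
    llm_temperature: float = 0.0
    search_max_results: int = 5
    search_depth: str = "basic"
    max_derivation_depth: int = 1
    report_path: str = "/tmp/result.md"


def load_settings(env: Optional[Mapping[str, str]] = None) -> ResearchSettings:
    """Build settings from the environment, failing fast on missing required keys."""
    env = os.environ if env is None else env
    missing = [key for key in ("OPENAI_API_KEY", "TAVILY_API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    depth = env.get("RESEARCH_SEARCH_DEPTH", "basic")
    if depth not in ("basic", "advanced"):
        raise ValueError(f"RESEARCH_SEARCH_DEPTH must be basic or advanced, got {depth!r}")
    return ResearchSettings(
        openai_api_key=env["OPENAI_API_KEY"],
        tavily_api_key=env["TAVILY_API_KEY"],
        openai_base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        llm_model=env.get("RESEARCH_LLM_MODEL", "gpt-4.1-mini"),
        llm_temperature=float(env.get("RESEARCH_LLM_TEMPERATURE", "0")),
        search_max_results=int(env.get("RESEARCH_SEARCH_MAX_RESULTS", "5")),
        search_depth=depth,
        max_derivation_depth=int(env.get("RESEARCH_MAX_DERIVATION_DEPTH", "1")),
        report_path=env.get("RESEARCH_REPORT_PATH", "/tmp/result.md"),
    )


settings = load_settings(
    {"OPENAI_API_KEY": "sk-test", "TAVILY_API_KEY": "tvly-test", "RESEARCH_SEARCH_MAX_RESULTS": "8"}
)
print(settings.llm_model, settings.search_max_results, settings.report_path)
```

Validating once at startup surfaces a missing API key before any search or LLM call is made.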
DeepResearchAgent (Full Version)
- Best for: Deep research and comprehensive analysis scenarios
- Features: Supports derived questions, citation management, multi-level analysis
- Time: Longer, but more detailed results
DeepSearchAgent (Simplified Version)
- Best for: Quick information overview scenarios
- Features: Simplified workflow, faster generation
- Time: Shorter, suitable for quick queries
1. Question Analysis
↓
2. Generate Search Strategy (Multi-angle Decomposition)
↓
3. Concurrent Search & Data Collection
↓
4. Content Relevance Filtering (LLM Evaluation)
↓
5. Content Cleaning & Extraction (Trafilatura)
↓
6. Information Synthesis & Analysis (LLM)
↓
7. Generate Base Answers
↓
8. Identify & Process Derived Questions (Recursive)
↓
9. Generate Final Report (Markdown)
↓
10. Add References
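The ten steps can be sketched as an ordered list of stages. Each stage reads a shared state dictionary and returns the keys it adds. Every stage here is a trivial stub, with no LLM, search, or Trafilatura calls and no actual concurrency or recursion. `run_pipeline` is a hypothetical name. The sketch only shows the data flow and the per-step progress log.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Stage = Tuple[str, Callable[[State], State]]


def run_pipeline(question: str, stages: List[Stage], log: Callable[[str], None] = print) -> State:
    """Run the stages in order, merging each stage's output into a shared state dict."""
    state: State = {"question": question}
    for number, (name, stage) in enumerate(stages, start=1):
        log(f"[{number}/{len(stages)}] {name}")  # progress line for each step
        state = {**state, **stage(state)}  # a stage returns only the keys it sets
    return state


def add_references(s: State) -> State:
    refs = "\n".join(f"{i}. {url}" for i, url in enumerate(s["pages"], start=1))
    return {"report": f"{s['report']}\n## References\n\n{refs}\n"}


# Hypothetical stub stages; real ones would call the LLM, the search API, and Trafilatura.
stages: List[Stage] = [
    ("Question analysis", lambda s: {"aspects": ["applications", "trends"]}),
    ("Search strategy", lambda s: {"queries": [f"{s['question']} {a}" for a in s["aspects"]]}),
    ("Search & data collection",  # sequential here; the real step runs concurrently
     lambda s: {"pages": {f"https://example.com/{i}": q for i, q in enumerate(s["queries"])}}),
    ("Relevance filtering",
     lambda s: {"pages": {u: t for u, t in s["pages"].items() if "trends" in t}}),
    ("Content cleaning", lambda s: {"pages": {u: t.strip() for u, t in s["pages"].items()}}),
    ("Synthesis", lambda s: {"notes": list(s["pages"].values())}),
    ("Base answers", lambda s: {"answer": "; ".join(s["notes"])}),
    ("Derived questions", lambda s: {"derived": []}),  # recursion omitted here
    ("Final report", lambda s: {"report": f"# {s['question']}\n\n{s['answer']}\n"}),
    ("References", add_references),
]

result = run_pipeline("AI in healthcare", stages)
print(result["report"])
```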
- LLM Integration: Uses OpenAI API for question analysis, content evaluation, and report generation
- Web Search: Integrates Tavily API for high-quality web searches
- Content Extraction: Uses Trafilatura library to extract clean web text
- Concurrent Processing: Uses Python's concurrent.futures for multi-threaded concurrency
- Citation Tracking: Automatically manages information sources and generates citations in a standard format
python run.py -q "What are the latest advances and commercial applications of quantum computing in 2024?"
python run.py -q "What are the features of a company's newly released AI product? How are competitors responding?"
python run.py -q "What are the current solutions to the hallucination problem in large language models?"
This project is licensed under the MIT License - see the LICENSE file for details.
This project uses the following excellent open-source projects and services:
- OpenAI API - Large Language Model services
- Tavily - Intelligent search API
- Trafilatura - Web content extraction
- Loguru - Logging management