AI2Apps-DeepResearch

AI-Powered Deep Research Assistant System

An automated research tool built on Large Language Models (LLMs) that collects, analyzes, and synthesizes information to generate structured research reports.

Features • Quick Start • Usage • Configuration

English | 简体中文

📖 Introduction

AI2Apps-DeepResearch is a research assistant system that combines Large Language Models (LLMs) with web search to automate the research process, from question analysis through report generation. The system can:

🤖 Intelligent Analysis: Automatically decompose research questions and generate multi-angle search strategies
🔍 Deep Search: Execute concurrent web searches to collect relevant information
📝 Content Extraction: Clean web content using Trafilatura to extract high-quality text
🧠 Smart Synthesis: Analyze and synthesize information through LLM to generate structured reports
📚 Citation Management: Automatically track information sources and generate standard references
🔄 Recursive Research: Support multi-level question derivation for in-depth exploration of information gaps

✨ Features

Core Capabilities

Multi-Level Search Strategy
- Automatically decompose research questions into multiple search dimensions
- Generate targeted search phrases
- Support customizable search depth and result count
Concurrent Content Processing
- Multi-threaded concurrent search and content extraction
- Intelligent caching mechanism to avoid redundant processing
- Automatic retry mechanism for improved stability
Intelligent Content Filtering
- LLM-based content relevance evaluation
- Automatic filtering of irrelevant information
- Extract key entities and factual background
Derived Question Processing
- Automatically identify information gaps
- Generate and research derived questions
- Support configurable recursion depth
Report Generation
- Structured reports in Markdown format
- Streaming output with real-time progress
- Automatically generate reference lists
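
The concurrency, caching, and retry capabilities listed above can be illustrated with a minimal sketch. This is not the project's actual implementation: `with_retry`, `CachedFetcher`, and the `fetch` callable are hypothetical names introduced here, and only the standard-library `concurrent.futures` and `threading` modules are used.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def with_retry(func, attempts=3):
    """Call func() up to `attempts` times, re-raising the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
    raise last_error


class CachedFetcher:
    """Hypothetical helper: caches successful fetches and retries failures."""

    def __init__(self, fetch, attempts=3):
        self._fetch = fetch          # any callable: url -> text (assumed)
        self._attempts = attempts
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, url):
        # Return the cached result if this URL was already fetched.
        with self._lock:
            if url in self._cache:
                return self._cache[url]
        text = with_retry(lambda: self._fetch(url), self._attempts)
        with self._lock:
            self._cache[url] = text
        return text

    def get_many(self, urls, max_workers=4):
        # Deduplicate while keeping order, then fetch on a thread pool.
        unique = list(dict.fromkeys(urls))
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return dict(zip(unique, pool.map(self.get, unique)))
```

Passing the fetch function in as a parameter keeps the sketch independent of any particular HTTP or search client, so the same wrapper could sit in front of either a search call or a page download.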

Technical Highlights

High Performance: Concurrent processing for improved research efficiency
Extensible: Support recursive derived research with configurable depth
Reliable: Comprehensive error handling and retry mechanisms
User-Friendly: Command-line interface with parameter and interactive input support

🚀 Quick Start

Requirements

Python 3.11+
Conda (recommended) or Python virtual environment

Installation

Clone the repository

git clone https://github.com/Lyt060814/AI2Apps-DeepResearch.git
cd AI2Apps-DeepResearch

Create virtual environment (Conda recommended)

conda create -n deep-research python=3.11
conda activate deep-research

Install dependencies

pip install -r requirements.txt

Or use the Tsinghua mirror for faster installation (users in China):

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Configure environment variables

Copy .env.example to .env and fill in your API keys:

cp .env.example .env

Edit the .env file:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional: custom OpenAI API endpoint

Get API Keys

OpenAI API: Visit OpenAI Platform to get your API key
Tavily API: Visit Tavily to register and get your API key

💡 Usage

Basic Usage

Method 1: Command-line argument

python run.py -q "Your research question"

Example:

python run.py -q "What are the current applications and development trends of AI in healthcare?"

Method 2: Interactive input

python run.py

Then enter your research question when prompted.
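
For readers curious how the two input methods can share one code path, here is a minimal sketch using the standard-library `argparse` module. It is an illustration only, not the contents of `run.py`: the function name `read_question` and the prompt text are assumptions, and only the `-q` flag comes from the commands above.

```python
import argparse


def read_question(argv=None, prompt=input):
    """Return the question passed with -q, or ask for it interactively."""
    parser = argparse.ArgumentParser(description="Run a research query")
    parser.add_argument("-q", dest="question", help="research question")
    args = parser.parse_args(argv)

    # Fall back to an interactive prompt when -q was not supplied.
    question = args.question or prompt("Enter your research question: ")
    question = question.strip()
    if not question:
        parser.error("a research question is required")
    return question


if __name__ == "__main__":
    print(read_question())
```

Accepting `argv` and `prompt` as parameters makes the function easy to exercise without a terminal, for example `read_question(["-q", "What is X?"])`.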

Advanced Usage

Configure research parameters

Add configuration to your .env file to customize research behavior:

# LLM Configuration
RESEARCH_LLM_MODEL=gpt-4.1-mini           # LLM model to use
RESEARCH_LLM_TEMPERATURE=0                # Temperature parameter (0-1)

# Search Configuration
RESEARCH_SEARCH_MAX_RESULTS=5             # Maximum results per search
RESEARCH_SEARCH_DEPTH=basic               # Search depth: basic/advanced

# Research Configuration
RESEARCH_MAX_DERIVATION_DEPTH=1           # Maximum depth for derived questions
RESEARCH_REPORT_PATH=/tmp/result.md       # Report output path
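
As a rough illustration of how these variables could be read with their documented defaults, the sketch below parses them into a typed object. The names `ResearchConfig` and `load_config` are hypothetical and not taken from the project, and the sketch assumes the values from `.env` have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchConfig:
    llm_model: str
    llm_temperature: float
    search_max_results: int
    search_depth: str
    max_derivation_depth: int
    report_path: str


def load_config(env=None):
    """Build a ResearchConfig from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env

    depth = env.get("RESEARCH_SEARCH_DEPTH", "basic")
    if depth not in ("basic", "advanced"):
        raise ValueError(
            f"RESEARCH_SEARCH_DEPTH must be basic or advanced, got {depth!r}"
        )

    # Defaults mirror the values shown in the configuration above.
    return ResearchConfig(
        llm_model=env.get("RESEARCH_LLM_MODEL", "gpt-4.1-mini"),
        llm_temperature=float(env.get("RESEARCH_LLM_TEMPERATURE", "0")),
        search_max_results=int(env.get("RESEARCH_SEARCH_MAX_RESULTS", "5")),
        search_depth=depth,
        max_derivation_depth=int(env.get("RESEARCH_MAX_DERIVATION_DEPTH", "1")),
        report_path=env.get("RESEARCH_REPORT_PATH", "/tmp/result.md"),
    )
```

Converting and validating the values in one place means a typo such as `RESEARCH_SEARCH_DEPTH=deep` fails at startup rather than midway through a research run.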

Use DeepSearchAgent (Simplified Version)

Edit run.py and replace DeepResearchAgent with DeepSearchAgent:

from src.aalgorithm.agents import DeepSearchAgent

agent = DeepSearchAgent()
report = agent.run(question)

📁 Project Structure

AI2Apps-DeepResearch/
├── run.py                          # Main entry point
├── README.md                       # Project documentation
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment variables template
├── .gitignore                      # Git ignore file
├── setup_agent.js                  # Project installation config (optional)
├── scripts/                        # Scripts directory
│   └── trafilatura_api.py         # Trafilatura API server
└── src/                           # Source code directory
    └── aalgorithm/                # Core algorithm package
        ├── __init__.py            # Package initialization
        ├── utils.py               # Utility functions
        ├── llm.py                 # LLM provider
        ├── search.py              # Search provider
        ├── content.py             # Content cleaning and document management
        └── agents/                # Agents directory
            ├── __init__.py        # Agents package initialization
            ├── deepsearch.py      # Simplified search agent
            └── deepresearch.py    # Full-featured research agent

🔧 Configuration

Environment Variables

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| `OPENAI_API_KEY` | OpenAI API key | - | ✅ |
| `TAVILY_API_KEY` | Tavily search API key | - | ✅ |
| `OPENAI_BASE_URL` | OpenAI API base URL | `https://api.openai.com/v1` | ❌ |
| `RESEARCH_LLM_MODEL` | LLM model name | `gpt-4.1-mini` | ❌ |
| `RESEARCH_LLM_TEMPERATURE` | Generation temperature | `0` | ❌ |
| `RESEARCH_SEARCH_MAX_RESULTS` | Max results per search | `5` | ❌ |
| `RESEARCH_SEARCH_DEPTH` | Search depth | `basic` | ❌ |
| `RESEARCH_MAX_DERIVATION_DEPTH` | Max derivation depth | `1` | ❌ |
| `RESEARCH_REPORT_PATH` | Report output path | `/tmp/result.md` | ❌ |

Agent Selection

DeepResearchAgent (Full Version)

Best for: Deep research and comprehensive analysis scenarios
Features: Supports derived questions, citation management, multi-level analysis
Time: Longer, but more detailed results

DeepSearchAgent (Simplified Version)

Best for: Quick information overview scenarios
Features: Simplified workflow, faster generation
Time: Shorter, suitable for quick queries

🔍 How It Works

DeepResearchAgent Workflow

1. Question Analysis
   ↓
2. Generate Search Strategy (Multi-angle Decomposition)
   ↓
3. Concurrent Search & Data Collection
   ↓
4. Content Relevance Filtering (LLM Evaluation)
   ↓
5. Content Cleaning & Extraction (Trafilatura)
   ↓
6. Information Synthesis & Analysis (LLM)
   ↓
7. Generate Base Answers
   ↓
8. Identify & Process Derived Questions (Recursive)
   ↓
9. Generate Final Report (Markdown)
   ↓
10. Add References
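
The recursive part of this workflow (steps 7 to 9) can be sketched as a depth-limited tree walk. This is a simplified illustration, not the agent's real code: `Finding`, `research`, `to_markdown`, `answer_fn`, and `derive_fn` are hypothetical names, and the sketch assumes that a maximum depth of 1 means one level of derived questions below the original question.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    question: str
    answer: str
    derived: list = field(default_factory=list)


def research(question, answer_fn, derive_fn, max_depth=1, depth=0):
    """Answer a question, then recurse into derived questions up to max_depth."""
    finding = Finding(question, answer_fn(question))
    if depth < max_depth:
        # derive_fn stands in for the LLM step that spots information gaps.
        for sub_question in derive_fn(question, finding.answer):
            finding.derived.append(
                research(sub_question, answer_fn, derive_fn, max_depth, depth + 1)
            )
    return finding


def to_markdown(finding, level=1):
    """Render a Finding tree as a list of nested Markdown lines."""
    lines = [f"{'#' * level} {finding.question}", "", finding.answer, ""]
    for child in finding.derived:
        lines.extend(to_markdown(child, level + 1))
    return lines
```

In a real run, `answer_fn` would wrap the search, filtering, extraction, and synthesis steps, while `derive_fn` would ask the LLM for follow-up questions. The depth check is what keeps the recursion bounded.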

Core Technologies

LLM Integration: Uses OpenAI API for question analysis, content evaluation, and report generation
Web Search: Integrates Tavily API for high-quality web searches
Content Extraction: Uses Trafilatura library to extract clean web text
Concurrent Processing: Uses Python's concurrent.futures for multi-threaded concurrency
Citation Tracking: Automatically manages information sources and generates standard citation format
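
Citation tracking of the kind described above can be sketched in a few lines. The class below is a hypothetical illustration rather than the project's implementation: `CitationRegistry` and its numbered `[n] Title. URL` output are assumptions chosen for the example.

```python
class CitationRegistry:
    """Hypothetical helper: assigns one stable number per source URL."""

    def __init__(self):
        self._numbers = {}  # url -> citation number, in first-seen order
        self._titles = {}   # url -> title recorded on first citation

    def cite(self, url, title):
        """Return an inline marker such as "[1]", reusing numbers for repeats."""
        if url not in self._numbers:
            self._numbers[url] = len(self._numbers) + 1
            self._titles[url] = title
        return f"[{self._numbers[url]}]"

    def references(self):
        """Return the reference list, one formatted entry per unique source."""
        return [
            f"[{number}] {self._titles[url]}. {url}"
            for url, number in self._numbers.items()
        ]
```

Keying the registry on the URL means a source cited in several report sections appears only once in the final reference list.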

📊 Examples

Example 1: Technology Trend Research

python run.py -q "What are the latest advances and commercial applications of quantum computing in 2024?"

Example 2: Event Background Analysis

python run.py -q "What are the features of a company's newly released AI product? How are competitors responding?"

Example 3: Academic Research

python run.py -q "What are the current solutions to the hallucination problem in large language models?"

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

This project uses the following excellent open-source projects and services:

OpenAI API - Large Language Model services
Tavily - Intelligent search API
Trafilatura - Web content extraction
Loguru - Logging management
