Lyt060814/AI2Apps-DeepResearch

AI2Apps-DeepResearch

AI-Powered Deep Research Assistant System

An automated research tool based on Large Language Models (LLM) that can automatically collect, analyze, and synthesize information to generate structured research reports.


📖 Introduction

AI2Apps-DeepResearch is a research assistant that combines an LLM with web search to automate the research process end to end, from question analysis to report generation. The system provides:

  • 🤖 Intelligent Analysis: Automatically decompose research questions and generate multi-angle search strategies
  • 🔍 Deep Search: Execute concurrent web searches to collect relevant information
  • 📝 Content Extraction: Clean web content using Trafilatura to extract high-quality text
  • 🧠 Smart Synthesis: Analyze and synthesize information through LLM to generate structured reports
  • 📚 Citation Management: Automatically track information sources and generate standard references
  • 🔄 Recursive Research: Support multi-level question derivation for in-depth exploration of information gaps

✨ Features

Core Capabilities

  1. Multi-Level Search Strategy

    • Automatically decompose research questions into multiple search dimensions
    • Generate targeted search phrases
    • Support customizable search depth and result count
  2. Concurrent Content Processing

    • Multi-threaded concurrent search and content extraction
    • Intelligent caching mechanism to avoid redundant processing
    • Automatic retry mechanism for improved stability
  3. Intelligent Content Filtering

    • LLM-based content relevance evaluation
    • Automatic filtering of irrelevant information
    • Extract key entities and factual background
  4. Derived Question Processing

    • Automatically identify information gaps
    • Generate and research derived questions
    • Support configurable recursion depth
  5. Report Generation

    • Structured reports in Markdown format
    • Streaming output with real-time progress
    • Automatically generate reference lists
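
The concurrent processing, caching, and retry behavior listed above can be illustrated with a small sketch. This is not the project's implementation: `with_retry`, `fetch_all`, and the `fetch` callable are hypothetical names, and the only library assumed is Python's standard `concurrent.futures`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(fn, attempts=3, delay=0.0):
    """Wrap fn so a failing call is retried up to `attempts` times (hypothetical helper)."""
    def wrapper(arg):
        last_error = None
        for attempt in range(attempts):
            try:
                return fn(arg)
            except Exception as exc:  # a real agent would catch narrower error types
                last_error = exc
                time.sleep(delay * (attempt + 1))  # simple linear back-off
        raise last_error
    return wrapper


def fetch_all(urls, fetch, cache=None, max_workers=4):
    """Fetch every URL concurrently, skipping anything already in the cache."""
    cache = {} if cache is None else cache
    # dict.fromkeys removes duplicates while keeping the original order.
    pending = [u for u in dict.fromkeys(urls) if u not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url, text in zip(pending, pool.map(with_retry(fetch), pending)):
            cache[url] = text
    return {u: cache[u] for u in urls}
```

Passing the same `cache` dictionary to later calls means a URL that was already extracted is not fetched again, which is the effect the caching bullet describes.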

Technical Highlights

  • High Performance: Concurrent processing for improved research efficiency
  • Extensible: Support recursive derived research with configurable depth
  • Reliable: Comprehensive error handling and retry mechanisms
  • User-Friendly: Command-line interface with parameter and interactive input support

🚀 Quick Start

Requirements

  • Python 3.11+
  • Conda (recommended) or Python virtual environment

Installation

  1. Clone the repository
     git clone https://github.com/Lyt060814/AI2Apps-DeepResearch.git
     cd AI2Apps-DeepResearch
  2. Create a virtual environment (Conda recommended)
     conda create -n deep-research python=3.11
     conda activate deep-research
  3. Install dependencies
     pip install -r requirements.txt

     Or use the Tsinghua mirror for faster installation (users in China):

     pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
  4. Configure environment variables

     Copy .env.example to .env and fill in your API keys:

     cp .env.example .env

Edit the .env file:

OPENAI_API_KEY=your_openai_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional: custom OpenAI API endpoint

Get API Keys

  • OpenAI API: Visit OpenAI Platform to get your API key
  • Tavily API: Visit Tavily to register and get your API key

💡 Usage

Basic Usage

Method 1: Command-line argument

python run.py -q "Your research question"

Example:

python run.py -q "What are the current applications and development trends of AI in healthcare?"

Method 2: Interactive input

python run.py

Then enter your research question when prompted.
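
The two input methods can be pictured as one argument parser with an interactive fallback. The sketch below is an assumption about how such an entry point could look, not the contents of run.py; `build_parser`, `resolve_question`, and the long flag `--question` are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Run a research query")
    # Only -q appears in this README; the long form is an illustrative assumption.
    parser.add_argument("-q", "--question", help="research question; prompted for if omitted")
    return parser


def resolve_question(argv=None, prompt=input):
    """Use the -q value when given, otherwise ask the user interactively."""
    args = build_parser().parse_args(argv)
    question = (args.question or prompt("Enter your research question: ")).strip()
    if not question:
        raise ValueError("A research question is required")
    return question
```

Injecting `prompt` as a parameter keeps the interactive path easy to exercise without a terminal.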

Advanced Usage

Configure research parameters

Add configuration to your .env file to customize research behavior:

# LLM Configuration
RESEARCH_LLM_MODEL=gpt-4.1-mini           # LLM model to use
RESEARCH_LLM_TEMPERATURE=0                # Temperature parameter (0-1)

# Search Configuration
RESEARCH_SEARCH_MAX_RESULTS=5             # Maximum results per search
RESEARCH_SEARCH_DEPTH=basic               # Search depth: basic/advanced

# Research Configuration
RESEARCH_MAX_DERIVATION_DEPTH=1           # Maximum depth for derived questions
RESEARCH_REPORT_PATH=/tmp/result.md       # Report output path
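
One way to picture how these settings could be read is a small typed loader that falls back to the documented defaults. This is a sketch under assumptions: `ResearchConfig` and `load_config` are hypothetical names, and the project may parse its settings differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchConfig:
    # Defaults mirror the values shown in the .env example above.
    llm_model: str = "gpt-4.1-mini"
    llm_temperature: float = 0.0
    search_max_results: int = 5
    search_depth: str = "basic"
    max_derivation_depth: int = 1
    report_path: str = "/tmp/result.md"


def load_config(env=None):
    """Build a ResearchConfig from RESEARCH_* variables, using defaults for unset ones."""
    env = os.environ if env is None else env
    d = ResearchConfig()
    depth = env.get("RESEARCH_SEARCH_DEPTH", d.search_depth)
    if depth not in ("basic", "advanced"):
        raise ValueError(f"RESEARCH_SEARCH_DEPTH must be basic or advanced, got {depth!r}")
    return ResearchConfig(
        llm_model=env.get("RESEARCH_LLM_MODEL", d.llm_model),
        llm_temperature=float(env.get("RESEARCH_LLM_TEMPERATURE", d.llm_temperature)),
        search_max_results=int(env.get("RESEARCH_SEARCH_MAX_RESULTS", d.search_max_results)),
        search_depth=depth,
        max_derivation_depth=int(env.get("RESEARCH_MAX_DERIVATION_DEPTH", d.max_derivation_depth)),
        report_path=env.get("RESEARCH_REPORT_PATH", d.report_path),
    )
```

Converting and validating values in one place means a typo such as a non-numeric result count fails at startup rather than midway through a run.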

Use DeepSearchAgent (Simplified Version)

Edit run.py and replace DeepResearchAgent with DeepSearchAgent:

from src.aalgorithm.agents import DeepSearchAgent

# `question` is the research question string (from -q or the interactive prompt).
agent = DeepSearchAgent()
report = agent.run(question)

📁 Project Structure

AI2Apps-DeepResearch/
├── run.py                          # Main entry point
├── README.md                       # Project documentation
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment variables template
├── .gitignore                      # Git ignore file
├── setup_agent.js                  # Project installation config (optional)
├── scripts/                        # Scripts directory
│   └── trafilatura_api.py         # Trafilatura API server
└── src/                           # Source code directory
    └── aalgorithm/                # Core algorithm package
        ├── __init__.py            # Package initialization
        ├── utils.py               # Utility functions
        ├── llm.py                 # LLM provider
        ├── search.py              # Search provider
        ├── content.py             # Content cleaning and document management
        └── agents/                # Agents directory
            ├── __init__.py        # Agents package initialization
            ├── deepsearch.py      # Simplified search agent
            └── deepresearch.py    # Full-featured research agent

🔧 Configuration

Environment Variables

| Variable | Description | Default | Required |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key | - | Yes |
| TAVILY_API_KEY | Tavily search API key | - | Yes |
| OPENAI_BASE_URL | OpenAI API base URL | https://api.openai.com/v1 | No |
| RESEARCH_LLM_MODEL | LLM model name | gpt-4.1-mini | No |
| RESEARCH_LLM_TEMPERATURE | Generation temperature | 0 | No |
| RESEARCH_SEARCH_MAX_RESULTS | Max results per search | 5 | No |
| RESEARCH_SEARCH_DEPTH | Search depth (basic/advanced) | basic | No |
| RESEARCH_MAX_DERIVATION_DEPTH | Max derivation depth | 1 | No |
| RESEARCH_REPORT_PATH | Report output path | /tmp/result.md | No |

Agent Selection

DeepResearchAgent (Full Version)

  • Best for: Deep research and comprehensive analysis scenarios
  • Features: Supports derived questions, citation management, multi-level analysis
  • Time: Longer, but more detailed results

DeepSearchAgent (Simplified Version)

  • Best for: Quick information overview scenarios
  • Features: Simplified workflow, faster generation
  • Time: Shorter, suitable for quick queries

🔍 How It Works

DeepResearchAgent Workflow

1. Question Analysis
   ↓
2. Generate Search Strategy (Multi-angle Decomposition)
   ↓
3. Concurrent Search & Data Collection
   ↓
4. Content Relevance Filtering (LLM Evaluation)
   ↓
5. Content Cleaning & Extraction (Trafilatura)
   ↓
6. Information Synthesis & Analysis (LLM)
   ↓
7. Generate Base Answers
   ↓
8. Identify & Process Derived Questions (Recursive)
   ↓
9. Generate Final Report (Markdown)
   ↓
10. Add References
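
The recursive part of this workflow (steps 7 and 8) can be sketched with stand-in functions. Everything here is hypothetical: `Finding`, `research`, `stub_answer`, and `stub_derive` are illustrative names, and the stubs replace the real search and LLM calls. The sketch also assumes that a maximum derivation depth of 1 means one level of derived questions below the original question.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    question: str
    answer: str
    sources: List[str] = field(default_factory=list)


def research(question, answer_fn, derive_fn, max_depth=1, _depth=0):
    """Answer a question, then recurse into derived questions until max_depth is reached."""
    finding = answer_fn(question)          # steps 1-7: search, filter, synthesize
    findings = [finding]
    if _depth < max_depth:                 # step 8: follow up on information gaps
        for sub_question in derive_fn(finding):
            findings.extend(research(sub_question, answer_fn, derive_fn, max_depth, _depth + 1))
    return findings


# Stand-ins for the real search and LLM steps, used only for illustration.
def stub_answer(question):
    return Finding(question, f"(answer to {question})")


def stub_derive(finding):
    return [f"{finding.question} / gap {i}" for i in (1, 2)]
```

Because each level can spawn several derived questions, the number of LLM and search calls grows quickly with depth, which is one reason to keep the depth limit configurable.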

Core Technologies

  • LLM Integration: Uses OpenAI API for question analysis, content evaluation, and report generation
  • Web Search: Integrates Tavily API for high-quality web searches
  • Content Extraction: Uses Trafilatura library to extract clean web text
  • Concurrent Processing: Uses Python's concurrent.futures for multi-threaded concurrency
  • Citation Tracking: Automatically manages information sources and generates standard citation format
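
Citation tracking can be pictured as a mapping from source URL to a stable citation number. The class below is a minimal sketch, not the project's code; `CitationTracker` and its output format are assumptions.

```python
class CitationTracker:
    """Assign each distinct source URL a stable number and render a reference list."""

    def __init__(self):
        self._numbers = {}  # url -> citation number, in first-seen order
        self._titles = {}   # url -> display title

    def cite(self, url, title=""):
        """Return an inline marker such as [1]; repeated URLs reuse their number."""
        if url not in self._numbers:
            self._numbers[url] = len(self._numbers) + 1
            self._titles[url] = title
        return f"[{self._numbers[url]}]"

    def references(self):
        """Render the reference list as Markdown lines (illustrative format)."""
        lines = ["## References", ""]
        for url, number in self._numbers.items():
            label = f"{self._titles[url]}. " if self._titles[url] else ""
            lines.append(f"[{number}] {label}{url}")
        return "\n".join(lines)
```

Keying on the URL means the same page cited from several report sections keeps one number, so the final reference list has no duplicates.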

📊 Examples

Example 1: Technology Trend Research

python run.py -q "What are the latest advances and commercial applications of quantum computing in 2024?"

Example 2: Event Background Analysis

python run.py -q "What are the features of a company's newly released AI product? How are competitors responding?"

Example 3: Academic Research

python run.py -q "What are the current solutions to the hallucination problem in large language models?"

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

This project builds on open-source projects and services, including the OpenAI API, Tavily, and Trafilatura.
