robwatsongtr/multi-agent-research-app

Multi-Agent Research Assistant

A collaborative AI research system in which specialized agents work together to research topics, synthesize findings, and validate claims. Pydantic models provide type safety, and a two-stage parser handles the format variations of non-deterministic LLM output.

Features

  • 🤖 Multi-agent collaboration - Four specialized AI agents (Coordinator, Researcher, Synthesizer, Critic)
  • 🔍 Web search integration - Automated research using Tavily API
  • 📊 Type-safe data flow - Pydantic models for validation and IDE autocomplete
  • 🛡️ Robust LLM parsing - Two-stage parsing handles markdown, code blocks, and format variations
  • ✅ Quality control - Built-in critic agent validates research quality
  • 🧪 Comprehensive testing - Unit tests, integration tests, and LLM evals
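
The two-stage parsing listed above can be sketched roughly as follows. This is an illustration only, not the code in agents/parsing.py: the names `extract_json_sketch` and `parse_reply` and the regular expression are assumptions made for this example. Stage one pulls a JSON object out of a reply that may be wrapped in prose or a markdown code fence; stage two decodes it.

```python
import json
import re

# Hypothetical helpers for illustration; the project's own
# extract_json_from_text may work differently.
FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this block readable
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_sketch(text: str) -> str:
    """Stage 1: return the substring most likely to be a JSON object."""
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)
    # Fall back to the outermost braces if prose surrounds the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return text[start : end + 1]


def parse_reply(text: str) -> dict:
    """Stage 2: decode the extracted text into a Python dict."""
    return json.loads(extract_json_sketch(text))


reply = (
    "Here is the plan:\n"
    + FENCE + "json\n"
    + '{"subtasks": ["a", "b"]}\n'
    + FENCE + "\nDone."
)
print(parse_reply(reply))
```

In the project, the resulting dict would then be handed to a Pydantic model for validation, which is the step that catches missing or mistyped fields.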

Architecture

The system uses four specialized agents that collaborate in sequence:

User Query
    ↓
┌─────────────────┐
│  COORDINATOR    │  Breaks query into 2-4 research subtasks
└────────┬────────┘
         ↓
┌─────────────────┐
│  RESEARCHER     │  Executes each subtask using web search
│  (per subtask)  │  Returns findings with sources
└────────┬────────┘
         ↓
┌─────────────────┐
│  SYNTHESIZER    │  Combines findings into coherent report
│                 │  Organizes by themes, preserves citations
└────────┬────────┘
         ↓
┌─────────────────┐
│   CRITIC        │  Reviews quality, identifies gaps
│                 │  Suggests improvements
└─────────────────┘

Agent Responsibilities

  1. Coordinator Agent: Analyzes user queries and breaks them into 2-4 focused research subtasks
  2. Researcher Agent: Executes web searches for each subtask and extracts structured findings with sources
  3. Synthesizer Agent: Combines all findings into an organized report with sections and key insights
  4. Critic Agent: Reviews the synthesized report for quality, unsupported claims, and research gaps
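
The four responsibilities above chain together as a simple sequential pipeline. The sketch below shows only that shape. Each stage is passed in as a plain callable so the example runs without API access; `run_pipeline` and the stub stages are hypothetical names, and the real orchestration/workflow.py may be organized differently.

```python
from typing import Callable

# Illustrative only: stages are injected as callables, which is an
# assumption of this sketch rather than the project's actual design.


def run_pipeline(
    query: str,
    coordinate: Callable[[str], list[str]],
    research: Callable[[str], dict],
    synthesize: Callable[[list[dict]], dict],
    critique: Callable[[dict], dict],
) -> dict:
    subtasks = coordinate(query)               # 1. query -> subtasks
    results = [research(t) for t in subtasks]  # 2. one result per subtask
    report = synthesize(results)               # 3. findings -> report
    review = critique(report)                  # 4. report -> review
    return {"report": report, "review": review}


# Stub stages standing in for the four agents.
output = run_pipeline(
    "quantum computing",
    coordinate=lambda q: [f"{q}: hardware", f"{q}: algorithms"],
    research=lambda t: {"subtask": t, "findings": []},
    synthesize=lambda rs: {"summary": f"{len(rs)} subtasks covered"},
    critique=lambda rep: {"needs_more_research": False},
)
print(output["report"]["summary"])  # 2 subtasks covered
```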

Data Flow with Pydantic

All inter-agent communication uses validated Pydantic models:

# Coordinator returns
CoordinatorResponse(subtasks=["task1", "task2", "task3"])

# Researcher returns (per subtask)
ResearchResult(
    subtask="...",
    findings=[Finding(claim="...", source="https://...", details="...")]
)

# Synthesizer returns
SynthesizedReport(
    summary="...",
    sections=[SynthesisSection(title="...", content="...", sources=[...])],
    key_insights=["...", "..."]
)

# Critic returns
CriticReview(
    overall_quality="...",
    issues=[CriticIssue(type="...", description="...", severity="...")],
    suggestions=["..."],
    needs_more_research=False
)
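
For readers who want to see what stands behind those constructor calls, here is a minimal sketch of two of the models. The field names come from the calls above; the field types and class bodies are assumptions, not a copy of agents/models.py.

```python
from pydantic import BaseModel, ValidationError

# Field names mirror the constructor calls shown earlier; the types are
# guesses made for this sketch.


class Finding(BaseModel):
    claim: str
    source: str
    details: str


class ResearchResult(BaseModel):
    subtask: str
    findings: list[Finding]


# Nested dicts, as produced by json.loads, are validated into models.
raw = {
    "subtask": "qubit error rates",
    "findings": [
        {"claim": "c", "source": "https://example.com", "details": "d"}
    ],
}
result = ResearchResult(**raw)
print(result.findings[0].source)

# A finding that lacks required fields is rejected at the boundary.
try:
    ResearchResult(subtask="x", findings=[{"claim": "no source"}])
    rejected = False
except ValidationError:
    rejected = True
print("rejected:", rejected)
```

Validating at the boundary between agents means a malformed LLM reply fails immediately in the agent that produced it, rather than surfacing later as a missing key in another agent.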

Installation

Prerequisites

  • Python 3.12+
  • Anthropic API key
  • Tavily API key for web search

Setup

# Clone and setup
git clone <your-repo-url>
cd multi-agent-research-app
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cat > .env << EOF
ANTHROPIC_API_KEY=your_anthropic_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
EOF

# Install pre-commit hooks (optional)
pre-commit install

Quick Start

# Basic usage
python main.py "What are the latest developments in quantum computing?"

# Verbose mode (detailed logging)
python main.py "What are the latest developments in quantum computing?" --verbose
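
A command line of this shape (one positional query plus a `--verbose` flag) can be handled with the standard library's argparse. The sketch below is an assumption about how such an entry point might look, not the contents of main.py; in particular, mapping `--verbose` to the DEBUG log level is a guess.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Multi-agent research assistant")
    parser.add_argument("query", help="research question to investigate")
    parser.add_argument(
        "--verbose", action="store_true", help="enable detailed logging"
    )
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Assumed mapping: --verbose switches logging from INFO to DEBUG.
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    return args


args = parse_cli(
    ["What are the latest developments in quantum computing?", "--verbose"]
)
print(args.query, args.verbose)
```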

Agent Implementation

Agents inherit from BaseAgent and implement domain logic:

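import json

from agents.base import BaseAgent
from agents.models import ResearchResult
from agents.parsing import extract_json_from_text

class ResearcherAgent(BaseAgent):
    def research(self, subtask: str, tools, tool_executor) -> ResearchResult:
        # Call Claude API (the web_search tool is made available via tools)
        response = self.call_claude(subtask, tools=tools, tool_executor=tool_executor)

        # Two-stage parsing: extract the JSON text, then decode it
        json_text = extract_json_from_text(self.parse_response(response))
        result_dict = json.loads(json_text)

        # Pydantic validation
        return ResearchResult(**result_dict)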

System prompts in config/prompts.yaml define behavior without code changes.

Project Structure

multi-agent-research-app/
├── agents/
│   ├── base.py            # BaseAgent with API calls & tool handling
│   ├── models.py          # Pydantic models for all data structures
│   ├── parsing.py         # Two-stage LLM parsing (extract + validate)
│   ├── coordinator.py     # Query → subtasks
│   ├── researcher.py      # Subtask → findings (uses web_search tool)
│   ├── synthesizer.py     # Findings → report
│   └── critic.py          # Report → quality review
├── orchestration/
│   └── workflow.py        # Coordinates agent execution
├── config/
│   ├── prompts.yaml       # System prompts for each agent
│   └── settings.py        # API keys, environment config
├── tools/
│   └── web_search.py      # Tavily web search integration
├── tests/
│   ├── test_agents.py     # Unit tests (mocked API)
│   ├── test_researcher.py # Researcher-specific tests
│   ├── test_workflow.py   # Integration tests (mocked API)
│   └── evals/             # LLM evals (real API calls)
│       └── test_workflow_evals.py
└── main.py                # CLI entry point

Extending the System

Adding a New Agent

  1. Define a Pydantic model in agents/models.py:
from pydantic import BaseModel, Field

class FactCheckResult(BaseModel):
    verified: bool
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str
  2. Create the agent in agents/fact_checker.py:
import json

from agents.base import BaseAgent
from agents.models import FactCheckResult
from agents.parsing import extract_json_from_text

class FactCheckerAgent(BaseAgent):
    def verify(self, claim: str, sources: list[str]) -> FactCheckResult:
        response = self.call_claude(f"Claim: {claim}\nSources: {sources}")

        # Two-stage parsing
        json_text = extract_json_from_text(self.parse_response(response))
        result_dict = json.loads(json_text)

        return FactCheckResult(**result_dict)
  3. Add a prompt to config/prompts.yaml:
fact_checker: |
  Verify claims against sources. Return JSON:
  {"verified": true/false, "confidence": 0-1, "reasoning": "..."}
  4. Integrate the agent into orchestration/workflow.py

Testing

# Fast tests (unit + integration with mocks)
pytest tests/ -v

# All tests including LLM evals (costs tokens)
pytest tests/ -v -m ""

# Type checking
mypy agents/ orchestration/

# Run pre-commit hooks manually on all files (they also run on every commit)
pre-commit run --all-files

Test Philosophy:

  • Unit tests - Mock API, test parsing logic and Pydantic validation
  • Integration tests - Mock API, test agent coordination
  • LLM evals - Real API calls, test output quality properties (not exact matches)
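
To make the "mock API" idea concrete, here is a small self-contained sketch of a unit test in that style. `TinyCoordinator` is a stand-in written for this example and is not one of the project's agents. Its `call_claude` method only mirrors the method name used in the agent examples, and the test replaces it with a canned response using `unittest.mock`.

```python
import json
from unittest.mock import MagicMock


# Stand-in agent for illustration. The real agents inherit from BaseAgent
# and reach the Anthropic API; here the call is a method a test can replace.
class TinyCoordinator:
    def call_claude(self, query: str) -> str:
        raise RuntimeError("would hit the network")

    def plan(self, query: str) -> list[str]:
        data = json.loads(self.call_claude(query))
        return data["subtasks"]


def test_plan_parses_subtasks() -> None:
    agent = TinyCoordinator()
    # Replace the network call with a canned JSON reply.
    agent.call_claude = MagicMock(return_value='{"subtasks": ["a", "b", "c"]}')

    subtasks = agent.plan("some query")

    # Check the parsed structure and that the API was called exactly once.
    assert subtasks == ["a", "b", "c"]
    agent.call_claude.assert_called_once_with("some query")


test_plan_parses_subtasks()
print("ok")
```

LLM evals differ in that the call is real, so assertions there would typically check properties such as "between 2 and 4 subtasks" rather than exact strings.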

Pre-commit: Runs mypy, pytest, and code formatters automatically on commit (see PRE_COMMIT_SETUP.md)

Configuration

Environment: .env file with API keys

ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...

Agent Behavior: Edit config/prompts.yaml

coordinator: |
  Break queries into 3-5 subtasks (changed from 2-4)
  Prioritize recent information from last 6 months

Workflow: Modify orchestration/workflow.py for iteration, multi-round research, etc.
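
One way such multi-round research could be structured is to loop while the critic's `needs_more_research` flag is set, with a cap on the number of rounds. The sketch below is hypothetical: `research_until_satisfied`, the `run_round` callable, and the default of three rounds are all assumptions, and the review is represented as a plain dict rather than a `CriticReview` model.

```python
from typing import Callable


def research_until_satisfied(
    run_round: Callable[[int], dict],
    max_rounds: int = 3,
) -> tuple[dict, int]:
    """Repeat research rounds while the critic asks for more, up to a cap.

    run_round receives the 1-based round number and returns the critic's
    review for that round. Returns the last review and the rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    review: dict = {}
    rounds_used = 0
    for round_number in range(1, max_rounds + 1):
        review = run_round(round_number)
        rounds_used = round_number
        if not review.get("needs_more_research", False):
            break
    return review, rounds_used


# Fake round: the critic is satisfied from the second round onward.
def fake_round(round_number: int) -> dict:
    return {"needs_more_research": round_number < 2}


review, rounds = research_until_satisfied(fake_round)
print(rounds)  # 2
```

The cap matters because each extra round costs API calls, and a critic that never clears the flag would otherwise loop indefinitely.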

MIT License - see LICENSE file for details

Contributing

  1. Follow coding standards in CODING_STANDARDS.md
  2. Add tests for new features
  3. Update documentation
  4. Ensure pre-commit hooks pass

Acknowledgments
