A collaborative AI research system where specialized agents work together to research topics, synthesize findings, and validate claims. Built with Pydantic models for type safety and robust LLM parsing to handle non-deterministic outputs.
- Multi-agent collaboration - Four specialized AI agents (Coordinator, Researcher, Synthesizer, Critic)
- Web search integration - Automated research using the Tavily API
- Type-safe data flow - Pydantic models for validation and IDE autocomplete
- Robust LLM parsing - Two-stage parsing handles markdown, code blocks, and format variations
- Quality control - Built-in critic agent validates research quality
- Comprehensive testing - Unit tests, integration tests, and LLM evals
The system uses four specialized agents that collaborate in sequence:
User Query
         ↓
┌─────────────────┐
│   COORDINATOR   │  Breaks query into 2-4 research subtasks
└────────┬────────┘
         ↓
┌─────────────────┐
│   RESEARCHER    │  Executes each subtask using web search
│  (per subtask)  │  Returns findings with sources
└────────┬────────┘
         ↓
┌─────────────────┐
│   SYNTHESIZER   │  Combines findings into coherent report
│                 │  Organizes by themes, preserves citations
└────────┬────────┘
         ↓
┌─────────────────┐
│     CRITIC      │  Reviews quality, identifies gaps
│                 │  Suggests improvements
└─────────────────┘
- Coordinator Agent: Analyzes user queries and breaks them into 2-4 focused research subtasks
- Researcher Agent: Executes web searches for each subtask and extracts structured findings with sources
- Synthesizer Agent: Combines all findings into an organized report with sections and key insights
- Critic Agent: Reviews the synthesized report for quality, unsupported claims, and research gaps
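
The sequence above can be sketched as a plain function pipeline. This is only an illustration under stated assumptions, not the project's `orchestration/workflow.py`: `run_pipeline` and the four stub stages are hypothetical names, and the real agents would call the Claude API and return Pydantic models rather than plain dicts.

```python
from typing import Callable


def run_pipeline(
    query: str,
    coordinate: Callable[[str], list[str]],
    research: Callable[[str], dict],
    synthesize: Callable[[list[dict]], dict],
    critique: Callable[[dict], dict],
) -> dict:
    """Run the four stages in order and return the report with its review."""
    subtasks = coordinate(query)                            # Coordinator: query -> subtasks
    results = [research(subtask) for subtask in subtasks]   # Researcher: once per subtask
    report = synthesize(results)                            # Synthesizer: findings -> report
    review = critique(report)                               # Critic: report -> review
    return {"report": report, "review": review}


# Stub stages (hypothetical) so the sketch runs without API keys.
def fake_coordinate(query: str) -> list[str]:
    return [f"{query}: background", f"{query}: recent work"]


def fake_research(subtask: str) -> dict:
    return {"subtask": subtask, "findings": [{"claim": "stub", "source": "https://example.com"}]}


def fake_synthesize(results: list[dict]) -> dict:
    return {"summary": f"{len(results)} subtasks covered", "sections": results}


def fake_critique(report: dict) -> dict:
    return {"issues": [], "needs_more_research": len(report["sections"]) == 0}


outcome = run_pipeline(
    "quantum computing", fake_coordinate, fake_research, fake_synthesize, fake_critique
)
print(outcome["report"]["summary"])  # prints "2 subtasks covered"
```

Passing the stages in as callables keeps the ordering logic testable on its own, which is one plausible reason to separate orchestration from the agents.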
All inter-agent communication uses validated Pydantic models:
# Coordinator returns
CoordinatorResponse(subtasks=["task1", "task2", "task3"])

# Researcher returns (per subtask)
ResearchResult(
    subtask="...",
    findings=[Finding(claim="...", source="https://...", details="...")]
)

# Synthesizer returns
SynthesizedReport(
    summary="...",
    sections=[SynthesisSection(title="...", content="...", sources=[...])],
    key_insights=["...", "..."]
)

# Critic returns
CriticReview(
    overall_quality="...",
    issues=[CriticIssue(type="...", description="...", severity="...")],
    suggestions=["..."],
    needs_more_research=False
)

- Python 3.12+
- Anthropic API key
- Tavily API key for web search
# Clone and setup
git clone <your-repo-url>
cd multi-agent-research-app
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cat > .env << EOF
ANTHROPIC_API_KEY=your_anthropic_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
EOF
# Install pre-commit hooks (optional)
pre-commit install

# Basic usage
python main.py "What are the latest developments in quantum computing?"
# Verbose mode (detailed logging)
python main.py "What are the latest developments in quantum computing?" --verbose

Agents inherit from BaseAgent and implement domain logic:
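import json

from agents.base import BaseAgent
from agents.models import ResearchResult
from agents.parsing import extract_json_from_text

class ResearcherAgent(BaseAgent):
    def research(self, subtask: str, tools, tool_executor) -> ResearchResult:
        # Call the Claude API, letting the model invoke the web search tool
        response = self.call_claude(subtask, tools=tools, tool_executor=tool_executor)
        # Two-stage parsing: extract the JSON text, then decode it
        json_text = extract_json_from_text(self.parse_response(response))
        result_dict = json.loads(json_text)
        # Pydantic validation
        return ResearchResult(**result_dict)

System prompts in config/prompts.yaml define agent behavior, so it can be changed without touching code.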
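
The two-stage parsing used in the researcher example above can be sketched as follows. This is a rough illustration, not the contents of `agents/parsing.py`: `extract_json_sketch` and `parse_two_stage` are hypothetical names, and the extraction rules (prefer a fenced block, otherwise take the outermost braces) are an assumption about what "handles markdown, code blocks, and format variations" might involve.

```python
import json
import re
from typing import Any, Callable

FENCE = "`" * 3  # a markdown code fence, built this way to keep the sketch readable

# Matches a fenced block, optionally labelled "json", and captures its body.
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_sketch(text: str) -> str:
    """Stage 1: pull the JSON payload out of free-form LLM output."""
    match = _FENCE_RE.search(text)
    if match:
        return match.group(1).strip()
    # No fence: fall back to the outermost {...} span in the text.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return text[start : end + 1]


def parse_two_stage(text: str, factory: Callable[..., Any]) -> Any:
    """Stage 2: decode the JSON and hand the fields to a validating constructor."""
    data = json.loads(extract_json_sketch(text))
    return factory(**data)


fenced = "Here you go:\n" + FENCE + 'json\n{"subtask": "x", "findings": []}\n' + FENCE
chatty = 'Sure! {"verified": true, "confidence": 0.9} Hope that helps.'

print(parse_two_stage(fenced, dict))
print(parse_two_stage(chatty, dict))
```

Here `dict` stands in for the validating constructor so the sketch has no third-party dependencies. Passing a Pydantic model class such as `ResearchResult` instead would run its field validation on the decoded data.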
multi-agent-research-app/
├── agents/
│   ├── base.py              # BaseAgent with API calls & tool handling
│   ├── models.py            # Pydantic models for all data structures
│   ├── parsing.py           # Two-stage LLM parsing (extract + validate)
│   ├── coordinator.py       # Query → subtasks
│   ├── researcher.py        # Subtask → findings (uses web_search tool)
│   ├── synthesizer.py       # Findings → report
│   └── critic.py            # Report → quality review
├── orchestration/
│   └── workflow.py          # Coordinates agent execution
├── config/
│   ├── prompts.yaml         # System prompts for each agent
│   └── settings.py          # API keys, environment config
├── tools/
│   └── web_search.py        # Tavily web search integration
├── tests/
│   ├── test_agents.py       # Unit tests (mocked API)
│   ├── test_researcher.py   # Researcher-specific tests
│   ├── test_workflow.py     # Integration tests (mocked API)
│   └── evals/               # LLM evals (real API calls)
│       └── test_workflow_evals.py
└── main.py                  # CLI entry point
- Define a Pydantic model in agents/models.py:

from pydantic import BaseModel, Field

class FactCheckResult(BaseModel):
    verified: bool
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str

- Create the agent in agents/fact_checker.py:

from agents.base import BaseAgent
from agents.models import FactCheckResult
from agents.parsing import extract_json_from_text
import json

class FactCheckerAgent(BaseAgent):
    def verify(self, claim: str, sources: list[str]) -> FactCheckResult:
        response = self.call_claude(f"Claim: {claim}\nSources: {sources}")
        # Two-stage parsing
        json_text = extract_json_from_text(self.parse_response(response))
        result_dict = json.loads(json_text)
        return FactCheckResult(**result_dict)

- Add a prompt to config/prompts.yaml:

fact_checker: |
  Verify claims against sources. Return JSON:
  {"verified": true/false, "confidence": 0-1, "reasoning": "..."}

- Integrate the new agent into orchestration/workflow.py

# Fast tests (unit + integration with mocks)
pytest tests/ -v
# All tests including LLM evals (costs tokens)
pytest tests/ -v -m ""
# Type checking
mypy agents/ orchestration/
# Pre-commit hooks (runs on every commit)
pre-commit run --all-files

Test Philosophy:
- Unit tests - Mock API, test parsing logic and Pydantic validation
- Integration tests - Mock API, test agent coordination
- LLM evals - Real API calls, test output quality properties (not exact matches)
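
A unit test in the mocked style described above might look like the following sketch. `EchoAgent` and its `plan` method are hypothetical stand-ins, not the project's classes, and the single-argument `call_claude` signature is an assumption made to keep the example small. Only the standard library's `unittest.mock` is used.

```python
import json
from unittest.mock import patch


class EchoAgent:
    """Hypothetical stand-in for an agent; not the project's BaseAgent."""

    def call_claude(self, prompt: str) -> str:
        # In a real agent this would hit the network, so unit tests must replace it.
        raise RuntimeError("network call; must be mocked in unit tests")

    def plan(self, query: str) -> list[str]:
        raw = self.call_claude(query)
        return json.loads(raw)["subtasks"]


def test_plan_parses_subtasks() -> None:
    agent = EchoAgent()
    canned = '{"subtasks": ["history", "current state"]}'
    # Replace the API call with a canned response so no tokens are spent.
    with patch.object(EchoAgent, "call_claude", return_value=canned) as mocked:
        assert agent.plan("quantum computing") == ["history", "current state"]
    mocked.assert_called_once_with("quantum computing")


test_plan_parses_subtasks()
```

The test checks the parsing logic and the prompt that was sent, which are deterministic, and leaves output quality to the LLM evals.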
Pre-commit: Runs mypy, pytest, and code formatters automatically on commit (see PRE_COMMIT_SETUP.md)
Environment: .env file with API keys
ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...

Agent Behavior: Edit config/prompts.yaml
coordinator: |
  Break queries into 3-5 subtasks (changed from 2-4)
  Prioritize recent information from last 6 months

Workflow: Modify orchestration/workflow.py for iteration, multi-round research, etc.
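
One possible shape for multi-round research is a loop that repeats until the critic stops asking for more. The sketch below is an assumption about how such a loop could be written, not existing project code: `research_until_satisfied`, `run_round`, and `max_rounds` are hypothetical names, while the `needs_more_research` and `suggestions` keys mirror the `CriticReview` fields.

```python
from typing import Callable

# A round takes the query plus the critic's previous suggestions and
# returns (report_text, review_dict). Hypothetical signature.
RoundFn = Callable[[str, list[str]], tuple[str, dict]]


def research_until_satisfied(
    query: str, run_round: RoundFn, max_rounds: int = 3
) -> tuple[str, int]:
    """Repeat research rounds while the critic requests more, up to max_rounds."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    suggestions: list[str] = []
    report = ""
    for round_no in range(1, max_rounds + 1):
        report, review = run_round(query, suggestions)
        if not review["needs_more_research"]:
            return report, round_no
        # Feed the critic's suggestions into the next round.
        suggestions = review["suggestions"]
    return report, max_rounds  # cap reached; return the latest report


def fake_round(query: str, suggestions: list[str]) -> tuple[str, dict]:
    """Stub round: the critic is satisfied once any suggestions were applied."""
    satisfied = bool(suggestions)
    review = {"needs_more_research": not satisfied, "suggestions": ["add primary sources"]}
    return f"{query} (used {len(suggestions)} suggestions)", review


print(research_until_satisfied("quantum computing", fake_round))
```

The `max_rounds` cap matters because each extra round costs API tokens, and a critic that is never satisfied would otherwise loop forever.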
MIT License - see LICENSE file for details
- Follow coding standards in CODING_STANDARDS.md
- Add tests for new features
- Update documentation
- Ensure pre-commit hooks pass
- Built with Anthropic Claude API
- Web search powered by Tavily