██╗███╗ ██╗████████╗███████╗██╗ ██╗██████╗ ██████╗ ██████╗ ██████╗ ██████╗
██║████╗ ██║╚══██╔══╝██╔════╝██║ ██║██╔══██╗██╔══██╗ ██╔══██╗██╔══██╗██╔═══██╗
██║██╔██╗ ██║ ██║ █████╗ ██║ ██║██║ ██║██║ ██║ ██████╔╝██████╔╝██║ ██║
██║██║╚██╗██║ ██║ ██╔══╝ ██║ ██║██║ ██║██║ ██║ ██╔═══╝ ██╔══██╗██║ ██║
██║██║ ╚████║ ██║ ███████╗███████╗██║██████╔╝██████╔╝ ██║ ██║ ██║╚██████╔╝
╚═╝╚═╝ ╚═══╝ ╚═╝ ╚══════╝╚══════╝╚═╝╚═════╝ ╚═════╝ ╚═╝ ╚═╝ ╚═╝ ╚═════╝
Transform a single company URL into an investment-grade due diligence report in under 4 minutes. 7 specialized AI agents research in parallel, score risk quantitatively, benchmark against competitors, and synthesize everything into a structured report — all on free-tier APIs.
Quick Start · Features · Architecture · Tech Stack · Roadmap
IntelliDD Pro is a production-grade, multi-agent due diligence system built on LangGraph. Feed it any company URL and it autonomously dispatches 7 specialized AI agents that research the company across every critical investment dimension simultaneously — then synthesizes the findings into a structured, sector-aware report complete with quantitative risk scores and competitor analysis.
This project extends the baseline awesome-ai-apps/due_diligence_agent by:
- Replacing all paid APIs with free-tier alternatives (no credit card required for basic use)
- Adding a seventh specialist, the Competitor Intelligence Agent, with automatic competitor discovery and a positioning matrix
- Introducing quantitative Risk Scoring across 5 dimensions with Plotly radar + gauge visualizations
- Building sector-aware prompts for 7 industry verticals (AI/Tech, Fintech, Healthcare, SaaS, E-commerce, Consumer, DeepTech)
- Wiring in LangSmith observability so every agent trace is inspectable in a dashboard
- Adding persistent SQLite storage with run history and report comparison (Phase 4)
- Integrating document RAG via ChromaDB so uploaded pitch decks augment web research (Phase 3)
- Implementing a ReAct Q&A agent for interactive post-report questioning (Phase 5)
After the Seed Crawler builds the company profile, the seven specialist agents run concurrently via ThreadPoolExecutor, each researching a different dimension of the company:
| Agent | Researches | Output |
|---|---|---|
| 🌱 Seed Crawler | Company homepage, tagline, business model | company_profile.json |
| 👥 Team Agent | Founders, executives, advisors, hiring signals | founders_team.json |
| 💰 Investor Agent | Funding rounds, investor quality, runway | investors.json |
| 📰 Press Agent | Media coverage, sentiment, key narratives | press.json |
| 📊 Financials Agent | Revenue signals, unit economics, burn rate | financials.json |
| ⚙️ Tech Stack Agent | Architecture, GitHub presence, security posture | tech_stack.json |
| 📱 Social Agent | LinkedIn, Twitter/X, community, brand strength | social.json |
| 🏆 Competitor Intel Agent | 3–5 competitors, comparison matrix, positioning | competitor_intel.json |
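The parallel dispatch described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `Agent` signature, the `run_specialists` name, and the error-record shape are assumptions made for the example, and the defaults simply mirror the `MAX_WORKERS` and `AGENT_TIMEOUT_SECONDS` values documented in the configuration table.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Dict

# Hypothetical agent signature: takes the company profile, returns a dict.
Agent = Callable[[dict], dict]


def run_specialists(
    agents: Dict[str, Agent],
    profile: dict,
    max_workers: int = 7,
    timeout_s: float = 90.0,
) -> Dict[str, dict]:
    """Run every specialist concurrently and collect results by agent name.

    A failing agent produces an error record instead of aborting the run.
    Note: a timeout only stops *waiting* for that result; the worker thread
    keeps running, and leaving the `with` block still waits for it to finish.
    """
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, profile) for name, fn in agents.items()}
        for name, future in futures.items():
            try:
                # Wait up to timeout_s for this agent's result.
                results[name] = future.result(timeout=timeout_s)
            except FutureTimeout:
                results[name] = {"error": "timeout"}
            except Exception as exc:  # keep the other agents' output
                results[name] = {"error": str(exc)}
    return results


if __name__ == "__main__":
    demo_agents = {
        "team": lambda p: {"company": p["name"], "founders": []},
        "press": lambda p: {"company": p["name"], "articles": []},
    }
    print(run_specialists(demo_agents, {"name": "Acme"}))
```

Collecting results into a dict keyed by agent name keeps each output easy to write to its own JSON file afterwards.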
After all agents complete, a dedicated Risk Scorer produces a structured scorecard across 5 investment dimensions:
✅ Founder Risk : 3/10 [Low]
⚠️ Market Risk : 6/10 [Medium]
⚠️ Financial Risk : 4/10 [Medium]
✅ Technical Risk : 2/10 [Low]
✅ Reputational Risk : 3/10 [Low]
──────────────────────────────────
🎯 Overall Risk Score : 4/10
💡 DD Confidence Score : 72/100
The scorecard is rendered as an interactive Plotly radar chart and gauge chart in the Streamlit UI.
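The README does not state how the overall score or the severity labels are derived. The sketch below shows one plausible scheme that happens to reproduce the sample scorecard: an unweighted mean rounded to the nearest integer, and severity bands of 1-3 Low, 4-6 Medium, 7-10 High. Both rules, and the function names, are assumptions for illustration only.

```python
from statistics import mean
from typing import Dict


def severity(score: int) -> str:
    """Map a 1-10 score to a label (assumed bands, not confirmed by the project)."""
    if score <= 3:
        return "Low"
    if score <= 6:
        return "Medium"
    return "High"


def overall_risk(scores: Dict[str, int]) -> int:
    """Assumed aggregation: unweighted mean, rounded to the nearest integer."""
    return round(mean(scores.values()))


if __name__ == "__main__":
    # Dimension scores taken from the sample scorecard above.
    sample = {"Founder": 3, "Market": 6, "Financial": 4, "Technical": 2, "Reputational": 3}
    for name, score in sample.items():
        print(f"{name} Risk: {score}/10 [{severity(score)}]")
    print(f"Overall Risk Score: {overall_risk(sample)}/10")
```

A weighted mean (for example, weighting financial risk more heavily) would be a natural variation if some dimensions matter more for a given sector.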
Every agent's prompts are dynamically tailored based on the detected industry sector:
| Sector | What Changes |
|---|---|
| AI / Dev Tools | Looks for GPU cost margins, inference stack, OSS conversion rate, NeurIPS publications |
| Fintech | Looks for TPV, take rate, default rate, PCI DSS, regulatory capital |
| Healthcare | Looks for FDA clearance, HIPAA compliance, EHR integration, clinical trial status |
| B2B SaaS | Looks for NRR, ACV, Rule of 40, SOC 2, sales cycle length |
| E-commerce | Looks for GMV, contribution margin per order, take rate, CAC payback |
| Consumer | Looks for DAU/MAU ratio, D30 retention, ARPU, LTV:CAC |
| Deep Tech | Looks for TRL, patent portfolio, CapEx curve, government contracts |
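To make the idea of sector-aware prompts concrete, here is a small sketch covering three of the seven sectors. The signal lists come from the table above; everything else is hypothetical, including the keyword-matching heuristic, the keyword lists, the default sector, and the `build_prompt` helper. The project's own `detect_sector()` may work quite differently (for example, by asking the LLM).

```python
from enum import Enum
from typing import Dict, List, Tuple


class Sector(Enum):
    FINTECH = "Fintech"
    HEALTHCARE = "Healthcare"
    B2B_SAAS = "B2B SaaS"


# Research signals per sector, copied from the table above.
SECTOR_SIGNALS: Dict[Sector, List[str]] = {
    Sector.FINTECH: ["TPV", "take rate", "default rate", "PCI DSS", "regulatory capital"],
    Sector.HEALTHCARE: ["FDA clearance", "HIPAA compliance", "EHR integration", "clinical trial status"],
    Sector.B2B_SAAS: ["NRR", "ACV", "Rule of 40", "SOC 2", "sales cycle length"],
}

# Hypothetical keyword lists used only by this illustrative detector.
SECTOR_KEYWORDS: Dict[Sector, Tuple[str, ...]] = {
    Sector.FINTECH: ("payments", "lending", "banking"),
    Sector.HEALTHCARE: ("patient", "clinical", "hipaa"),
    Sector.B2B_SAAS: ("saas", "subscription", "enterprise"),
}


def detect_sector(text: str, default: Sector = Sector.B2B_SAAS) -> Sector:
    """Pick the sector whose keywords appear most often; fall back to a default."""
    lowered = text.lower()
    hits = {s: sum(kw in lowered for kw in kws) for s, kws in SECTOR_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else default


def build_prompt(base_prompt: str, sector: Sector) -> str:
    """Append sector-specific signals to a generic agent prompt."""
    signals = ", ".join(SECTOR_SIGNALS[sector])
    return f"{base_prompt}\n\nSector: {sector.value}. Prioritize these signals: {signals}."


if __name__ == "__main__":
    sector = detect_sector("Online payments and lending platform for small businesses")
    print(build_prompt("Research the company's financials.", sector))
```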
- Automatically discovers 3–5 direct competitors from a single URL
- Profiles each competitor: funding, stage, employee count, key differentiators
- Produces a side-by-side comparison matrix scored across 6 dimensions
- Outputs market position: Leader / Strong Challenger / Challenger / Niche Player / New Entrant
- Calculates a Differentiation Score (0–100) for the target company
Every LLM call and tool invocation is traced via LangSmith:
- Per-agent latency, token count, and input/output
- Full reasoning chain visible in dashboard
- "View Trace" button in UI linking to exact run
The entire pipeline runs with zero paid API costs for standard use:
| Component | Tool | Free Limit |
|---|---|---|
| Primary LLM | Groq (llama-3.3-70b) | 14,400 req/day |
| Fallback LLM | Google Gemini 1.5 Flash | 1,500 req/day |
| Web Scraper | Firecrawl | 500 scrapes/month |
| Web Search | DuckDuckGo SDK | No limit (no key needed) |
| Observability | LangSmith | 5,000 traces/month |
| Vector DB | ChromaDB (local) | Unlimited |
| Persistence | SQLite (local) | Unlimited |
┌─────────────────────────────────────────────────────────────┐
│ Streamlit Frontend │
│ [URL Input] [Upload Docs] [History] [Chat] [Alerts] │
└──────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────────┐
│ LangGraph Orchestration Layer │
│ │
│ ┌───────────┐ ┌──────────────────────────────────────┐ │
│ │ Seed │ │ Parallel Specialist Agents │ │
│ │ Crawler │──▶│ Team │ Investors │ Press │ Finance │ │
│ │ Agent │ │ TechStack │ Social │ CompetitorIntel│ │
│ └───────────┘ └──────────────────┬───────────────────┘ │
│ │ │
│ ┌──────────┐ ┌──────────┐ │ │
│ │ Risk │ │Validator │◀──────┘ │
│ │ Scorer │◀──│ Agent │ │
│ └──────────┘ └──────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Synthesis Agent ──▶ Markdown + PDF Report │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ReAct Q&A Agent (LangGraph subgraph + tools) │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────┬───────────────────────────────────────┘
│
┌──────────────┴──────────────┐
┌───────▼──────────┐ ┌──────────▼──────────────────────┐
│ Free Tool Layer │ │ Persistence Layer │
│ │ │ │
│ Groq (LLM) │ │ SQLite (run history) │
│ Gemini (fallback)│ │ ChromaDB (document RAG) │
│ DuckDuckGo SDK │ │ APScheduler (monitoring) │
│ Firecrawl / BS4 │ │ LangSmith (traces) │
│ Plotly (charts) │ │ WeasyPrint (PDF export) │
└───────────────────┘ └─────────────────────────────────┘
START
│
▼
[seed_node] ← scrape homepage, extract company profile, detect sector
│
▼
[specialists_node] ← 7 agents run in PARALLEL via ThreadPoolExecutor
│ ├── TeamAgent
│ ├── InvestorAgent
│ ├── PressAgent
│ ├── FinancialsAgent
│ ├── TechStackAgent
│ ├── SocialAgent
│ └── CompetitorAgent ← NEW vs baseline
│
▼
[validator_node] ← cross-checks all outputs, flags contradictions + gaps
│
▼
[risk_node] ← scores 5 risk dimensions, produces radar + gauge charts
│
▼
[synthesis_node] ← writes full markdown report + PDF
│
▼
END
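The node sequence above can be mimicked in plain Python to show how state flows from one step to the next. This is a stand-in that deliberately avoids LangGraph so it runs without dependencies; the real project wires these nodes with a LangGraph StateGraph in pipeline/graph.py. The `DDState` field names, the stub node bodies, and the `visited` bookkeeping are assumptions made for the example.

```python
from typing import Callable, List, TypedDict


class DDState(TypedDict, total=False):
    """Shared state passed between nodes (field names are illustrative)."""
    url: str
    company_profile: dict
    specialist_outputs: dict
    validation_notes: dict
    risk_scorecard: dict
    report_md: str
    visited: List[str]


# Each node reads the state and returns a partial update.
Node = Callable[[DDState], dict]


def _visit(state: DDState, name: str) -> List[str]:
    """Return the visit log extended with this node's name."""
    return state.get("visited", []) + [name]


def seed_node(state: DDState) -> dict:
    return {"company_profile": {"url": state["url"]}, "visited": _visit(state, "seed")}


def specialists_node(state: DDState) -> dict:
    return {"specialist_outputs": {}, "visited": _visit(state, "specialists")}


def validator_node(state: DDState) -> dict:
    return {"validation_notes": {}, "visited": _visit(state, "validator")}


def risk_node(state: DDState) -> dict:
    return {"risk_scorecard": {}, "visited": _visit(state, "risk")}


def synthesis_node(state: DDState) -> dict:
    return {"report_md": f"# Report for {state['url']}", "visited": _visit(state, "synthesis")}


PIPELINE: List[Node] = [seed_node, specialists_node, validator_node, risk_node, synthesis_node]


def run_pipeline(url: str) -> DDState:
    """Run the nodes in order, merging each partial update into the state."""
    state: DDState = {"url": url}
    for node in PIPELINE:
        state.update(node(state))
    return state


if __name__ == "__main__":
    print(run_pipeline("https://example.com")["visited"])
```

Having each node return only the keys it changes keeps nodes independent of one another, which is the same contract a graph framework relies on when merging updates.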
intellidd_pro/
├── pyproject.toml # uv project manifest + all dependencies
├── config.py # central config — all env vars live here
├── .env.example # template — copy to .env and fill keys
│
├── agents/ # one file per specialist agent
│ ├── seed_crawler.py # website crawl + profile extraction
│ ├── team_agent.py # founders + leadership research
│ ├── investor_agent.py # funding + investor research
│ ├── press_agent.py # media coverage + sentiment
│ ├── financials_agent.py # financial signals extraction
│ ├── tech_stack_agent.py # technology + engineering research
│ ├── social_agent.py # social media + brand research
│ ├── competitor_agent.py # competitor discovery + matrix
│ ├── validator_agent.py # cross-validation + gap detection
│ ├── risk_scorer.py # quantitative risk scoring
│ └── synthesis_agent.py # final report generation
│
├── pipeline/
│ ├── state.py # DDState TypedDict — shared data contract
│ ├── graph.py # LangGraph StateGraph wiring
│ └── runner.py # public entry point: run_due_diligence(url)
│
├── prompts/
│ ├── agent_prompts.py # all LLM prompts — one place to tune
│ ├── sectors.py # sector detection enum + detect_sector()
│ └── sector_prompts.py # sector-specific research signals per agent
│
├── tools/
│ ├── llm_factory.py # Groq + Gemini factory with auto-fallback + retry
│ ├── scraper.py # Firecrawl → Playwright → BS4 degradation stack
│ └── search.py # DuckDuckGo SDK wrapper (no key needed)
│
├── ui/
│ ├── components/
│ │ ├── risk_chart.py # Plotly radar + gauge chart components
│ │ ├── report_card.py # structured report display
│ │ └── comparison_diff.py # run comparison view
│ └── pages/
│ ├── 1_Research.py # main DD pipeline page
│ ├── 2_History.py # past reports + comparison
│ ├── 3_Monitoring.py # company monitor management
│ └── 4_QA_Chat.py # ReAct Q&A chat interface
│
├── rag/
│ ├── document_processor.py # PDF/CSV/XLSX → chunks
│ └── vector_store.py # ChromaDB operations
│
├── persistence/
│ ├── db.py # SQLite setup + migrations
│ └── queries.py # all query functions
│
├── monitoring/
│ ├── scheduler.py # APScheduler setup
│ ├── change_detector.py # report diffing logic
│ └── alerting.py # email + Slack alerts
│
└── outputs/ # auto-created per run
└── {company}_{timestamp}_{run_id}/
├── company_profile.json
├── founders_team.json
├── investors.json
├── press.json
├── financials.json
├── tech_stack.json
├── social.json
├── competitor_intel.json
├── validation_notes.json
├── risk_scorecard.json
├── report.md
└── report.pdf
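As an illustration of the per-run output layout, the sketch below creates a `{company}_{timestamp}_{run_id}` directory and writes one JSON artifact into it. The helper names, the slug rule, and the `YYYYMMDD_HHMMSS` timestamp format (borrowed from the Quick Start example path) are assumptions, not the project's confirmed implementation.

```python
import json
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(base: Path, company: str, run_id: str, now: Optional[datetime] = None) -> Path:
    """Create and return base/{company}_{timestamp}_{run_id}/."""
    # Lowercase the name and collapse anything non-alphanumeric into underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", company.lower()).strip("_")
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = base / f"{slug}_{stamp}_{run_id}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def write_artifact(run_dir: Path, name: str, payload: dict) -> Path:
    """Serialize one agent's output as pretty-printed JSON."""
    path = run_dir / name
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    run_dir = make_run_dir(Path("outputs"), "Stripe", "demo01")
    print(write_artifact(run_dir, "investors.json", {"rounds": []}))
```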
- Python 3.11+
- uv package manager
- A free Groq API key ← the only required key
git clone https://github.com/yourusername/intellidd-pro.git
cd intellidd-pro
# Create virtual environment and install all dependencies
uv venv
uv sync
# Install Playwright browser (one-time setup)
uv run playwright install chromium
cp .env.example .env
Open .env and add your keys:
# REQUIRED — free at https://console.groq.com (no credit card)
GROQ_API_KEY=your_groq_key_here
# RECOMMENDED — free at https://aistudio.google.com (LLM fallback)
GOOGLE_API_KEY=your_google_key_here
# OPTIONAL — free at https://firecrawl.dev (better scraping)
FIRECRAWL_API_KEY=your_firecrawl_key_here
# OPTIONAL — free at https://smith.langchain.com (agent observability)
LANGCHAIN_API_KEY=your_langsmith_key_here
LANGCHAIN_TRACING_V2=true
uv run python config.py
You should see:
✅ All configuration looks good!
Active LLM: groq (llama-3.3-70b-versatile)
Firecrawl: enabled
LangSmith: enabled
uv run python pipeline/runner.py https://stripe.com
That's it. The pipeline will:
- Scrape the company website
- Run 7 agents in parallel
- Score risk across 5 dimensions
- Write the full report to outputs/stripe_YYYYMMDD_HHMMSS/report.md
uv run streamlit run main.py
Open http://localhost:8501 in your browser.
After running on any company, the outputs/ folder contains:
// risk_scorecard.json — example output
{
"founder_risk": {
"score": 3,
"severity": "Low",
"key_factors": ["Strong technical founders", "Prior exits", "Deep domain expertise"],
"evidence": "Founders previously at Google Brain and DeepMind...",
"mitigation": "No action needed — team strength is a positive signal"
},
"market_risk": {
"score": 6,
"severity": "Medium",
"key_factors": ["Highly competitive space", "Large incumbents", "Commoditization risk"],
"evidence": "Market has 8+ funded competitors including OpenAI, Cohere...",
"mitigation": "Focus on open-source moat and community lock-in"
},
"overall_risk_score": 4,
"dd_confidence_score": 72,
"risk_summary": "..."
}
// competitor_intel.json — example output
{
"competitors": [
{
"name": "OpenAI",
"funding_total": "$11.3B",
"overlap_score": 8,
"strengths_vs_target": ["Larger model capability", "ChatGPT brand recognition"],
"weaknesses_vs_target": ["Closed source", "Higher pricing", "No community hub"]
}
],
"market_position": "Strong Challenger",
"differentiation_score": 78,
"moat_assessment": "Open-source community flywheel with 1M+ models hosted creates switching cost..."
}
| Feature | Baseline (due_diligence_agent) | IntelliDD Pro |
|---|---|---|
| Agent Framework | AG2 / AutoGen | LangGraph |
| LLM Provider | Nebius (paid) | Groq free + Gemini fallback |
| Web Scraper | TinyFish (paid) | Firecrawl free / Playwright / BS4 |
| Web Search | Hardcoded URLs | DuckDuckGo SDK (no key) |
| Specialist Agents | 6 agents | 7 agents (+Competitor Intel) |
| Sector Awareness | None (generic prompts) | 7 sector-specific prompt systems |
| Risk Analysis | None | 5-dimension quantitative scoring |
| Visualizations | None | Plotly radar + gauge charts |
| Document RAG | None | ChromaDB + sentence-transformers |
| Persistence | JSON files only | SQLite + full run history |
| Observability | None | LangSmith tracing |
| Q&A Agent | Prompt only (not wired) | Full ReAct LangGraph subgraph |
| Monitoring | None | APScheduler + email/Slack alerts |
| PDF Export | None | WeasyPrint styled PDF |
| Zero-cost runnable | No (requires paid keys) | Yes (100% free tier) |
| Layer | Tool | Free Tier | Sign Up |
|---|---|---|---|
| Primary LLM | Groq llama-3.3-70b | 14,400 req/day | console.groq.com |
| Fallback LLM | Google Gemini 1.5 Flash | 1,500 req/day | aistudio.google.com |
| Web Scraper | Firecrawl | 500 scrapes/month | firecrawl.dev |
| Web Search | DuckDuckGo SDK | Unlimited | pip install duckduckgo-search |
| Observability | LangSmith | 5,000 traces/month | smith.langchain.com |
| Vector DB | ChromaDB (local) | Unlimited | Built-in |
| Persistence | SQLite (local) | Unlimited | Built-in Python |
| Scraper fallback | Playwright + BS4 | Unlimited | Built-in |
Total monthly cost for standard use: $0
All configuration is managed through environment variables in .env:
| Variable | Required | Default | Description |
|---|---|---|---|
| GROQ_API_KEY | Yes | — | Primary LLM. Free at console.groq.com |
| GOOGLE_API_KEY | Recommended | — | Fallback LLM when Groq rate-limits |
| FIRECRAWL_API_KEY | Optional | — | Better scraping. Falls back to Playwright |
| LANGCHAIN_API_KEY | Optional | — | LangSmith tracing |
| LANGCHAIN_TRACING_V2 | Optional | false | Set true to enable traces |
| LANGCHAIN_PROJECT | Optional | intellidd-pro | LangSmith project name |
| MAX_WORKERS | Optional | 7 | Parallel agent threads |
| AGENT_TIMEOUT_SECONDS | Optional | 90 | Per-agent timeout |
| DB_PATH | Optional | ./intellidd.db | SQLite database path |
| CHROMA_PATH | Optional | ./chroma_store | ChromaDB persistence dir |
| ALERT_EMAIL_FROM | Optional | — | Gmail for monitoring alerts |
| ALERT_SLACK_WEBHOOK | Optional | — | Slack incoming webhook URL |
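A central config module along these lines could load the variables with the defaults from the table. This is a sketch under stated assumptions: the `Settings` dataclass, the `load_settings` function, and the choice to raise `RuntimeError` when the required key is missing are all hypothetical, and only a subset of the variables is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    google_api_key: Optional[str]
    max_workers: int
    agent_timeout_seconds: int
    db_path: str
    chroma_path: str
    tracing_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying the documented defaults."""
    groq = env.get("GROQ_API_KEY")
    if not groq:
        # GROQ_API_KEY is the only variable the table marks as required.
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=groq,
        google_api_key=env.get("GOOGLE_API_KEY") or None,
        max_workers=int(env.get("MAX_WORKERS", "7")),
        agent_timeout_seconds=int(env.get("AGENT_TIMEOUT_SECONDS", "90")),
        db_path=env.get("DB_PATH", "./intellidd.db"),
        chroma_path=env.get("CHROMA_PATH", "./chroma_store"),
        tracing_enabled=env.get("LANGCHAIN_TRACING_V2", "false").lower() == "true",
    )


if __name__ == "__main__":
    print(load_settings({"GROQ_API_KEY": "your_groq_key_here", "MAX_WORKERS": "4"}))
```

Passing the environment in as a mapping, rather than reading `os.environ` directly inside the function, makes the loader easy to exercise in tests.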
IntelliDD Pro has built-in, four-layer rate-limit protection in tools/llm_factory.py:
- Stagger delay: each parallel agent waits random(0.5, 2.5)s before calling the LLM, preventing TPM bursts
- Smart retry: parses Groq's "Please try again in 1.93s" error message and waits exactly that long (+ buffer)
- Exponential backoff: if no wait time is provided, waits 5s → 10s → 20s → 40s across up to 5 attempts
- Automatic fallback: if Groq is exhausted for the day, switches to Gemini via LangChain's .with_fallbacks()
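The first three mechanisms can be sketched with the standard library. This is an illustrative approximation rather than the code in tools/llm_factory.py: the function names, the regular expression, the 0.5s buffer, and the injectable `sleep` parameter are assumptions, and the example retries on any exception, whereas real code would catch only the provider's rate-limit error. The Gemini fallback is not shown.

```python
import random
import re
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Matches hints such as "Please try again in 1.93s".
_RETRY_HINT = re.compile(r"try again in ([0-9]*\.?[0-9]+)s")


def stagger(sleep: Callable[[float], None] = time.sleep) -> None:
    """Wait a random 0.5-2.5s so parallel agents do not hit the LLM at once."""
    sleep(random.uniform(0.5, 2.5))


def parse_retry_after(message: str) -> Optional[float]:
    """Extract the suggested wait in seconds from an error message, if present."""
    match = _RETRY_HINT.search(message)
    return float(match.group(1)) if match else None


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 5.0,
    buffer_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with a hinted wait or exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:  # real code would catch only rate-limit errors
            if attempt == max_attempts - 1:
                raise
            hinted = parse_retry_after(str(exc))
            if hinted is not None:
                delay = hinted + buffer_s  # wait as long as the provider asked
            else:
                delay = base_delay * 2 ** attempt  # 5s, 10s, 20s, 40s
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, five attempts leave room for four waits, which matches the 5s, 10s, 20s, 40s schedule listed above.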
Contributions are welcome! This is an active portfolio project.
# Fork and clone
git clone https://github.com/yourusername/intellidd-pro.git
# Create a feature branch
git checkout -b feature/your-feature-name
# Install dev dependencies
uv sync --dev
# Run linting
uv run black .
uv run isort .
# Run tests
uv run pytest tests/
Areas where contributions are especially welcome:
- Additional sector prompts (Marketplace, GovTech, EdTech as standalone, etc.)
- New specialist agents (Legal/Regulatory, ESG/Sustainability)
- Better competitor discovery logic
- UI improvements to the Streamlit frontend
- Test coverage for agents and pipeline
- Built with LangGraph by LangChain
- LLM inference powered by Groq (free tier)
- Agent observability via LangSmith
Built as a portfolio project demonstrating production-grade agentic AI engineering.
Multi-agent orchestration · Agentic RAG · LangGraph · Free-tier LLMs · Quantitative risk scoring
⭐ Star this repo if you found it useful ⭐