Benriched - Company Enrichment API

An AI-powered company and contact enrichment API that combines web search, intelligent scraping, and AI analysis to extract structured business intelligence data.

Overview

Benriched automatically researches a company and extracts structured data, each field with a confidence level, including:

Revenue and employee count
Headquarters location and subsidiary status
Business description and industry classification (NAICS codes)
LinkedIn profile matching
ICP (Ideal Customer Profile) matching
Contact enrichment via ZoomInfo

The system uses a 12-stage pipeline with intelligent caching, cost optimization, and automatic outlier detection to balance accuracy, speed, and cost.

Key Features

✅ Multi-stage enrichment - 12-stage pipeline with Pass 1 (web search) and Pass 2 (content analysis)
✅ Intelligent scraping - Smart URL selection reduces Firecrawl costs by 50-70%
✅ Automatic deep research - Detects outliers and conflicting data, runs targeted queries
✅ Parent company inheritance - Brands inherit revenue/size data from parent companies
✅ Cost tracking - Transparent cost breakdown for every API call
✅ Database caching - Avoids re-enriching the same domain
✅ Backwards compatible - Old endpoints remain active forever
✅ Comprehensive quality metrics - Confidence levels for every data point

Quick Start

Prerequisites

Node.js 18+
npm 9+
Supabase account (database)
API keys for external services

Installation

# Clone repository
git clone https://github.com/yourusername/benriched.git
cd benriched

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env.local
# Edit .env.local with your API keys

Run Locally

npm run dev

API runs on http://localhost:8787

Test Endpoint

curl -X POST http://localhost:8787/v1/enrich/company \
  -H "X-API-Key: amlink21" \
  -H "Content-Type: application/json" \
  -d '{"domain": "lincolnpremiumpoultry.com"}'
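
The same request can be built from TypeScript. This is a minimal sketch, not code from the Benriched repository: only the path, headers, and body shape come from the curl command above, while `buildEnrichRequest`, `enrichCompany`, and the `Send` type are hypothetical names introduced for illustration.

```typescript
// Hypothetical helpers: only the path, headers, and body shape come from the curl example.
interface EnrichRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Minimal response shape this sketch relies on.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// The transport is injected so the request logic can be exercised without a network.
type Send = (url: string, init: EnrichRequestInit) => Promise<HttpResponse>;

function buildEnrichRequest(
  baseUrl: string,
  apiKey: string,
  domain: string
): { url: string; init: EnrichRequestInit } {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/enrich/company`,
    init: {
      method: "POST",
      headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ domain }),
    },
  };
}

async function enrichCompany(
  send: Send,
  baseUrl: string,
  apiKey: string,
  domain: string
): Promise<unknown> {
  const { url, init } = buildEnrichRequest(baseUrl, apiKey, domain);
  const res = await send(url, init);
  if (!res.ok) throw new Error(`Enrichment failed with HTTP ${res.status}`);
  return res.json();
}
```

On Node.js 18+ the global `fetch` should be usable as the `send` argument, for example `enrichCompany(fetch, "http://localhost:8787", apiKey, "lincolnpremiumpoultry.com")`, with `apiKey` read from the environment rather than hard-coded.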

API Endpoints

Current (v1) Endpoints

Endpoint	Method	Description
`/v1/enrich/company`	POST	Enrich company by domain
`/v1/enrich/contact`	POST	Enrich contact via ZoomInfo
`/v1/match/persona`	POST	Match job title to persona
`/v1/research/contact`	POST	Research prospect for outbound
`/v1/generate/email-sequence`	POST	Generate personalized email sequences
`/v1/health`	GET	Health check (no auth required)

Legacy Endpoints (Supported Indefinitely)

POST /enrich, POST /enrich/contact, POST /persona, POST /research/contact, POST /outreach/email-sequence

Note: Legacy endpoints are fully supported and will not be deprecated. They are aliases to v1 endpoints.
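
One way to picture the aliasing is a lookup table from legacy paths to v1 paths. The pairing below is an assumption inferred from the endpoint names in the two lists above, and `LEGACY_TO_V1` and `resolvePath` are hypothetical names; the actual routing code may be organized differently.

```typescript
// Assumed pairing of legacy paths to v1 paths, inferred from the endpoint names.
const LEGACY_TO_V1: Record<string, string> = {
  "/enrich": "/v1/enrich/company",
  "/enrich/contact": "/v1/enrich/contact",
  "/persona": "/v1/match/persona",
  "/research/contact": "/v1/research/contact",
  "/outreach/email-sequence": "/v1/generate/email-sequence",
};

// Returns the v1 path for a legacy path, or the input unchanged if it is not a legacy path.
function resolvePath(path: string): string {
  return LEGACY_TO_V1[path] ?? path;
}
```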

Documentation

Complete documentation is organized for different audiences:

API Reference - Complete endpoint documentation, request/response formats, authentication methods, error handling
System Architecture - 12-stage enrichment pipeline, cost tracking, quality assurance mechanisms, integration details
Development Guide - Local setup, testing, deployment, contributing guidelines
Database Schema - Table definitions, indexes, common queries, data types

For system documentation used by Claude AI, see claude.md.

Tech Stack

API Framework: Hono (lightweight, edge-compatible)
Search: Perplexity Sonar Pro (web search with real-time access)
Analysis: OpenAI GPT-4o-mini (content extraction)
Scraping: Firecrawl (JavaScript-rendered content)
Contact Enrichment: ZoomInfo API
Database: Supabase (PostgreSQL)
Hosting: Vercel (serverless)

Architecture

Request Flow

Request → Cache Check → Domain Resolution → Pass 1 (Web Search)
  ↓
Deep Research (if needed) → URL Selection → Firecrawl Scraping
  ↓
Entity Mismatch Detection → LinkedIn Validation → Pass 2 (Content Analysis)
  ↓
Revenue Estimation → Parent Company Lookup → ICP Matching
  ↓
Database Storage → Cost Calculation → Response
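
The flow above can be sketched as an ordered list of stages with a cache short-circuit and one conditional stage. This is an illustrative sketch only: the stage names mirror the diagram, but the `Stage` and `PipelineContext` types, the `needsDeepResearch` flag, and the runner are assumptions, and the stage bodies are placeholders.

```typescript
// Hypothetical sketch: stage names mirror the diagram; types and runner are assumptions.
interface PipelineContext {
  domain: string;
  needsDeepResearch: boolean; // assumed flag, set when Pass 1 sees outliers or conflicts
  trace: string[];            // names of the stages that actually ran
}

interface Stage {
  name: string;
  shouldRun?: (ctx: PipelineContext) => boolean; // omitted means "always run"
  run: (ctx: PipelineContext) => void;           // synchronous for brevity; real stages would be async
}

const noop = (): void => {};

const stages: Stage[] = [
  { name: "Domain Resolution", run: noop },
  { name: "Pass 1 (Web Search)", run: noop },
  { name: "Deep Research", shouldRun: (ctx) => ctx.needsDeepResearch, run: noop },
  { name: "URL Selection", run: noop },
  { name: "Firecrawl Scraping", run: noop },
  { name: "Entity Mismatch Detection", run: noop },
  { name: "LinkedIn Validation", run: noop },
  { name: "Pass 2 (Content Analysis)", run: noop },
  { name: "Revenue Estimation", run: noop },
  { name: "Parent Company Lookup", run: noop },
  { name: "ICP Matching", run: noop },
  { name: "Database Storage", run: noop },
  { name: "Cost Calculation", run: noop },
];

function runPipeline(
  domain: string,
  needsDeepResearch: boolean,
  cache: Set<string>
): PipelineContext {
  const ctx: PipelineContext = { domain, needsDeepResearch, trace: ["Cache Check"] };
  if (cache.has(domain)) return ctx; // cache hit: skip every later stage
  for (const stage of stages) {
    if (stage.shouldRun && !stage.shouldRun(ctx)) continue;
    stage.run(ctx);
    ctx.trace.push(stage.name);
  }
  return ctx;
}
```

Keeping the condition on the stage itself means the runner stays a plain loop, and the `trace` array records which stages ran for a given request.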

Dual-AI Architecture

Pass 1 (Perplexity): Web search to identify company and find initial data
- Company name and parent company relationship
- Revenue figures from multiple sources
- Employee count and headquarters location
- LinkedIn profile candidates
- URLs to scrape
Pass 2 (GPT-4o-mini): Content analysis to extract structured data
- Business description (primary business activity)
- Revenue band (from 12 predefined bands)
- Employee band (from 9 predefined bands)
- NAICS industry codes
- Quality metrics for each field
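
Pass 2 reports revenue as a band label rather than a raw figure. Consumers that need numeric bounds could parse the label; the sketch below handles only labels shaped like the `500M-1B` value in the response example in this README. The full list of 12 bands is not given here, so the `K` suffix and the `null` fallback for other formats are assumptions, and `parseRevenueBand` is a hypothetical helper.

```typescript
interface RevenueRange {
  min: number; // lower bound in USD
  max: number; // upper bound in USD
}

// The regex below only admits K, M, or B, so the default branch handles B.
function unitMultiplier(unit: string): number {
  switch (unit) {
    case "K":
      return 1e3;
    case "M":
      return 1e6;
    default:
      return 1e9;
  }
}

// Parses labels like "500M-1B"; returns null for any other format (assumed fallback).
function parseRevenueBand(label: string): RevenueRange | null {
  const match = /^(\d+(?:\.\d+)?)([KMB])-(\d+(?:\.\d+)?)([KMB])$/.exec(label.trim());
  if (match === null) return null;
  const [, low = "", lowUnit = "", high = "", highUnit = ""] = match;
  return {
    min: Number(low) * unitMultiplier(lowUnit),
    max: Number(high) * unitMultiplier(highUnit),
  };
}
```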

Cost Optimization

Intelligent caching: Cache check before any API calls
Smart scraping: Conditional URL selection based on what data Pass 1 already found
Deep research triggers: Automatic outlier detection (only runs when needed)
Typical cost: $0.03-0.08 per company (cached hit: $0.00)
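
The cache-first rule can be sketched as a thin wrapper around the paid enrichment call. This is an assumption-laden illustration: an in-memory `Map` stands in for the Supabase table, the call is synchronous for brevity, and `CachedEnricher` is a hypothetical name.

```typescript
interface EnrichmentResult<T> {
  data: T;
  cached: boolean;
  costUsd: number;
}

// Hypothetical wrapper: a Map stands in for the database cache described above.
class CachedEnricher<T> {
  private readonly cache = new Map<string, T>();
  private readonly enrich: (domain: string) => { data: T; costUsd: number };

  constructor(enrich: (domain: string) => { data: T; costUsd: number }) {
    this.enrich = enrich;
  }

  lookup(domain: string): EnrichmentResult<T> {
    // Normalize so "Example.com" and "example.com" share one cache entry.
    const key = domain.trim().toLowerCase();
    const hit = this.cache.get(key);
    if (hit !== undefined) {
      // Cache hit: no external API is called, so the cost is zero.
      return { data: hit, cached: true, costUsd: 0 };
    }
    const fresh = this.enrich(key);
    this.cache.set(key, fresh.data);
    return { data: fresh.data, cached: false, costUsd: fresh.costUsd };
  }
}
```

A second lookup for the same domain returns `cached: true` with `costUsd: 0`, matching the cached-hit cost listed above.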

Environment Variables

Required for local development and production:

# Supabase
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your_anon_key_here

# AI Gateway (Vercel)
AI_GATEWAY_API_KEY=your_api_gateway_key

# Firecrawl (web scraping)
FIRECRAWL_API_KEY=your_firecrawl_key

# ZoomInfo (contact enrichment)
ZI_USERNAME=your_username
ZI_PASSWORD=your_password
ZI_AUTH_URL=https://api.zoominfo.com/api/v2/auth/token
ZI_ENRICH_URL=https://api.zoominfo.com/api/v2/contact/enrich

# Authentication
API_KEY=amlink21

# Port (optional)
PORT=8787
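
A service can fail fast at startup when one of these variables is missing. The sketch below treats every variable listed above except `PORT` as required, as this section states; `loadConfig` and the `AppConfig` shape are hypothetical and not taken from the repository.

```typescript
// Names copied from the list above; PORT is optional and defaults to 8787.
const REQUIRED_VARS = [
  "SUPABASE_URL",
  "SUPABASE_ANON_KEY",
  "AI_GATEWAY_API_KEY",
  "FIRECRAWL_API_KEY",
  "ZI_USERNAME",
  "ZI_PASSWORD",
  "ZI_AUTH_URL",
  "ZI_ENRICH_URL",
  "API_KEY",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];

interface AppConfig {
  port: number;
  values: Record<RequiredVar, string>;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Report every missing variable at once rather than one per restart.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const port = Number(env["PORT"] ?? "8787");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`PORT must be a positive integer, got "${env["PORT"]}"`);
  }
  const values = {} as Record<RequiredVar, string>;
  for (const name of REQUIRED_VARS) {
    values[name] = env[name] as string; // presence was checked above
  }
  return { port, values };
}
```

In Node.js this would typically be called once at startup as `loadConfig(process.env)`.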

Deployment

Deploy to Vercel

# Manual deployment
npm run build
vercel --prod

# Or push to main branch (auto-deploys)
git push origin main

Production Testing

curl -X POST https://benriched.vercel.app/v1/enrich/company \
  -H "X-API-Key: amlink21" \
  -H "Content-Type: application/json" \
  -d '{"domain": "lincolnpremiumpoultry.com"}'

Response Example

{
  "success": true,
  "data": {
    "company_name": "Lincoln Premium Poultry",
    "domain": "lincolnpremiumpoultry.com",
    "website": "https://lincolnpremiumpoultry.com",
    "linkedin_url": "https://www.linkedin.com/company/lincoln-premium-poultry",
    "business_description": "Premium poultry producer based in Nebraska...",
    "company_size": "1,001-5,000 Employees",
    "company_revenue": "500M-1B",
    "city": "Fremont",
    "state": "Nebraska",
    "hq_country": "US",
    "is_us_hq": true,
    "is_us_subsidiary": false,
    "naics_codes_6_digit": [
      {"code": "311615", "description": "Poultry Processing"}
    ],
    "target_icp": true,
    "quality": {
      "revenue": {"confidence": "high", "reasoning": "Confirmed by multiple sources"},
      "size": {"confidence": "high", "reasoning": "Employee count from LinkedIn"},
      "location": {"confidence": "high", "reasoning": "Confirmed by multiple sources"},
      "industry": {"confidence": "high", "reasoning": "NAICS codes based on business activities"}
    }
  },
  "cached": false,
  "cost": {
    "total": {"costUsd": 0.0456}
  },
  "performance": {
    "total_ms": 21912
  }
}
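
A client might type the response and flag fields that deserve a second look. The interface below covers only a subset of the fields in the example above and is a sketch rather than the API's published schema. Only the confidence value `"high"` appears in the example, so treating any other value as needing review is an assumption, and `fieldsNeedingReview` is a hypothetical helper.

```typescript
interface QualityEntry {
  confidence: string; // only "high" appears in the example; other values are assumed to exist
  reasoning: string;
}

// Partial shape, limited to fields shown in the response example.
interface EnrichCompanyResponse {
  success: boolean;
  data: {
    company_name: string;
    domain: string;
    company_size: string;
    company_revenue: string;
    target_icp: boolean;
    quality: Record<string, QualityEntry>;
  };
  cached: boolean;
  cost: { total: { costUsd: number } };
  performance: { total_ms: number };
}

// Returns the names of quality fields whose confidence is anything other than "high".
function fieldsNeedingReview(response: EnrichCompanyResponse): string[] {
  return Object.entries(response.data.quality)
    .filter(([, entry]) => entry.confidence !== "high")
    .map(([field]) => field);
}
```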

Contributing

See Development Guide for:

Code style and testing
Commit message conventions
Pull request process
Running tests and building

Performance

Typical Response Times:

Cached hit: <100ms
Fresh enrichment: ~25-45 seconds
With deep research: ~35-60 seconds

Token Usage:

Pass 1 (Perplexity): ~1,500-2,500 input, ~800-1,200 output
Pass 2 (GPT-4o-mini): ~3,000-5,000 input, ~500-800 output
Deep research: ~200-400 input/output per query

Roadmap

OpenAPI/Swagger documentation generation
Rate limiting on Vercel production
Standardized error response format
Request ID tracking for distributed tracing
API usage analytics dashboard
Batch enrichment endpoint
Webhook support for async enrichments

Support

Check GitHub Issues for known issues
See Development Guide for troubleshooting
Review Architecture for system details

License

MIT

Getting Started Checklist

Current Status: v0.1.0 (Active Development)
Last Updated: January 2026
