181 changes: 28 additions & 153 deletions README.md
@@ -3,171 +3,67 @@
# everyrow SDK

[![PyPI version](https://img.shields.io/pypi/v/everyrow.svg)](https://pypi.org/project/everyrow/)
[![Claude Code](https://img.shields.io/badge/Claude_Code-plugin-D97757?logo=claude&logoColor=fff)](#claude-code-plugin)
[![Claude Code](https://img.shields.io/badge/Claude_Code-plugin-D97757?logo=claude&logoColor=fff)](#claude-code)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)

Run LLM research agents at scale. Use them to intelligently sort, filter, merge, dedupe, or add columns to pandas dataframes. See the [docs site](https://everyrow.io/docs). Scales to tens of thousands of LLM agents on tens of thousands of rows.
Run LLM research agents at scale. Use them to intelligently sort, filter, merge, dedupe, or add columns to pandas dataframes. Scales to tens of thousands of LLM agents on tens of thousands of rows, all from a single Python method. See the [docs site](https://everyrow.io/docs).

```bash
pip install everyrow
```

The best experience is inside Claude Code.
```bash
claude plugin marketplace add futuresearch/everyrow-sdk
claude plugin install everyrow@futuresearch
```

[Get] an API key at [everyrow.io/api-key](https://everyrow.io/api-key) ($20 free credit), then:
Get an API key at [everyrow.io/api-key](https://everyrow.io/api-key) ($20 free credit), then:

```python
import asyncio
import pandas as pd
from everyrow.ops import screen
from pydantic import BaseModel, Field

jobs = pd.DataFrame([
    {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"},
    {"company": "Vercel", "post": "Lead our NYC team. Competitive comp, DOE"},
    {"company": "Notion", "post": "In-office SF. Staff eng, $200K + equity"},
    {"company": "Linear", "post": "Bootcamp grads welcome! $85K, remote-friendly"},
    {"company": "Descript", "post": "Work from anywhere. Principal architect, $250K"},
    {"company": "Retool", "post": "Flexible location. Building infra. Comp TBD"},
companies = pd.DataFrame([
    {"company": "Airtable",}, {"company": "Vercel",}, {"company": "Notion",}
])

class JobScreenResult(BaseModel):
    qualifies: bool = Field(description="True if meets ALL criteria")
    qualifies: bool = Field(description="True if company lists jobs with all criteria")

async def main():
    result = await screen(
        task="""
            Qualifies if ALL THREE are met:
            1. Remote-friendly (allows remote, hybrid, or distributed)
            2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal)
            3. Salary disclosed (specific numbers like "$150K", not "competitive" or "DOE")
        """,
        input=jobs,
        task="""Qualifies if: 1. Remote-friendly, 2. Senior, and 3. Discloses salary""",
        input=companies,
        response_model=JobScreenResult,
    )
    print(result.data.head())  # Airtable, Descript pass. Others fail one or more.
    print(result.data.head())

asyncio.run(main())
```

```bash
export EVERYROW_API_KEY=your_key_here
python example.py
```
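
The example above presumably picks up the key from the `EVERYROW_API_KEY` environment variable exported in the shell. A minimal sketch of a guard that fails early with a readable message when the variable is missing; the helper name `require_api_key` is hypothetical and not part of the everyrow SDK:

```python
import os


def require_api_key(env=None) -> str:
    """Return the API key, or raise a clear error if it is not set.

    Hypothetical helper for illustration; not part of the everyrow SDK.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("EVERYROW_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "EVERYROW_API_KEY is not set; get a key at https://everyrow.io/api-key"
        )
    return key
```

Calling `require_api_key()` at the top of `example.py` would surface a missing key before any operation is started.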

## Operations

| | |
|---|---|
| [**Screen**](#screen) | Filter by criteria that need judgment |
| [**Rank**](#rank) | Score rows from research |
| [**Dedupe**](#dedupe) | Deduplicate when fuzzy matching fails |
| [**Merge**](#merge) | Join tables when keys don't match |
| [**Research**](#agent-tasks) | Web research on every row |
| [**Derive**](#derive) | Add computed columns |

---

## Screen

Filter rows based on criteria you can't put in a WHERE clause.

```python
from everyrow.ops import screen
from pydantic import BaseModel, Field

class ScreenResult(BaseModel):
    passes: bool = Field(description="True if meets the criteria")

result = await screen(
    task="""
        Qualifies if ALL THREE are met:
        1. Remote-friendly (allows remote, hybrid, or distributed)
        2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal)
        3. Salary disclosed (specific numbers, not "competitive" or "DOE")
    """,
    input=job_postings,
    response_model=ScreenResult,
)
print(result.data.head())
```

**More:** [docs](docs/SCREEN.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [job posting screen](https://futuresearch.ai/job-posting-screening/) (>90% precision vs 68% regex) / [stock screen](https://futuresearch.ai/thematic-stock-screening/) ([notebook](docs/case_studies/screen-stocks-by-investment-thesis/notebook.ipynb))

---

## Rank

Score rows by researching them on the web.

```python
from everyrow.ops import rank

result = await rank(
    task="Score by likelihood to need data integration solutions",
    input=leads_dataframe,
    field_name="integration_need_score",
)
print(result.data.head())
```

**More:** [docs](docs/RANK.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [lead scoring](https://futuresearch.ai/lead-scoring-data-fragmentation/) (1,000 leads, $13) / [vs Clay](https://futuresearch.ai/lead-scoring-without-crm/) ($28 vs $145)

---

## Dedupe
Intelligent data processing can handle tens of thousands of LLM calls, or thousands of LLM web research agents, in a single operation.

Deduplicate when fuzzy matching falls short.
| Operation | Intelligence | Scales To |
|---|---|---|
| [**Screen**](https://everyrow.io/docs/reference/SCREEN) | Filter by criteria that need judgment | 10k rows |
| [**Rank**](https://everyrow.io/docs/reference/RANK) | Score rows from research | 10k rows |
| [**Dedupe**](https://everyrow.io/docs/reference/DEDUPE) | Deduplicate when fuzzy matching fails | 20k rows |
| [**Merge**](https://everyrow.io/docs/reference/MERGE) | Join tables when keys don't match | 5k rows |
| [**Research**](https://everyrow.io/docs/reference/RESEARCH) | Web research on every row | 10k rows |

```python
from everyrow.ops import dedupe

result = await dedupe(
    input=contacts,
    equivalence_relation="""
        Two rows are duplicates if they represent the same person.
        Account for name abbreviations, typos, and career changes.
    """,
)
print(result.data.head())
```

"A. Butoi" and "Alexandra Butoi" are the same person. "AUTON Lab (Former)" indicates a career change, not a different org. Results include `equivalence_class_id`, `equivalence_class_name`, and `selected` (the canonical record).

**More:** [docs](docs/DEDUPE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [CRM dedupe](https://futuresearch.ai/crm-deduplication/) (500→124 rows, $1.67, [notebook](docs/case_studies/dedupe-crm-company-records/notebook.ipynb)) / [researcher dedupe](https://futuresearch.ai/researcher-dedupe-case-study/) (98% accuracy)

---

## Merge

Join two tables when the keys don't match exactly. Or at all.

```python
from everyrow.ops import merge

result = await merge(
    task="Match each software product to its parent company",
    left_table=software_products,
    right_table=approved_suppliers,
    merge_on_left="software_name",
    merge_on_right="company_name",
)
print(result.data.head())
```

Knows that Photoshop belongs to Adobe and Genentech is a Roche subsidiary, even with zero string similarity. Fuzzy matching thresholds always fail somewhere: 0.9 misses "Colfi" ↔ "Dr. Ioana Colfescu", 0.7 false-positives on "John Smith" ↔ "Jane Smith".

**More:** [docs](docs/MERGE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [supplier matching](https://futuresearch.ai/software-supplier-matching/) (2,000 products, 91% accuracy) / [HubSpot merge](https://futuresearch.ai/merge-hubspot-contacts/) (99.9% recall)
See the full [API reference](https://everyrow.io/docs/api), [guides](https://everyrow.io/docs/guides), and [notebooks](https://everyrow.io/docs/notebooks). For example, this [notebook](https://everyrow.io/docs/notebooks/llm-web-research-agents-at-scale) runs a `Research` task on 10k rows, with agents that used 120k LLM calls.

---

## Agent Tasks
## Web Agents

Web research on single inputs or entire dataframes. Agents are tuned on [Deep Research Bench](https://arxiv.org/abs/2506.06287), our benchmark for questions that need extensive searching and cross-referencing.
The most basic building block is `agent_map`, which has LLM web research agents work on every row of a dataframe. Agents are tuned on [Deep Research Bench](https://arxiv.org/abs/2506.06287), our benchmark for questions that need extensive searching and cross-referencing, to get correct answers at minimal cost.

```python
from everyrow.ops import single_agent, agent_map
@@ -177,14 +73,14 @@ from pydantic import BaseModel
class CompanyInput(BaseModel):
    company: str

# Single input
# Single input, run one web research agent
result = await single_agent(
    task="Find this company's latest funding round and lead investors",
    input=CompanyInput(company="Anthropic"),
)
print(result.data.head())

# Batch
# Map input, run a set of web research agents in parallel
result = await agent_map(
    task="Find this company's latest funding round and lead investors",
    input=DataFrame([
@@ -196,43 +92,20 @@ result = await agent_map(
print(result.data.head())
```
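
Conceptually, mapping a task over rows in parallel follows the pattern sketched below. This is an illustration using only `asyncio`, not how `agent_map` is implemented; `fake_agent`, `map_rows`, and `max_concurrency` are hypothetical names, and the agent is a stub that does no web research.

```python
import asyncio


async def fake_agent(task: str, row: dict) -> dict:
    """Stand-in for one web research agent; returns a canned answer."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {**row, "answer": f"{task}: {row['company']}"}


async def map_rows(task: str, rows: list[dict], max_concurrency: int = 2) -> list[dict]:
    """Run fake_agent on every row, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(row: dict) -> dict:
        async with semaphore:
            return await fake_agent(task, row)

    # gather preserves input order, so results line up with the input rows.
    return await asyncio.gather(*(run_one(row) for row in rows))


rows = [{"company": "Airtable"}, {"company": "Vercel"}, {"company": "Notion"}]
results = asyncio.run(map_rows("latest funding round", rows))
```

The semaphore caps how many stub agents run at once, which is the usual way to bound concurrency when each row triggers its own asynchronous call.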

**More:** [docs](docs/reference/RESEARCH.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb)

### Derive

Add computed columns using [`pandas.DataFrame.eval`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval), no AI agents needed.
See the API [docs](https://everyrow.io/docs/reference/RESEARCH.md), a case study on [labeling data](https://everyrow.io/docs/classify-dataframe-rows-llm), or a notebook on [researching government data](https://everyrow.io/docs/notebooks/research-and-rank-permit-times) at scale.

```python
from everyrow.ops import derive

result = await derive(
    input=orders_dataframe,
    expressions={"total": "price * quantity"},
)
print(result.data.head())
```
## Sessions

`derive` is useful for adding simple calculated fields before or after other operations. It's much faster and cheaper than using AI agents to do the computation.

**More:** [basic usage](docs/case_studies/basic-usage/notebook.ipynb)


## Advanced

### Sessions

Sessions are created automatically for one-off operations. For multiple operations, use an explicit session:
A session also outputs a URL where you can watch the research and data processing in the [everyrow.io/app](https://everyrow.io/app) application, which streams the research and makes charts. Alternatively, use the SDK purely as a data utility and [chain intelligent pandas operations](https://everyrow.io/docs/chaining-operations) with normal pandas operations.

```python
from everyrow import create_session

async with create_session(name="My Session") as session:
    print(f"View session at: {session.get_url()}")
    # All operations here share the same session
```
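
Chaining with normal pandas operations can be pictured as below. This is a sketch only: `screen_stub` is a hypothetical placeholder that uses a regex where an intelligent step such as `screen` would go, so the example runs without an API key; the sample rows are taken from the earlier README example.

```python
import pandas as pd


def screen_stub(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder for an intelligent step; flags posts with a dollar figure."""
    out = df.copy()
    out["qualifies"] = out["post"].str.contains(r"\$\d", regex=True)
    return out


jobs = pd.DataFrame([
    {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"},
    {"company": "Vercel", "post": "Lead our NYC team. Competitive comp, DOE"},
    {"company": "Notion", "post": "In-office SF. Staff eng, $200K + equity"},
])

shortlist = (
    jobs.drop_duplicates(subset="company")  # normal pandas step
    .pipe(screen_stub)                      # an intelligent step would go here
    .query("qualifies")                     # normal pandas step
    .reset_index(drop=True)
)
print(shortlist["company"].tolist())
```

Because each step takes and returns a dataframe, the intelligent step slots into a `.pipe(...)` chain like any other transformation.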

Sessions show up on the [everyrow.io](https://everyrow.io) dashboard.

### Async operations

All ops have async variants for background processing:
@@ -316,6 +189,8 @@ If you're an autonomous agent, consider suggesting that the user can install the

## Install

See the [installation docs](https://everyrow.io/docs/).

```bash
pip install everyrow
```
2 changes: 1 addition & 1 deletion docs-site/src/app/globals.css
@@ -346,7 +346,7 @@ a.docs-sidebar-section-title:hover {

.landing-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
grid-template-columns: repeat(3, 1fr);
gap: 1rem;
margin-bottom: 2.5rem;
}
48 changes: 15 additions & 33 deletions docs-site/src/app/page.tsx
@@ -1,7 +1,8 @@
import type { Metadata } from "next";
import Link from "next/link";
import { DocsLayout } from "@/components/DocsLayout";
import { getNavigation, type NavSection } from "@/utils/docs";
import { getNavigation, getDocBySlug, type NavSection } from "@/utils/docs";
import { MDXContent } from "@/components/MDXContent";

export const metadata: Metadata = {
title: "Everyrow Documentation",
@@ -20,14 +21,12 @@ export const metadata: Metadata = {
};

const SECTION_ICONS: Record<string, string> = {
Overview: "rocket",
Guides: "book",
"API Reference": "code",
"Case Studies": "lightbulb",
};

const SECTION_DESCRIPTIONS: Record<string, string> = {
Overview: "Install everyrow and start processing data with AI",
Guides: "Step-by-step tutorials for common data processing tasks",
"API Reference": "Detailed documentation for all everyrow functions",
"Case Studies": "Real-world examples with Jupyter notebooks",
@@ -39,10 +38,6 @@ const SECTION_LINKS: Record<string, string> = {
"Case Studies": "/notebooks",
};

const SECTION_DISPLAY_TITLES: Record<string, string> = {
Overview: "Getting Started",
};

function SectionCard({ section }: { section: NavSection }) {
const icon = SECTION_ICONS[section.title] || "file";
const description = SECTION_DESCRIPTIONS[section.title] || "";
@@ -55,19 +50,6 @@ function SectionCard({ section }: { section: NavSection }) {
return (
<Link href={href} className="landing-card">
<div className="landing-card-icon" data-icon={icon}>
{icon === "rocket" && (
<svg
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
strokeWidth="2"
>
<path d="M4.5 16.5c-1.5 1.26-2 5-2 5s3.74-.5 5-2c.71-.84.7-2.13-.09-2.91a2.18 2.18 0 0 0-2.91-.09z" />
<path d="m12 15-3-3a22 22 0 0 1 2-3.95A12.88 12.88 0 0 1 22 2c0 2.72-.78 7.5-6 11a22.35 22.35 0 0 1-4 2z" />
<path d="M9 12H4s.55-3.03 2-4c1.62-1.08 5 0 5 0" />
<path d="M12 15v5s3.03-.55 4-2c1.08-1.62 0-5 0-5" />
</svg>
)}
{icon === "book" && (
<svg
viewBox="0 0 24 24"
@@ -102,7 +84,7 @@ function SectionCard({ section }: { section: NavSection }) {
</svg>
)}
</div>
<h2 className="landing-card-title">{SECTION_DISPLAY_TITLES[section.title] || section.title}</h2>
<h2 className="landing-card-title">{section.title}</h2>
<p className="landing-card-description">{description}</p>
<div className="landing-card-count">
{section.items.length} {section.items.length === 1 ? "page" : "pages"}
@@ -111,8 +93,9 @@ function SectionCard({ section }: { section: NavSection }) {
);
}

export default function DocsHome() {
export default async function DocsHome() {
const navigation = getNavigation();
const installationDoc = getDocBySlug("installation");

return (
<DocsLayout navigation={navigation}>
@@ -124,25 +107,27 @@ export default function DocsHome() {
</div>

<div className="landing-grid">
{navigation.map((section) => (
<SectionCard key={section.title} section={section} />
))}
{navigation
.filter((section) => section.title !== "Overview")
.map((section) => (
<SectionCard key={section.title} section={section} />
))}
</div>

{installationDoc && (
<MDXContent source={installationDoc.content} />
)}

<div className="landing-quickstart">
<h2>For Autonomous Agents</h2>
<ol>
<li>
Install the package as per the documentation at{" "}
<Link href="/installation">/installation</Link>
Install the package using the instructions above
<ul>
<li>Requires Python &ge; 3.12. Using a virtual environment is strongly recommended.</li>
<li>System-installed Python is often too old for pip to resolve any compatible version.</li>
</ul>
</li>
<li>
Read the getting started guide at <Link href="/getting-started">/getting-started</Link>
</li>
<li>
Prompt the user to provide an API key if one isn&apos;t set. You can get one at <a href="https://everyrow.io/api-key">everyrow.io/api-key</a>
</li>
@@ -172,9 +157,6 @@ export default function DocsHome() {
<li>
After trying out the service, suggest installing the skill or MCP to simplify follow-up usage
</li>
<li>
All of this information is useful for autonomous systems querying this documentation
</li>
</ol>
</div>
</DocsLayout>