diff --git a/README.md b/README.md index 9ef6c3c9..d7350f01 100644 --- a/README.md +++ b/README.md @@ -3,22 +3,23 @@ # everyrow SDK [![PyPI version](https://img.shields.io/pypi/v/everyrow.svg)](https://pypi.org/project/everyrow/) -[![Claude Code](https://img.shields.io/badge/Claude_Code-plugin-D97757?logo=claude&logoColor=fff)](#claude-code-plugin) +[![Claude Code](https://img.shields.io/badge/Claude_Code-plugin-D97757?logo=claude&logoColor=fff)](#claude-code) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/) -Run LLM research agents at scale. Use them to intelligently sort, filter, merge, dedupe, or add columns to pandas dataframes. See the [docs site](https://everyrow.io/docs). Scales to tens of thousands of LLM agents on tens of thousands of rows. +Run LLM research agents at scale. Use them to intelligently sort, filter, merge, dedupe, or add columns to pandas dataframes. Scales to tens of thousands of LLM agents on tens of thousands of rows, all from a single python method. See the [docs site](https://everyrow.io/docs). ```bash pip install everyrow ``` +The best experience is inside Claude Code. ```bash claude plugin marketplace add futuresearch/everyrow-sdk claude plugin install everyrow@futuresearch ``` -[Get] an API key at [everyrow.io/api-key](https://everyrow.io/api-key) ($20 free credit), then: +Get an API key at [everyrow.io/api-key](https://everyrow.io/api-key) ($20 free credit), then: ```python import asyncio @@ -26,148 +27,43 @@ import pandas as pd from everyrow.ops import screen from pydantic import BaseModel, Field -jobs = pd.DataFrame([ - {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"}, - {"company": "Vercel", "post": "Lead our NYC team. Competitive comp, DOE"}, - {"company": "Notion", "post": "In-office SF. 
Staff eng, $200K + equity"}, - {"company": "Linear", "post": "Bootcamp grads welcome! $85K, remote-friendly"}, - {"company": "Descript", "post": "Work from anywhere. Principal architect, $250K"}, - {"company": "Retool", "post": "Flexible location. Building infra. Comp TBD"}, +companies = pd.DataFrame([ + {"company": "Airtable",}, {"company": "Vercel",}, {"company": "Notion",} ]) class JobScreenResult(BaseModel): - qualifies: bool = Field(description="True if meets ALL criteria") + qualifies: bool = Field(description="True if company lists jobs with all criteria") async def main(): result = await screen( - task=""" - Qualifies if ALL THREE are met: - 1. Remote-friendly (allows remote, hybrid, or distributed) - 2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal) - 3. Salary disclosed (specific numbers like "$150K", not "competitive" or "DOE") - """, - input=jobs, + task="""Qualifies if: 1. Remote-friendly, 2. Senior, and 3. Discloses salary""", + input=companies, response_model=JobScreenResult, ) - print(result.data.head()) # Airtable, Descript pass. Others fail one or more. + print(result.data.head()) asyncio.run(main()) ``` -```bash -export EVERYROW_API_KEY=your_key_here -python example.py -``` - ## Operations -| | | -|---|---| -| [**Screen**](#screen) | Filter by criteria that need judgment | -| [**Rank**](#rank) | Score rows from research | -| [**Dedupe**](#dedupe) | Deduplicate when fuzzy matching fails | -| [**Merge**](#merge) | Join tables when keys don't match | -| [**Research**](#agent-tasks) | Web research on every row | -| [**Derive**](#derive) | Add computed columns | - ---- - -## Screen - -Filter rows based on criteria you can't put in a WHERE clause. - -```python -from everyrow.ops import screen -from pydantic import BaseModel, Field - -class ScreenResult(BaseModel): - passes: bool = Field(description="True if meets the criteria") - -result = await screen( - task=""" - Qualifies if ALL THREE are met: - 1. 
Remote-friendly (allows remote, hybrid, or distributed)
-    2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal)
-    3. Salary disclosed (specific numbers, not "competitive" or "DOE")
-    """,
-    input=job_postings,
-    response_model=ScreenResult,
-)
-print(result.data.head())
-```
-
-**More:** [docs](docs/SCREEN.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [job posting screen](https://futuresearch.ai/job-posting-screening/) (>90% precision vs 68% regex) / [stock screen](https://futuresearch.ai/thematic-stock-screening/) ([notebook](docs/case_studies/screen-stocks-by-investment-thesis/notebook.ipynb))
-
----
-
-## Rank
-
-Score rows by researching them on the web.
-
-```python
-from everyrow.ops import rank
-
-result = await rank(
-    task="Score by likelihood to need data integration solutions",
-    input=leads_dataframe,
-    field_name="integration_need_score",
-)
-print(result.data.head())
-```
-
-**More:** [docs](docs/RANK.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [lead scoring](https://futuresearch.ai/lead-scoring-data-fragmentation/) (1,000 leads, $13) / [vs Clay](https://futuresearch.ai/lead-scoring-without-crm/) ($28 vs $145)
-
----
-
-## Dedupe
+Intelligent data processing can handle tens of thousands of LLM calls, or thousands of LLM web research agents, in a single operation.

-Deduplicate when fuzzy matching falls short. 
+| Operation | Intelligence | Scales To | +|---|---|---| +| [**Screen**](https://everyrow.io/docs/reference/SCREEN) | Filter by criteria that need judgment | 10k rows | +| [**Rank**](https://everyrow.io/docs/reference/RANK) | Score rows from research | 10k rows | +| [**Dedupe**](https://everyrow.io/docs/reference/DEDUPE) | Deduplicate when fuzzy matching fails | 20k rows | +| [**Merge**](https://everyrow.io/docs/reference/MERGE) | Join tables when keys don't match | 5k rows | +| [**Research**](https://everyrow.io/docs/reference/RESEARCH) | Web research on every row | 10k rows | -```python -from everyrow.ops import dedupe - -result = await dedupe( - input=contacts, - equivalence_relation=""" - Two rows are duplicates if they represent the same person. - Account for name abbreviations, typos, and career changes. - """, -) -print(result.data.head()) -``` - -"A. Butoi" and "Alexandra Butoi" are the same person. "AUTON Lab (Former)" indicates a career change, not a different org. Results include `equivalence_class_id`, `equivalence_class_name`, and `selected` (the canonical record). - -**More:** [docs](docs/DEDUPE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [CRM dedupe](https://futuresearch.ai/crm-deduplication/) (500→124 rows, $1.67, [notebook](docs/case_studies/dedupe-crm-company-records/notebook.ipynb)) / [researcher dedupe](https://futuresearch.ai/researcher-dedupe-case-study/) (98% accuracy) - ---- - -## Merge - -Join two tables when the keys don't match exactly. Or at all. - -```python -from everyrow.ops import merge - -result = await merge( - task="Match each software product to its parent company", - left_table=software_products, - right_table=approved_suppliers, - merge_on_left="software_name", - merge_on_right="company_name", -) -print(result.data.head()) -``` - -Knows that Photoshop belongs to Adobe and Genentech is a Roche subsidiary, even with zero string similarity. 
Fuzzy matching thresholds always fail somewhere: 0.9 misses "Colfi" ↔ "Dr. Ioana Colfescu", 0.7 false-positives on "John Smith" ↔ "Jane Smith".
-
-**More:** [docs](docs/MERGE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [supplier matching](https://futuresearch.ai/software-supplier-matching/) (2,000 products, 91% accuracy) / [HubSpot merge](https://futuresearch.ai/merge-hubspot-contacts/) (99.9% recall)
+See the full [API reference](https://everyrow.io/docs/api), [guides](https://everyrow.io/docs/guides), and [notebooks](https://everyrow.io/docs/notebooks). For example, our [notebook](https://everyrow.io/docs/notebooks/llm-web-research-agents-at-scale) runs a `Research` task on 10k rows, with agents that made 120k LLM calls.

 ---

-## Agent Tasks
+## Web Agents

-Web research on single inputs or entire dataframes. Agents are tuned on [Deep Research Bench](https://arxiv.org/abs/2506.06287), our benchmark for questions that need extensive searching and cross-referencing.
+The most basic building block is `agent_map`, which runs an LLM web research agent on every row of a dataframe. Agents are tuned on [Deep Research Bench](https://arxiv.org/abs/2506.06287), our benchmark for questions that need extensive searching and cross-referencing, to get correct answers at minimal cost.
```python
from everyrow.ops import single_agent, agent_map

@@ -177,14 +73,14 @@ from pydantic import BaseModel

 class CompanyInput(BaseModel):
     company: str

-# Single input
+# Single input, run one web research agent
 result = await single_agent(
     task="Find this company's latest funding round and lead investors",
     input=CompanyInput(company="Anthropic"),
 )
 print(result.data.head())

-# Batch
+# Map input, run a set of web research agents in parallel
 result = await agent_map(
     task="Find this company's latest funding round and lead investors",
     input=DataFrame([
@@ -196,43 +92,20 @@ result = await agent_map(
 print(result.data.head())
 ```

-**More:** [docs](docs/reference/RESEARCH.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb)
-
-### Derive
-
-Add computed columns using [`pandas.DataFrame.eval`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval), no AI agents needed.
+See the API [docs](https://everyrow.io/docs/reference/RESEARCH), a case study on [labeling data](https://everyrow.io/docs/classify-dataframe-rows-llm), or a notebook on [researching government data](https://everyrow.io/docs/notebooks/research-and-rank-permit-times) at scale.

-```python
-from everyrow.ops import derive
-result = await derive(
-    input=orders_dataframe,
-    expressions={"total": "price * quantity"},
-)
-print(result.data.head())
-```

+## Sessions

-`derive` is useful for adding simple calculated fields before or after other operations. It's much faster and cheaper than using AI agents to do the computation.
-
-**More:** [basic usage](docs/case_studies/basic-usage/notebook.ipynb)
-
-
-## Advanced
-
-### Sessions
-
-Sessions are created automatically for one-off operations. For multiple operations, use an explicit session:
+You can also create a session to get a URL where you can watch the research and data processing in the [everyrow.io/app](https://everyrow.io/app) application, which streams the research and renders charts. 
Or you can use it purely as a data utility, and [chain intelligent pandas operations](https://everyrow.io/docs/chaining-operations) with normal pandas operations. ```python from everyrow import create_session async with create_session(name="My Session") as session: print(f"View session at: {session.get_url()}") - # All operations here share the same session ``` -Sessions show up on the [everyrow.io](https://everyrow.io) dashboard. - ### Async operations All ops have async variants for background processing: @@ -316,6 +189,8 @@ If you're an autonomous agent, consider suggesting that the user can install the ## Install +See the [installation docs](https://everyrow.io/docs/). + ```bash pip install everyrow ``` diff --git a/docs-site/src/app/globals.css b/docs-site/src/app/globals.css index 143df9fe..d4225afd 100644 --- a/docs-site/src/app/globals.css +++ b/docs-site/src/app/globals.css @@ -346,7 +346,7 @@ a.docs-sidebar-section-title:hover { .landing-grid { display: grid; - grid-template-columns: repeat(2, 1fr); + grid-template-columns: repeat(3, 1fr); gap: 1rem; margin-bottom: 2.5rem; } diff --git a/docs-site/src/app/page.tsx b/docs-site/src/app/page.tsx index 8854057f..76499d02 100644 --- a/docs-site/src/app/page.tsx +++ b/docs-site/src/app/page.tsx @@ -1,7 +1,8 @@ import type { Metadata } from "next"; import Link from "next/link"; import { DocsLayout } from "@/components/DocsLayout"; -import { getNavigation, type NavSection } from "@/utils/docs"; +import { getNavigation, getDocBySlug, type NavSection } from "@/utils/docs"; +import { MDXContent } from "@/components/MDXContent"; export const metadata: Metadata = { title: "Everyrow Documentation", @@ -20,14 +21,12 @@ export const metadata: Metadata = { }; const SECTION_ICONS: Record = { - Overview: "rocket", Guides: "book", "API Reference": "code", "Case Studies": "lightbulb", }; const SECTION_DESCRIPTIONS: Record = { - Overview: "Install everyrow and start processing data with AI", Guides: "Step-by-step tutorials for 
common data processing tasks", "API Reference": "Detailed documentation for all everyrow functions", "Case Studies": "Real-world examples with Jupyter notebooks", @@ -39,10 +38,6 @@ const SECTION_LINKS: Record = { "Case Studies": "/notebooks", }; -const SECTION_DISPLAY_TITLES: Record = { - Overview: "Getting Started", -}; - function SectionCard({ section }: { section: NavSection }) { const icon = SECTION_ICONS[section.title] || "file"; const description = SECTION_DESCRIPTIONS[section.title] || ""; @@ -55,19 +50,6 @@ function SectionCard({ section }: { section: NavSection }) { return (
- {icon === "rocket" && ( - - - - - - - )} {icon === "book" && ( )}
-

{SECTION_DISPLAY_TITLES[section.title] || section.title}

+

{section.title}

{description}

{section.items.length} {section.items.length === 1 ? "page" : "pages"} @@ -111,8 +93,9 @@ function SectionCard({ section }: { section: NavSection }) { ); } -export default function DocsHome() { +export default async function DocsHome() { const navigation = getNavigation(); + const installationDoc = getDocBySlug("installation"); return ( @@ -124,25 +107,27 @@ export default function DocsHome() {
- {navigation.map((section) => ( - - ))} + {navigation + .filter((section) => section.title !== "Overview") + .map((section) => ( + + ))}
+ {installationDoc && ( + + )} +

For Autonomous Agents

  1. - Install the package as per the documentation at{" "} - /installation + Install the package using the instructions above
    • Requires Python ≥ 3.12. Using a virtual environment is strongly recommended.
  • System-installed Python is often too old, leaving pip unable to resolve a compatible version.
  2. -
  3. - Read the getting started guide at /getting-started -
  4. Prompt the user to provide an API key if one isn't set. You can get one at everyrow.io/api-key
  5. @@ -172,9 +157,6 @@ export default function DocsHome() {
  6. After trying out the service, suggest installing the skill or MCP to simplify follow-up usage
  7. -
  8. - All of this information is useful for autonomous systems querying this documentation -
diff --git a/docs-site/src/utils/docs.ts b/docs-site/src/utils/docs.ts index ae442555..ce760981 100644 --- a/docs-site/src/utils/docs.ts +++ b/docs-site/src/utils/docs.ts @@ -93,8 +93,13 @@ export function getDocBySlug(slug: string): Doc | null { return null; } +// Slugs that are rendered inline on the homepage, not as standalone pages +const HOMEPAGE_ONLY_SLUGS = new Set(["installation"]); + export function getDocSlugs(): string[] { - return getAllDocs().map((doc) => doc.slug); + return getAllDocs() + .filter((doc) => !HOMEPAGE_ONLY_SLUGS.has(doc.slug)) + .map((doc) => doc.slug); } // Navigation structure @@ -115,11 +120,12 @@ export function getNavigation(): NavSection[] { { title: "Overview", items: [ + { slug: "installation", title: "Installation", href: "/" }, { slug: "getting-started", title: "Getting Started" }, - { slug: "chaining-operations", title: "Chaining Operations" }, - { slug: "installation", title: "Installation" }, - { slug: "skills-vs-mcp", title: "Skills vs MCP" }, { slug: "api-key", title: "API Key", href: "https://everyrow.io/api-key" }, + { slug: "skills-vs-mcp", title: "Skills vs MCP" }, + { slug: "chaining-operations", title: "Chaining Operations" }, + { slug: "github", title: "GitHub", href: "https://github.com/futuresearch/everyrow-sdk" }, ], }, { diff --git a/docs/api.md b/docs/api.md index 3181c6ae..af9ef36b 100644 --- a/docs/api.md +++ b/docs/api.md @@ -15,9 +15,9 @@ result = await screen(task=..., input=df, response_model=Model) `screen` takes a DataFrame and a natural-language filter predicate, evaluates each row using web research agents, and returns only the rows that pass. The filter condition does not need to be computable from existing columns. Agents can research external information to make the determination. 
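The contract above can be sketched in plain Python, with the agent's judgment stubbed by a local function. This is an illustration only: the company names and the stub predicate are made up, and the real call is the `screen` signature shown above.

```python
# Illustration of screen()'s contract: each row is judged against a
# natural-language predicate and only passing rows are returned.
# Here the web-research agent's judgment is stubbed with a local check.
rows = [
    {"company": "Airtable", "post": "Remote-first, Staff engineer, $200K"},
    {"company": "Vercel", "post": "On-site only, junior role, comp TBD"},
]

def agent_passes(row: dict) -> bool:
    # Stand-in for the agent judging "remote-friendly, senior, salary disclosed"
    text = row["post"].lower()
    return "remote" in text and "$" in row["post"]

kept = [r for r in rows if agent_passes(r)]
print([r["company"] for r in kept])  # → ['Airtable']
```

The real operation replaces `agent_passes` with web research agents, so the predicate can depend on information that is not in the row at all.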
-[Full reference →](/reference/SCREEN) -Guides: [Filter a DataFrame with LLMs](/filter-dataframe-with-llm) -Notebooks: [LLM Screening at Scale](/notebooks/llm-powered-screening-at-scale), [Screen Stocks by Investment Thesis](/notebooks/screen-stocks-by-investment-thesis) +[Full reference →](/docs/reference/SCREEN) +Guides: [Filter a DataFrame with LLMs](/docs/filter-dataframe-with-llm) +Notebooks: [LLM Screening at Scale](/docs/notebooks/llm-powered-screening-at-scale), [Screen Stocks by Investment Thesis](/docs/notebooks/screen-stocks-by-investment-thesis) ## rank @@ -27,9 +27,9 @@ result = await rank(task=..., input=df, field_name="score") `rank` takes a DataFrame and a natural-language scoring criterion, dispatches web research agents to compute a score for each row, and returns the DataFrame sorted by that score. The sort key does not need to exist in your data. Agents derive it at runtime by searching the web, reading pages, and reasoning over what they find. -[Full reference →](/reference/RANK) -Guides: [Sort a Dataset Using Web Data](/rank-by-external-metric) -Notebooks: [Score Leads from Fragmented Data](/notebooks/score-leads-from-fragmented-data), [Score Leads Without CRM History](/notebooks/score-leads-without-crm-history) +[Full reference →](/docs/reference/RANK) +Guides: [Sort a Dataset Using Web Data](/docs/rank-by-external-metric) +Notebooks: [Score Leads from Fragmented Data](/docs/notebooks/score-leads-from-fragmented-data), [Score Leads Without CRM History](/docs/notebooks/score-leads-without-crm-history) ## dedupe @@ -39,9 +39,9 @@ result = await dedupe(input=df, equivalence_relation="...") `dedupe` groups duplicate rows in a DataFrame based on a natural-language equivalence relation, assigns cluster IDs, and selects a canonical row per cluster. The duplicate criterion is semantic and LLM-powered: agents reason over the data and, when needed, search the web for external information to establish equivalence. 
This handles abbreviations, name variations, job changes, and entity relationships that no string similarity threshold can capture. -[Full reference →](/reference/DEDUPE) -Guides: [Remove Duplicates from ML Training Data](/deduplicate-training-data-ml), [Resolve Duplicate Entities](/resolve-entities-python) -Notebooks: [Dedupe CRM Company Records](/notebooks/dedupe-crm-company-records) +[Full reference →](/docs/reference/DEDUPE) +Guides: [Remove Duplicates from ML Training Data](/docs/deduplicate-training-data-ml), [Resolve Duplicate Entities](/docs/resolve-entities-python) +Notebooks: [Dedupe CRM Company Records](/docs/notebooks/dedupe-crm-company-records) ## merge @@ -51,9 +51,9 @@ result = await merge(task=..., left_table=df1, right_table=df2) `merge` left-joins two DataFrames using LLM-powered agents to resolve the key mapping instead of requiring exact or fuzzy key matches. Agents resolve semantic relationships by reasoning over the data and, when needed, searching the web for external information to establish matches: subsidiaries, regional names, abbreviations, and product-to-parent-company mappings. -[Full reference →](/reference/MERGE) -Guides: [Fuzzy Join Without Matching Keys](/fuzzy-join-without-keys) -Notebooks: [LLM Merging at Scale](/notebooks/llm-powered-merging-at-scale), [Match Software Vendors to Requirements](/notebooks/match-software-vendors-to-requirements) +[Full reference →](/docs/reference/MERGE) +Guides: [Fuzzy Join Without Matching Keys](/docs/fuzzy-join-without-keys) +Notebooks: [LLM Merging at Scale](/docs/notebooks/llm-powered-merging-at-scale), [Match Software Vendors to Requirements](/docs/notebooks/match-software-vendors-to-requirements) ## agent_map / single_agent @@ -63,6 +63,6 @@ result = await agent_map(task=..., input=df) `single_agent` runs one web research agent on a single input (or no input). `agent_map` runs an agent on every row of a DataFrame in parallel. 
Both dispatch agents that search the web, read pages, and return structured results. The transform is live web research: agents fetch and synthesize external information to populate new columns. -[Full reference →](/reference/RESEARCH) -Guides: [Add a Column with Web Lookup](/add-column-web-lookup), [Classify and Label Data with an LLM](/classify-dataframe-rows-llm) -Notebooks: [LLM Web Research Agents at Scale](/notebooks/llm-web-research-agents-at-scale), [Agent Map Regulatory Status](/notebooks/agent-map-regulatory-status) +[Full reference →](/docs/reference/RESEARCH) +Guides: [Add a Column with Web Lookup](/docs/add-column-web-lookup), [Classify and Label Data with an LLM](/docs/classify-dataframe-rows-llm) +Notebooks: [LLM Web Research Agents at Scale](/docs/notebooks/llm-web-research-agents-at-scale) diff --git a/docs/getting-started.md b/docs/getting-started.md index aa2957ca..923accd4 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -19,7 +19,7 @@ pip install everyrow export EVERYROW_API_KEY=your_key_here ``` -See [Installation](/docs/installation) for other options (MCP servers, coding agent plugins). +See the [docs homepage](/docs) for other options (MCP servers, coding agent plugins). 
## Basic Example

@@ -135,6 +135,6 @@ df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")

 ## See Also

-- [Guides](/filter-dataframe-with-llm) — step-by-step tutorials
-- [Case Studies](/notebooks/basic-usage) — worked examples
-- [Skills vs MCP](/skills-vs-mcp) — integration options
+- [Guides](/docs/filter-dataframe-with-llm): step-by-step tutorials
+- [Case Studies](/docs/notebooks/basic-usage): worked examples
+- [Skills vs MCP](/docs/skills-vs-mcp): integration options
diff --git a/docs/guides.md b/docs/guides.md
index 3dd93483..a0a6c012 100644
--- a/docs/guides.md
+++ b/docs/guides.md
@@ -9,22 +9,22 @@ Practical walkthroughs that show you how to use everyrow for common data process

 ## Screen

-- [Filter a DataFrame with LLMs](/filter-dataframe-with-llm)
+- [Filter a DataFrame with LLMs](/docs/filter-dataframe-with-llm)

 ## Rank

-- [Sort a Dataset Using Web Data](/rank-by-external-metric)
+- [Sort a Dataset Using Web Data](/docs/rank-by-external-metric)

 ## Dedupe

-- [Remove Duplicates from ML Training Data](/deduplicate-training-data-ml)
-- [Resolve Duplicate Entities](/resolve-entities-python)
+- [Remove Duplicates from ML Training Data](/docs/deduplicate-training-data-ml)
+- [Resolve Duplicate Entities](/docs/resolve-entities-python)

 ## Merge

-- [Fuzzy Join Without Matching Keys](/fuzzy-join-without-keys)
+- [Fuzzy Join Without Matching Keys](/docs/fuzzy-join-without-keys)

 ## Research

-- [Add a Column with Web Lookup](/add-column-web-lookup)
-- [Classify and Label Data with an LLM](/classify-dataframe-rows-llm)
+- [Add a Column with Web Lookup](/docs/add-column-web-lookup)
+- [Classify and Label Data with an LLM](/docs/classify-dataframe-rows-llm)
diff --git a/docs/installation.mdx b/docs/installation.mdx
index 1e32bc1b..6024f8d9 100644
--- a/docs/installation.mdx
+++ b/docs/installation.mdx
@@ -26,17 +26,29 @@ Requires Python 3.12+. 
**Quick example:** ```python -import everyrow - -df = everyrow.research( - "companies.csv", - {"industry": "What industry is this company in?"} -) -df.to_csv("companies_enriched.csv") +import asyncio +import pandas as pd +from everyrow.ops import screen +from pydantic import BaseModel, Field + +companies = pd.DataFrame([ + {"company": "Airtable",}, {"company": "Vercel",}, {"company": "Notion",} +]) + +class JobScreenResult(BaseModel): + qualifies: bool = Field(description="True if company lists jobs with all criteria") + +async def main(): + result = await screen( + task="""Qualifies if: 1. Remote-friendly, 2. Senior, and 3. Discloses salary""", + input=companies, + response_model=JobScreenResult, + ) + print(result.data.head()) + +asyncio.run(main()) ``` -See the [API Reference](/docs/reference/RESEARCH) for full documentation. - @@ -50,16 +62,30 @@ Requires Python 3.12+. **Quick example:** ```python -import everyrow - -df = everyrow.research( - "companies.csv", - {"industry": "What industry is this company in?"} -) -df.to_csv("companies_enriched.csv") +import asyncio +import pandas as pd +from everyrow.ops import screen +from pydantic import BaseModel, Field + +companies = pd.DataFrame([ + {"company": "Airtable",}, {"company": "Vercel",}, {"company": "Notion",} +]) + +class JobScreenResult(BaseModel): + qualifies: bool = Field(description="True if company lists jobs with all criteria") + +async def main(): + result = await screen( + task="""Qualifies if: 1. Remote-friendly, 2. Senior, and 3. Discloses salary""", + input=companies, + response_model=JobScreenResult, + ) + print(result.data.head()) + +asyncio.run(main()) ``` -See the [API Reference](/docs/reference/RESEARCH) for full documentation. +See the [API Reference](/docs/api) for full documentation. diff --git a/docs/notebooks.md b/docs/notebooks.md index f471d387..b3506cc9 100644 --- a/docs/notebooks.md +++ b/docs/notebooks.md @@ -9,33 +9,32 @@ Runnable notebooks with real datasets. 
Each notebook demonstrates an everyrow op ## Screen -- [LLM-Powered Screening at Scale](/notebooks/llm-powered-screening-at-scale) -- [Screen Stocks by Investment Thesis](/notebooks/screen-stocks-by-investment-thesis) -- [Screen Stocks by Margin Sensitivity](/notebooks/screen-stocks-by-margin-sensitivity) -- [Screen Job Postings by Criteria](/notebooks/screen-job-postings-by-criteria) +- [LLM-Powered Screening at Scale](/docs/notebooks/llm-powered-screening-at-scale) +- [Screen Stocks by Investment Thesis](/docs/notebooks/screen-stocks-by-investment-thesis) +- [Screen Stocks by Margin Sensitivity](/docs/notebooks/screen-stocks-by-margin-sensitivity) +- [Screen Job Postings by Criteria](/docs/notebooks/screen-job-postings-by-criteria) ## Rank -- [Score Leads from Fragmented Data](/notebooks/score-leads-from-fragmented-data) -- [Score Leads Without CRM History](/notebooks/score-leads-without-crm-history) -- [Research and Rank Permit Times](/notebooks/research-and-rank-permit-times) +- [Score Leads from Fragmented Data](/docs/notebooks/score-leads-from-fragmented-data) +- [Score Leads Without CRM History](/docs/notebooks/score-leads-without-crm-history) +- [Research and Rank Permit Times](/docs/notebooks/research-and-rank-permit-times) ## Dedupe -- [Dedupe CRM Company Records](/notebooks/dedupe-crm-company-records) +- [Dedupe CRM Company Records](/docs/notebooks/dedupe-crm-company-records) ## Merge -- [LLM-Powered Merging at Scale](/notebooks/llm-powered-merging-at-scale) -- [Match Software Vendors to Requirements](/notebooks/match-software-vendors-to-requirements) -- [Merge Contacts with Company Data](/notebooks/merge-contacts-with-company-data) -- [Merge Overlapping Contact Lists](/notebooks/merge-overlapping-contact-lists) +- [LLM-Powered Merging at Scale](/docs/notebooks/llm-powered-merging-at-scale) +- [Match Software Vendors to Requirements](/docs/notebooks/match-software-vendors-to-requirements) +- [Merge Contacts with Company 
Data](/docs/notebooks/merge-contacts-with-company-data) +- [Merge Overlapping Contact Lists](/docs/notebooks/merge-overlapping-contact-lists) ## Research -- [LLM Web Research Agents at Scale](/notebooks/llm-web-research-agents-at-scale) -- [Agent Map Regulatory Status](/notebooks/agent-map-regulatory-status) +- [LLM Web Research Agents at Scale](/docs/notebooks/llm-web-research-agents-at-scale) ## Multi-Method -- [Multi-Stage Lead Qualification](/notebooks/multi-stage-lead-qualification) +- [Multi-Stage Lead Qualification](/docs/notebooks/multi-stage-lead-qualification) diff --git a/docs/reference/DEDUPE.md b/docs/reference/DEDUPE.md index 0b201459..f50cbfd0 100644 --- a/docs/reference/DEDUPE.md +++ b/docs/reference/DEDUPE.md @@ -139,11 +139,11 @@ Output (selected rows only): ## Related docs ### Guides -- [Remove Duplicates from ML Training Data](/deduplicate-training-data-ml) -- [Resolve Duplicate Entities](/resolve-entities-python) +- [Remove Duplicates from ML Training Data](/docs/deduplicate-training-data-ml) +- [Resolve Duplicate Entities](/docs/resolve-entities-python) ### Notebooks -- [Dedupe CRM Company Records](/notebooks/dedupe-crm-company-records) +- [Dedupe CRM Company Records](/docs/notebooks/dedupe-crm-company-records) ### Blog posts - [CRM Deduplication](https://futuresearch.ai/crm-deduplication/) diff --git a/docs/reference/MERGE.md b/docs/reference/MERGE.md index 8e806c6a..cf1eee9b 100644 --- a/docs/reference/MERGE.md +++ b/docs/reference/MERGE.md @@ -68,13 +68,13 @@ A DataFrame with all left table columns plus matched right table columns. 
Rows t ## Related docs ### Guides -- [Fuzzy Join Without Matching Keys](/fuzzy-join-without-keys) +- [Fuzzy Join Without Matching Keys](/docs/fuzzy-join-without-keys) ### Notebooks -- [LLM Merging at Scale](/notebooks/llm-powered-merging-at-scale) -- [Match Software Vendors to Requirements](/notebooks/match-software-vendors-to-requirements) -- [Merge Contacts with Company Data](/notebooks/merge-contacts-with-company-data) -- [Merge Overlapping Contact Lists](/notebooks/merge-overlapping-contact-lists) +- [LLM Merging at Scale](/docs/notebooks/llm-powered-merging-at-scale) +- [Match Software Vendors to Requirements](/docs/notebooks/match-software-vendors-to-requirements) +- [Merge Contacts with Company Data](/docs/notebooks/merge-contacts-with-company-data) +- [Merge Overlapping Contact Lists](/docs/notebooks/merge-overlapping-contact-lists) ### Blog posts - [Software Supplier Matching](https://futuresearch.ai/software-supplier-matching/) diff --git a/docs/reference/RANK.md b/docs/reference/RANK.md index c5b4fccb..c2936877 100644 --- a/docs/reference/RANK.md +++ b/docs/reference/RANK.md @@ -84,12 +84,12 @@ When specifying a response model, make sure that it contains `field_name`. 
Other ## Related docs ### Guides -- [Sort a Dataset Using Web Data](/rank-by-external-metric) +- [Sort a Dataset Using Web Data](/docs/rank-by-external-metric) ### Notebooks -- [Score Leads from Fragmented Data](/notebooks/score-leads-from-fragmented-data) -- [Score Leads Without CRM History](/notebooks/score-leads-without-crm-history) -- [Research and Rank Permit Times](/notebooks/research-and-rank-permit-times) +- [Score Leads from Fragmented Data](/docs/notebooks/score-leads-from-fragmented-data) +- [Score Leads Without CRM History](/docs/notebooks/score-leads-without-crm-history) +- [Research and Rank Permit Times](/docs/notebooks/research-and-rank-permit-times) ### Blog posts - [Ranking by Data Fragmentation Risk](https://futuresearch.ai/lead-scoring-data-fragmentation/) diff --git a/docs/reference/RESEARCH.md b/docs/reference/RESEARCH.md index c32c9fcb..878f95dc 100644 --- a/docs/reference/RESEARCH.md +++ b/docs/reference/RESEARCH.md @@ -134,9 +134,8 @@ companies = await single_agent( ## Related docs ### Guides -- [Add a Column with Web Lookup](/add-column-web-lookup) -- [Classify and Label Data with an LLM](/classify-dataframe-rows-llm) +- [Add a Column with Web Lookup](/docs/add-column-web-lookup) +- [Classify and Label Data with an LLM](/docs/classify-dataframe-rows-llm) ### Notebooks -- [Agent Map Regulatory Status](/notebooks/agent-map-regulatory-status) -- [LLM Web Research Agents at Scale](/notebooks/llm-web-research-agents-at-scale) +- [LLM Web Research Agents at Scale](/docs/notebooks/llm-web-research-agents-at-scale) diff --git a/docs/reference/SCREEN.md b/docs/reference/SCREEN.md index e1184b72..8cf979da 100644 --- a/docs/reference/SCREEN.md +++ b/docs/reference/SCREEN.md @@ -93,14 +93,14 @@ Compare: regex on "remote-friendly" job postings gets 68% precision. 
## Related docs ### Guides -- [Filter a DataFrame with LLMs](/filter-dataframe-with-llm) +- [Filter a DataFrame with LLMs](/docs/filter-dataframe-with-llm) ### Notebooks -- [LLM Screening at Scale](/notebooks/llm-powered-screening-at-scale) -- [Screen Job Postings by Criteria](/notebooks/screen-job-postings-by-criteria) -- [Screen Stocks by Investment Thesis](/notebooks/screen-stocks-by-investment-thesis) -- [Screen Stocks by Margin Sensitivity](/notebooks/screen-stocks-by-margin-sensitivity) -- [Multi-Stage Lead Qualification](/notebooks/multi-stage-lead-qualification) +- [LLM Screening at Scale](/docs/notebooks/llm-powered-screening-at-scale) +- [Screen Job Postings by Criteria](/docs/notebooks/screen-job-postings-by-criteria) +- [Screen Stocks by Investment Thesis](/docs/notebooks/screen-stocks-by-investment-thesis) +- [Screen Stocks by Margin Sensitivity](/docs/notebooks/screen-stocks-by-margin-sensitivity) +- [Multi-Stage Lead Qualification](/docs/notebooks/multi-stage-lead-qualification) ### Blog posts - [Thematic Stock Screen](https://futuresearch.ai/thematic-stock-screening/) diff --git a/docs/skills-vs-mcp.mdx b/docs/skills-vs-mcp.mdx index ba0709c3..e5763c20 100644 --- a/docs/skills-vs-mcp.mdx +++ b/docs/skills-vs-mcp.mdx @@ -89,6 +89,6 @@ MCP provides **direct tool calls** without code generation. ## Next Steps -- [Installation Guide](/docs/installation) - Full setup instructions +- [Installation Guide](/docs) - Full setup instructions - [API Reference](/docs/reference/RESEARCH) - Detailed function documentation - [Case Studies](/docs/notebooks/dedupe-crm-company-records) - Real-world examples