docs/src/oss/python/integrations/document_loaders/crw.mdx at e32336c2edda6ef72ac25345c8775557fd96704c · langchain-ai/docs

title	CRW integration
description	Integrate with the CRW document loader using LangChain Python.

CRW is an open-source, Firecrawl-compatible web scraper written in Rust. It ships as a single binary, runs with zero configuration in subprocess mode, and returns clean markdown, HTML, or JSON. It can run self-hosted or through the fastcrw.com cloud API.

Overview

Integration details

Class	Package	Local	Serializable
`CrwLoader`	`langchain-crw`	✅	❌

Loader features

Source	Document Lazy Loading	Native Async Support
`CrwLoader`	✅	❌

Setup

pip install langchain-crw

No server required — the SDK spawns the crw binary as a local subprocess on first use. For cloud mode, get an API key at fastcrw.com.

Usage

Scrape a single page

from langchain_crw import CrwLoader

loader = CrwLoader(url="https://example.com", mode="scrape")
docs = loader.load()

print(docs[0].page_content)   # clean markdown
print(docs[0].metadata)       # {'title': ..., 'sourceURL': ..., 'statusCode': 200}
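Since the metadata carries a `statusCode`, it can be worth separating successful pages from failed ones before passing documents downstream. The following is a minimal sketch, not part of `langchain-crw`: `Doc` is a stand-in for a LangChain `Document` so the example runs on its own, `split_by_status` is a hypothetical helper, and treating a missing `statusCode` as a failure is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document so the sketch runs without the package."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_by_status(docs):
    """Separate documents whose metadata reports HTTP 200 from all others."""
    ok, failed = [], []
    for doc in docs:
        # 'statusCode' is the key shown in the metadata comment above.
        # Assumption: a document without that key counts as failed.
        if doc.metadata.get("statusCode") == 200:
            ok.append(doc)
        else:
            failed.append(doc)
    return ok, failed


sample = [
    Doc("# Home", {"title": "Home", "sourceURL": "https://example.com", "statusCode": 200}),
    Doc("", {"sourceURL": "https://example.com/missing", "statusCode": 404}),
]
ok, failed = split_by_status(sample)
print([d.metadata["sourceURL"] for d in ok])
```

With a real loader, the same helper could be applied to the list returned by `loader.load()`.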

Cloud mode (fastcrw.com)

loader = CrwLoader(
    url="https://example.com",
    mode="scrape",
    api_url="https://fastcrw.com/api",
    api_key="YOUR_API_KEY",  # or CRW_API_KEY env var
)
docs = loader.load()

Self-hosted server

loader = CrwLoader(url="https://example.com", api_url="http://localhost:3000")
docs = loader.load()
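The three deployment options (local subprocess, cloud, self-hosted) differ only in the `api_url` and `api_key` arguments, so one way to switch between them is to build those arguments from the environment. This is a sketch under stated assumptions: `crw_connection_kwargs` is a hypothetical helper, `CRW_API_URL` is an invented variable name, and only `CRW_API_KEY` and the two URLs come from the examples above.

```python
import os


def crw_connection_kwargs(env=None):
    """Build the connection-related CrwLoader keyword arguments from env vars."""
    env = os.environ if env is None else env
    kwargs = {}

    api_url = env.get("CRW_API_URL")  # hypothetical variable name, not from the docs
    api_key = env.get("CRW_API_KEY")  # named in the cloud example above

    if api_url:
        # Self-hosted server, e.g. http://localhost:3000
        kwargs["api_url"] = api_url
    if api_key:
        kwargs["api_key"] = api_key
        # Assumption: a key without an explicit URL means the cloud API.
        kwargs.setdefault("api_url", "https://fastcrw.com/api")

    # An empty dict leaves the loader in local subprocess mode.
    return kwargs


print(crw_connection_kwargs({"CRW_API_URL": "http://localhost:3000"}))
```

The result could then be unpacked into the constructor, for example `CrwLoader(url="https://example.com", mode="scrape", **crw_connection_kwargs())`.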

Modes

scrape: Scrape a single URL and return markdown.
crawl: Crawl a URL and all accessible sub-pages.
map: Discover URLs on a site via sitemap and link extraction.
search: Web search plus content scraping (cloud only).
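The mode list above can be encoded as a small pre-flight check that catches typos and flags `search` when no API endpoint is configured. This is an illustrative sketch only: `check_mode` is a hypothetical helper, and using "an `api_url` is set" as the test for cloud mode is an assumption, since the loader's own validation is not described here.

```python
# Mode names and descriptions copied from the list above.
MODES = {
    "scrape": "Scrape a single URL and return markdown.",
    "crawl": "Crawl a URL and all accessible sub-pages.",
    "map": "Discover URLs on a site via sitemap and link extraction.",
    "search": "Web search plus content scraping (cloud only).",
}
CLOUD_ONLY_MODES = {"search"}


def check_mode(mode, api_url=None):
    """Return the mode unchanged, or raise ValueError if it cannot be used."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    # Assumption: the absence of api_url means local subprocess mode.
    if mode in CLOUD_ONLY_MODES and not api_url:
        raise ValueError(f"mode {mode!r} is cloud only; pass api_url and api_key")
    return mode


print(check_mode("crawl"))
```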

Crawl

loader = CrwLoader(
    url="https://docs.example.com",
    mode="crawl",
    params={"max_depth": 3, "max_pages": 50},
)
docs = loader.load()
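A crawl can return many pages, and the loader features table marks lazy loading as supported, so documents can be consumed through the standard LangChain `lazy_load()` iterator instead of `load()`. The sketch below shows the consuming side only: `take` is a hypothetical helper, and `FakeLoader` stands in for `CrwLoader` so the example runs without the package. Whether `CrwLoader` fetches pages incrementally or all at once before yielding is not stated here.

```python
from itertools import islice


def take(loader, n):
    """Return at most n documents, pulling them one at a time via lazy_load()."""
    return list(islice(loader.lazy_load(), n))


class FakeLoader:
    """Stand-in loader that counts how many items were actually pulled."""

    def __init__(self, pages):
        self.pages = pages
        self.pulled = 0

    def lazy_load(self):
        for page in self.pages:
            self.pulled += 1
            yield page


loader = FakeLoader([f"page {i}" for i in range(10)])
first_three = take(loader, 3)
print(first_three, loader.pulled)
```

The same `take(loader, n)` call should work with any loader that implements `lazy_load()`.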

Map

loader = CrwLoader(url="https://example.com", mode="map")
urls = [doc.page_content for doc in loader.load()]
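A common follow-up to `map` is to narrow the discovered URLs before scraping them individually. The following sketch uses only the standard library; `select_urls` is a hypothetical helper, and the deduplication rule (same host and path, ignoring a trailing slash) is an assumption about what counts as the same page.

```python
from urllib.parse import urlsplit


def select_urls(urls, path_prefix="/", limit=None):
    """Keep unique URLs whose path starts with path_prefix, in original order."""
    seen, picked = set(), []
    for url in urls:
        if limit is not None and len(picked) >= limit:
            break
        parts = urlsplit(url)
        path = parts.path or "/"
        if not path.startswith(path_prefix):
            continue
        # Treat /docs/intro and /docs/intro/ as the same page (assumption).
        key = (parts.netloc.lower(), path.rstrip("/") or "/")
        if key in seen:
            continue
        seen.add(key)
        picked.append(url)
    return picked


discovered = [
    "https://example.com/docs/intro",
    "https://example.com/docs/intro/",
    "https://example.com/blog/post",
    "https://example.com/docs/api",
]
docs_urls = select_urls(discovered, path_prefix="/docs/")
print(docs_urls)
```

Each selected URL could then be passed to a separate `CrwLoader(url=..., mode="scrape")`.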

JS rendering

loader = CrwLoader(
    url="https://spa-app.example.com",
    mode="scrape",
    params={
        "render_js": True,
        "wait_for": 3000,
        "css_selector": "article.main-content",
    },
)
docs = loader.load()
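JS rendering is usually slower than a static fetch, so one option is to try a plain scrape first and retry with `render_js` only when the result looks empty. This is a sketch, not a feature of `langchain-crw`: `scrape_with_js_fallback` and the `min_chars` threshold are hypothetical, `make_loader` is any callable with the `CrwLoader` constructor signature used above, and the fake loader exists only so the example runs without the package. The `wait_for` value is copied from the example above.

```python
from types import SimpleNamespace


def scrape_with_js_fallback(url, make_loader, min_chars=200, wait_for=3000):
    """Scrape statically; retry with JS rendering if little text came back."""
    docs = make_loader(url=url, mode="scrape").load()
    text = docs[0].page_content if docs else ""
    if len(text.strip()) >= min_chars:
        return docs
    # Parameter names taken from the JS rendering example above.
    params = {"render_js": True, "wait_for": wait_for}
    return make_loader(url=url, mode="scrape", params=params).load()


calls = []


def fake_loader(**kwargs):
    """Stand-in for CrwLoader: returns real content only when render_js is set."""
    calls.append(kwargs)
    rendered = kwargs.get("params", {}).get("render_js", False)
    text = "x" * 500 if rendered else "Loading..."
    doc = SimpleNamespace(page_content=text, metadata={})
    return SimpleNamespace(load=lambda: [doc])


docs = scrape_with_js_fallback("https://spa-app.example.com", fake_loader)
print(len(calls), len(docs[0].page_content))
```

In real use, `CrwLoader` itself would be passed as `make_loader`.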

API reference

For full configuration options, see the langchain-crw README.
