| title | CRW integration |
|---|---|
| description | Integrate with the CRW document loader using LangChain Python. |
CRW is an open-source, Firecrawl-compatible web scraper written in Rust. It ships as a single binary, runs with zero config in subprocess mode, and returns clean markdown, HTML, or JSON. It works either self-hosted or through the fastcrw.com cloud API.
| Class | Package | Local | Serializable |
|---|---|---|---|
| CrwLoader | langchain-crw | ✅ | ❌ |
| Source | Document Lazy Loading | Native Async Support |
|---|---|---|
| CrwLoader | ✅ | ❌ |
```bash
pip install langchain-crw
```

No server required — the SDK spawns the crw binary as a local subprocess on first use. For cloud mode, get an API key at fastcrw.com.
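The cloud API key can also be supplied via the environment instead of being passed in code; the loader falls back to the `CRW_API_KEY` variable when no explicit `api_key` is given:

```python
import os

# CrwLoader falls back to the CRW_API_KEY environment variable when the
# api_key argument is not passed explicitly.
os.environ["CRW_API_KEY"] = "YOUR_API_KEY"  # placeholder value
```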
```python
from langchain_crw import CrwLoader

loader = CrwLoader(url="https://example.com", mode="scrape")
docs = loader.load()
print(docs[0].page_content)  # clean markdown
print(docs[0].metadata)  # {'title': ..., 'sourceURL': ..., 'statusCode': 200}
```

To use the fastcrw.com cloud API instead of the local binary, pass `api_url` and an API key:

```python
loader = CrwLoader(
    url="https://example.com",
    mode="scrape",
    api_url="https://fastcrw.com/api",
    api_key="YOUR_API_KEY",  # or CRW_API_KEY env var
)
docs = loader.load()
```

To target a self-hosted CRW server, point `api_url` at it:

```python
loader = CrwLoader(url="https://example.com", api_url="http://localhost:3000")
docs = loader.load()
```

The loader supports four modes:

- `scrape`: Scrape a single URL and return markdown.
- `crawl`: Crawl a URL and all accessible sub-pages.
- `map`: Discover URLs on a site via sitemap and link extraction.
- `search`: Web search plus content scraping (cloud only).
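`search` mode has no example on this page. The sketch below is a guess at how it might be invoked — in particular, passing the query string where the other modes take a URL is an assumption, not a documented signature; check the langchain-crw README for the real interface. It is wrapped in a function so nothing runs until you call it:

```python
def search_docs(query: str, api_key: str):
    """Sketch of the cloud-only `search` mode. Passing the query string via
    the `url` argument is an assumption, not a documented signature."""
    from langchain_crw import CrwLoader  # requires langchain-crw installed

    loader = CrwLoader(
        url=query,  # assumption: the query goes where other modes take a URL
        mode="search",
        api_key=api_key,  # search mode is cloud-only, so a key is required
    )
    return loader.load()
```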
Crawl a site with depth and page limits:

```python
loader = CrwLoader(
    url="https://docs.example.com",
    mode="crawl",
    params={"max_depth": 3, "max_pages": 50},
)
docs = loader.load()
```

List a site's URLs with `map` mode:

```python
loader = CrwLoader(url="https://example.com", mode="map")
urls = [doc.page_content for doc in loader.load()]
```

For JavaScript-heavy pages, enable rendering and wait for the content to appear:

```python
loader = CrwLoader(
    url="https://spa-app.example.com",
    mode="scrape",
    params={
        "render_js": True,
        "wait_for": 3000,
        "css_selector": "article.main-content",
    },
)
docs = loader.load()
```

For full configuration options, see the langchain-crw README.
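One common pattern combines two of the modes above: discover a site's URLs with `map`, then scrape each page, streaming Documents via the loader's standard `lazy_load()` method (the table above marks lazy loading as supported) rather than materializing full lists. A sketch under the API shown on this page, not run here since it needs langchain-crw installed and network access:

```python
def scrape_site(base_url: str):
    """Discover a site's URLs with `map` mode, then scrape each page,
    yielding Documents one at a time via the standard lazy_load() method."""
    from langchain_crw import CrwLoader  # requires langchain-crw installed

    # map mode returns one Document per discovered URL (see the map example)
    map_docs = CrwLoader(url=base_url, mode="map").load()
    for doc in map_docs:
        yield from CrwLoader(url=doc.page_content, mode="scrape").lazy_load()
```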