Commit e32336c

docs: add CRW integration (provider + document loader pages)
1 parent 2ff3f46 commit e32336c

3 files changed

Lines changed: 145 additions & 0 deletions

packages.yml

Lines changed: 4 additions & 0 deletions
@@ -796,3 +796,7 @@ packages:
796796
js: "@oracle/langchain-oracledb"
797797
downloads: 64000
798798
downloads_updated_at: "2026-04-20T00:14:41.475493+00:00"
799+
- name: langchain-crw
800+
repo: us/langchain-crw
801+
downloads: 0
802+
downloads_updated_at: "2026-04-22T00:00:00+00:00"
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
---
2+
title: "CRW integration"
3+
description: "Integrate with the CRW document loader using LangChain Python."
4+
---
5+
6+
[CRW](https://github.com/us/crw) is an open-source, Firecrawl-compatible web
7+
scraper written in Rust. It ships as a single binary, runs with zero config in
8+
subprocess mode, and returns clean markdown, HTML, or JSON. It runs self-hosted
9+
or via the [fastcrw.com](https://fastcrw.com) cloud API.
10+
11+
## Overview
12+
13+
### Integration details
14+
15+
| Class | Package | Local | Serializable |
16+
| :--- | :--- | :---: | :---: |
17+
| `CrwLoader` | `langchain-crw` |||
18+
19+
### Loader features
20+
21+
| Source | Document Lazy Loading | Native Async Support |
22+
| :---: | :---: | :---: |
23+
| `CrwLoader` |||
24+
25+
## Setup
26+
27+
```bash
28+
pip install langchain-crw
29+
```
30+
31+
No server is required: the SDK spawns the `crw` binary as a local subprocess on
32+
first use. For cloud mode, get an API key at [fastcrw.com](https://fastcrw.com).
33+
34+
## Usage
35+
36+
### Scrape a single page
37+
38+
```python
39+
from langchain_crw import CrwLoader
40+
41+
loader = CrwLoader(url="https://example.com", mode="scrape")
42+
docs = loader.load()
43+
44+
print(docs[0].page_content) # clean markdown
45+
print(docs[0].metadata) # {'title': ..., 'sourceURL': ..., 'statusCode': 200}
46+
```
47+
48+
### Cloud mode (fastcrw.com)
49+
50+
```python
51+
loader = CrwLoader(
52+
url="https://example.com",
53+
mode="scrape",
54+
api_url="https://fastcrw.com/api",
55+
api_key="YOUR_API_KEY", # or CRW_API_KEY env var
56+
)
57+
docs = loader.load()
58+
```
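The local and cloud examples above differ only in `api_url` and `api_key`, so the choice can be made from the environment. The following is a minimal sketch, not part of `langchain-crw`: `crw_loader_kwargs` is a hypothetical helper, and it assumes that the presence of `CRW_API_KEY` is a reasonable signal for switching to the cloud endpoint shown above.

```python
import os
from typing import Mapping, Optional


def crw_loader_kwargs(
    url: str,
    mode: str = "scrape",
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build keyword arguments for CrwLoader (hypothetical helper, not a library API)."""
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    kwargs = {"url": url, "mode": mode}

    api_key = env.get("CRW_API_KEY")
    if api_key:
        # Assumption: a configured key means the caller wants cloud mode.
        # The endpoint is the one used in the cloud example above.
        kwargs["api_url"] = "https://fastcrw.com/api"
        kwargs["api_key"] = api_key
    return kwargs
```

The result would be passed on as `CrwLoader(**crw_loader_kwargs("https://example.com"))`. With no key set, only `url` and `mode` are returned, which corresponds to the local subprocess example.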
59+
60+
### Self-hosted server
61+
62+
```python
63+
loader = CrwLoader(url="https://example.com", api_url="http://localhost:3000")
64+
docs = loader.load()
65+
```
66+
67+
## Modes
68+
69+
- `scrape`: Scrape a single URL and return markdown.
70+
- `crawl`: Crawl a URL and all accessible sub-pages.
71+
- `map`: Discover URLs on a site via sitemap and link extraction.
72+
- `search`: Web search plus content scraping (cloud only).
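Because `search` is listed as cloud only, it can help to reject an unsupported combination before a loader is constructed. This is a minimal sketch under stated assumptions: `check_mode` and the `using_cloud` flag are hypothetical names, and the loader itself may already perform its own validation.

```python
VALID_MODES = ("scrape", "crawl", "map", "search")
CLOUD_ONLY_MODES = ("search",)


def check_mode(mode: str, using_cloud: bool) -> str:
    """Return `mode` unchanged if it is usable, else raise ValueError (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    if mode in CLOUD_ONLY_MODES and not using_cloud:
        # Per the list above, search needs the cloud API.
        raise ValueError(f"mode {mode!r} is only available in cloud mode")
    return mode
```

A caller would typically set `using_cloud` from whether an API key is configured.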
73+
74+
### Crawl
75+
76+
```python
77+
loader = CrwLoader(
78+
url="https://docs.example.com",
79+
mode="crawl",
80+
params={"max_depth": 3, "max_pages": 50},
81+
)
82+
docs = loader.load()
83+
```
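A crawl returns one document per page, so it is often convenient to index the results by page URL. The sketch below assumes that crawled documents carry the same `sourceURL` metadata key shown in the scrape example; `group_by_source` is a hypothetical helper that works on any objects with a `metadata` dict.

```python
from collections import defaultdict


def group_by_source(docs):
    """Group documents by their `sourceURL` metadata value (hypothetical helper)."""
    grouped = defaultdict(list)
    for doc in docs:
        # Assumption: crawl results expose `sourceURL` as scrape results do.
        source = doc.metadata.get("sourceURL", "unknown")
        grouped[source].append(doc)
    return dict(grouped)
```

Calling `group_by_source(loader.load())` would give a plain dict from URL to the documents scraped from that URL.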
84+
85+
### Map
86+
87+
```python
88+
loader = CrwLoader(url="https://example.com", mode="map")
89+
urls = [doc.page_content for doc in loader.load()]
90+
```
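The URL list produced by `map` can be cleaned up before each URL is passed to a `scrape` loader. This is a minimal sketch using only the standard library; `same_host_urls` is a hypothetical helper, and dropping fragments and off-site links is an assumption about what a caller would want.

```python
from urllib.parse import urldefrag, urlparse


def same_host_urls(urls, host):
    """Keep unique, fragment-free URLs whose hostname equals `host` (hypothetical helper)."""
    seen = set()
    kept = []
    for raw in urls:
        # Strip whitespace and any "#fragment" so page anchors collapse to one URL.
        url, _fragment = urldefrag(raw.strip())
        if urlparse(url).hostname != host:
            continue
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

Order is preserved, so the first occurrence of each page wins.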
91+
92+
### JS rendering
93+
94+
```python
95+
loader = CrwLoader(
96+
url="https://spa-app.example.com",
97+
mode="scrape",
98+
params={
99+
"render_js": True,
100+
"wait_for": 3000,
101+
"css_selector": "article.main-content",
102+
},
103+
)
104+
docs = loader.load()
105+
```
106+
107+
## API reference
108+
109+
For full configuration options, see the
110+
[`langchain-crw` README](https://github.com/us/langchain-crw).
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
---
2+
title: "CRW integrations"
3+
description: "Integrate with CRW using LangChain Python."
4+
---
5+
6+
> [CRW](https://github.com/us/crw) is an open-source, Firecrawl-compatible web
7+
> scraper written in Rust. It ships as a single ~6 MB binary, runs self-hosted
8+
> or via the [fastcrw.com](https://fastcrw.com) cloud API, and returns clean
9+
> LLM-ready markdown, HTML, or structured JSON.
10+
11+
## Installation and setup
12+
13+
Install the partner integration package:
14+
15+
<CodeGroup>
16+
```bash pip
17+
pip install langchain-crw
18+
```
19+
20+
```bash uv
21+
uv add langchain-crw
22+
```
23+
</CodeGroup>
24+
25+
## Document loader
26+
27+
See a [usage example](/oss/integrations/document_loaders/crw).
28+
29+
```python
30+
from langchain_crw import CrwLoader
31+
```

0 commit comments

Comments
 (0)