4 changes: 4 additions & 0 deletions packages.yml
@@ -796,3 +796,7 @@ packages:
js: "@oracle/langchain-oracledb"
downloads: 64000
downloads_updated_at: "2026-04-20T00:14:41.475493+00:00"
- name: langchain-crw
Copilot AI Apr 22, 2026

partner_pkg_table.py derives a display title from name when name_title is omitted, which would render langchain-crw as “Crw” (losing the intended all-caps acronym). Add name_title: CRW to preserve branding/capitalization in generated tables.

Suggested change
-  - name: langchain-crw
+  - name: langchain-crw
+    name_title: CRW
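
To illustrate the reviewer's point, here is a minimal sketch of how a display title might be derived from a package name. `derive_title` is a hypothetical helper written for this note, not the actual code in `partner_pkg_table.py`. It assumes the script drops a `langchain-` prefix and title-cases the remainder unless `name_title` is set.

```python
def derive_title(pkg: dict) -> str:
    """Hypothetical stand-in for the title logic described in the comment."""
    # An explicit name_title always wins.
    if "name_title" in pkg:
        return pkg["name_title"]
    # Otherwise drop the (assumed) "langchain-" prefix and title-case the rest.
    base = pkg["name"].removeprefix("langchain-")
    return base.replace("-", " ").title()


print(derive_title({"name": "langchain-crw"}))                       # Crw
print(derive_title({"name": "langchain-crw", "name_title": "CRW"}))  # CRW
```

Under these assumptions, the acronym survives only when `name_title` is present, which is what the suggested change adds.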

repo: us/langchain-crw
downloads: 0
downloads_updated_at: "2026-04-22T00:00:00+00:00"
110 changes: 110 additions & 0 deletions src/oss/python/integrations/document_loaders/crw.mdx
@@ -0,0 +1,110 @@
---
title: "CRW integration"
description: "Integrate with the CRW document loader using LangChain Python."
---

[CRW](https://github.com/us/crw) is an open-source, Firecrawl-compatible web
scraper written in Rust. It ships as a single binary, runs with zero config in
subprocess mode, and returns clean markdown, HTML, or JSON. It works self-hosted
or via the [fastcrw.com](https://fastcrw.com) cloud API.
Comment on lines +6 to +9
Copilot AI Apr 22, 2026

This new document loader page doesn’t appear to be linked from the document loader landing page (src/oss/python/integrations/document_loaders/index.mdx) in either the “Webpages” table or the “All document loaders” card grid, so it will be hard to discover via browsing. Add CRW to the appropriate section(s) there (verified there are no existing .../document_loaders/crw links elsewhere).

## Overview

### Integration details

| Class | Package | Local | Serializable |
| :--- | :--- | :---: | :---: |
| `CrwLoader` | `langchain-crw` | ✅ | ❌ |

Comment on lines +15 to +18
Copilot AI Apr 22, 2026

The integration details table is written with a double leading pipe (|| ...), which breaks Markdown table rendering in Mintlify. Use a single leading pipe for the header/alignment/data rows (matching other loader pages like firecrawl.mdx).
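
A quick way to catch this kind of problem before review is to scan the page for rows that begin with `||`. The sketch below is a hypothetical check, not part of the docs tooling; the sample strings are shortened versions of the table above.

```python
def rows_with_double_pipe(markdown: str) -> list[int]:
    """Return 1-based line numbers of rows that start with '||'."""
    return [
        number
        for number, line in enumerate(markdown.splitlines(), start=1)
        if line.lstrip().startswith("||")
    ]


broken = "|| Class | Package |\n|| :--- | :--- |\n|| `CrwLoader` | `langchain-crw` |"
fixed = "| Class | Package |\n| :--- | :--- |\n| `CrwLoader` | `langchain-crw` |"

print(rows_with_double_pipe(broken))  # [1, 2, 3]
print(rows_with_double_pipe(fixed))   # []
```

The check is deliberately naive: it would also flag a row whose first cell is intentionally empty, so treat any hit as a prompt to look at the line rather than as a definite error.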

### Loader features

| Source | Document Lazy Loading | Native Async Support |
| :---: | :---: | :---: |
| `CrwLoader` | ✅ | ❌ |
Comment on lines +21 to +23
Copilot AI Apr 22, 2026

The loader features table also starts rows with ||, which will prevent it from rendering as a proper table. Switch to the standard Markdown table syntax with a single leading | per row.

## Setup

```bash
pip install langchain-crw
```

No server required — the SDK spawns the `crw` binary as a local subprocess on
first use. For cloud mode, get an API key at [fastcrw.com](https://fastcrw.com).

## Usage

### Scrape a single page

```python
from langchain_crw import CrwLoader

loader = CrwLoader(url="https://example.com", mode="scrape")
docs = loader.load()

print(docs[0].page_content) # clean markdown
print(docs[0].metadata) # {'title': ..., 'sourceURL': ..., 'statusCode': 200}
```
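
The metadata shown above includes a `statusCode` key, which can be used to drop failed fetches before indexing. The following is a minimal sketch under that assumption. `Doc` is a stand-in class so the example runs without LangChain installed, and `successful` is a hypothetical helper, not part of `langchain-crw`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def successful(docs):
    """Keep only documents whose metadata reports HTTP 200."""
    return [doc for doc in docs if doc.metadata.get("statusCode") == 200]


docs = [
    Doc("# Home", {"sourceURL": "https://example.com", "statusCode": 200}),
    Doc("", {"sourceURL": "https://example.com/missing", "statusCode": 404}),
]
print([doc.metadata["sourceURL"] for doc in successful(docs)])
# ['https://example.com']
```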

### Cloud mode (fastcrw.com)

```python
loader = CrwLoader(
    url="https://example.com",
    mode="scrape",
    api_url="https://fastcrw.com/api",
    api_key="YOUR_API_KEY",  # or CRW_API_KEY env var
)
docs = loader.load()
```
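
To avoid hard-coding the key, the arguments can be assembled from the environment first. This is a sketch only: `crw_cloud_kwargs` is a hypothetical helper, and it assumes the keyword names and the `CRW_API_KEY` variable shown in the example above. It fails early when the variable is missing instead of leaving that to the loader.

```python
import os


def crw_cloud_kwargs(url: str, mode: str = "scrape") -> dict:
    """Build keyword arguments for cloud mode (hypothetical helper)."""
    api_key = os.environ.get("CRW_API_KEY")
    if not api_key:
        raise RuntimeError("Set CRW_API_KEY before using cloud mode")
    return {
        "url": url,
        "mode": mode,
        "api_url": "https://fastcrw.com/api",
        "api_key": api_key,
    }


# Usage, assuming langchain-crw is installed:
#   loader = CrwLoader(**crw_cloud_kwargs("https://example.com"))
```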

### Self-hosted server

```python
loader = CrwLoader(url="https://example.com", api_url="http://localhost:3000")
docs = loader.load()
```

## Modes

- `scrape`: Scrape a single URL and return markdown.
- `crawl`: Crawl a URL and all accessible sub-pages.
- `map`: Discover URLs on a site via sitemap and link extraction.
- `search`: Web search plus content scraping (cloud only).
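
Because `search` is listed as cloud only, it can help to validate the mode before building a loader. The helper below is a hypothetical sketch based on the list above; `check_mode` is not part of `langchain-crw`, and the package may perform its own validation.

```python
from typing import Optional

VALID_MODES = ("scrape", "crawl", "map", "search")


def check_mode(mode: str, api_key: Optional[str] = None) -> str:
    """Validate a mode name before constructing a loader (hypothetical)."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    # The list above marks `search` as cloud only, so require an API key.
    if mode == "search" and not api_key:
        raise ValueError("mode='search' is cloud only and needs an API key")
    return mode
```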

### Crawl

```python
loader = CrwLoader(
    url="https://docs.example.com",
    mode="crawl",
    params={"max_depth": 3, "max_pages": 50},
)
docs = loader.load()
```

### Map

```python
loader = CrwLoader(url="https://example.com", mode="map")
urls = [doc.page_content for doc in loader.load()]
```
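
The URL list from `map` mode usually needs some cleanup before it is passed to a scraper. As a minimal sketch using only the standard library, the hypothetical helper `same_host_urls` keeps unique URLs on the same host as the root. The sample list is made up for illustration.

```python
from urllib.parse import urlparse


def same_host_urls(urls, root):
    """Keep unique URLs whose host matches the root URL, preserving order."""
    host = urlparse(root).netloc
    seen = set()
    kept = []
    for url in urls:
        if urlparse(url).netloc == host and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


urls = [
    "https://example.com/",
    "https://example.com/docs",
    "https://other.example.org/page",
    "https://example.com/docs",
]
print(same_host_urls(urls, "https://example.com"))
# ['https://example.com/', 'https://example.com/docs']
```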

### JS rendering

```python
loader = CrwLoader(
    url="https://spa-app.example.com",
    mode="scrape",
    params={
        "render_js": True,
        "wait_for": 3000,
        "css_selector": "article.main-content",
    },
)
docs = loader.load()
```

## API reference

For full configuration options, see the
[`langchain-crw` README](https://github.com/us/langchain-crw).
31 changes: 31 additions & 0 deletions src/oss/python/integrations/providers/crw.mdx
@@ -0,0 +1,31 @@
---
title: "CRW integrations"
description: "Integrate with CRW using LangChain Python."
---

>[CRW](https://github.com/us/crw) is an open-source, Firecrawl-compatible web
> scraper written in Rust. It ships as a single ~6 MB binary, runs self-hosted
> or via the [fastcrw.com](https://fastcrw.com) cloud API, and returns clean
> LLM-ready markdown, HTML, or structured JSON.

## Installation and setup

Install the partner integration package:

<CodeGroup>
```bash pip
pip install langchain-crw
```

```bash uv
uv add langchain-crw
```
</CodeGroup>

## Document loader

See a [usage example](/oss/integrations/document_loaders/crw).
Comment on lines +25 to +27
Copilot AI Apr 22, 2026

This new provider page doesn’t appear to be referenced from the “All providers” index (src/oss/python/integrations/providers/all_providers.mdx), so it won’t be discoverable from the main providers browsing flow (search confirmed no /oss/integrations/providers/crw links). Add a corresponding Card entry to all_providers.mdx.

```python
from langchain_crw import CrwLoader
```