docs: add CRW integration (provider + document loader pages) #3681
Recep S (us) wants to merge 1 commit into langchain-ai:main from
Thanks for opening a docs PR, Recep S (@us)! When it's ready for review, please add the relevant reviewers:
Pull request overview
Adds LangChain documentation pages for the langchain-crw partner integration (provider + Python document loader) and registers the package in the repo’s package registry so it can participate in generated integrations tables.
Changes:
- Add a CRW provider integration page under Python providers.
- Add a CRW document loader guide page with usage examples and supported modes.
- Register `langchain-crw` in `packages.yml`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/oss/python/integrations/providers/crw.mdx | New provider page that introduces CRW and links to the loader guide. |
| src/oss/python/integrations/document_loaders/crw.mdx | New loader usage guide including setup, examples, and supported modes. |
| packages.yml | Adds langchain-crw to the package registry for tooling/docs generation. |
[CRW](https://github.com/us/crw) is an open-source, Firecrawl-compatible web
scraper written in Rust. It ships as a single binary, runs with zero config in
subprocess mode, and returns clean markdown, HTML, or JSON. Works self-hosted
or via the [fastcrw.com](https://fastcrw.com) cloud API.
This new document loader page doesn’t appear to be linked from the document loader landing page (src/oss/python/integrations/document_loaders/index.mdx) in either the “Webpages” table or the “All document loaders” card grid, so it will be hard to discover via browsing. Add CRW to the appropriate section(s) there (verified there are no existing .../document_loaders/crw links elsewhere).
## Document loader

See a [usage example](/oss/integrations/document_loaders/crw).
This new provider page doesn’t appear to be referenced from the “All providers” index (src/oss/python/integrations/providers/all_providers.mdx), so it won’t be discoverable from the main providers browsing flow (search confirmed no /oss/integrations/providers/crw links). Add a corresponding Card entry to all_providers.mdx.
| js: "@oracle/langchain-oracledb" | ||
| downloads: 64000 | ||
| downloads_updated_at: "2026-04-20T00:14:41.475493+00:00" | ||
| - name: langchain-crw |
partner_pkg_table.py derives a display title from name when name_title is omitted, which would render langchain-crw as “Crw” (losing the intended all-caps acronym). Add name_title: CRW to preserve branding/capitalization in generated tables.
| - name: langchain-crw | |
| - name: langchain-crw | |
| name_title: CRW |
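For illustration, the fallback described in the comment above might look roughly like the following Python sketch. The function name `display_title` and the derivation rule (strip the `langchain-` prefix, then title-case) are assumptions made for this example, not the actual code in `partner_pkg_table.py`.

```python
def display_title(entry: dict) -> str:
    """Hypothetical title lookup for one packages.yml entry.

    This is an illustrative stand-in, not the real partner_pkg_table.py logic.
    """
    # An explicit name_title always wins, preserving acronyms such as "CRW".
    if entry.get("name_title"):
        return entry["name_title"]
    # Assumed fallback: drop the "langchain-" prefix and title-case the rest.
    base = entry["name"].removeprefix("langchain-")
    return base.replace("-", " ").title()


print(display_title({"name": "langchain-crw"}))                        # Crw
print(display_title({"name": "langchain-crw", "name_title": "CRW"}))   # CRW
```

Under these assumptions, title-casing alone cannot recover an all-caps acronym, which is why an explicit `name_title` is needed.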
| | Class | Package | Local | Serializable | | ||
| | :--- | :--- | :---: | :---: | | ||
| | `CrwLoader` | `langchain-crw` | ✅ | ❌ | | ||
The integration details table is written with a double leading pipe (|| ...), which breaks Markdown table rendering in Mintlify. Use a single leading pipe for the header/alignment/data rows (matching other loader pages like firecrawl.mdx).
| | Source | Document Lazy Loading | Native Async Support | | ||
| | :---: | :---: | :---: | | ||
| | `CrwLoader` | ✅ | ❌ | |
The loader features table also starts rows with ||, which will prevent it from rendering as a proper table. Switch to the standard Markdown table syntax with a single leading | per row.
Summary
Adds documentation for the `langchain-crw` partner integration, per the guidance in langchain-ai/langchain#36273.

Changes
- `packages.yml`: new entry for `langchain-crw` (follows the same pattern as `langchain-scrapegraph` / `langchain-apify`)
- `src/oss/python/integrations/providers/crw.mdx`: provider page (modeled after the FireCrawl page)
- `src/oss/python/integrations/document_loaders/crw.mdx`: document loader usage page (modeled after the FireCrawl page)

Context
CRW is an open-source, Firecrawl-compatible web scraper written in Rust. The `CrwLoader` in `langchain-crw` supports `scrape`, `crawl`, `map`, and `search` modes via `BaseLoader` with native `lazy_load()`.

Related: langchain-ai/langchain#36273
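
As a rough illustration of what a native `lazy_load()` means in practice, here is a minimal, standard-library-only sketch of the pattern. `SketchLoader`, the injected `fetch` callable, and the stand-in `Document` dataclass are all hypothetical; this is not the real `CrwLoader` and does not reflect its constructor or parameters.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Document:
    """Stand-in for a LangChain document (hypothetical, for this sketch only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SketchLoader:
    """Hypothetical loader showing the lazy-loading pattern; not CrwLoader."""

    def __init__(self, urls: Iterable[str], fetch: Callable[[str], str]):
        self.urls = list(urls)
        self.fetch = fetch  # injected so the sketch needs no network access

    def lazy_load(self) -> Iterator[Document]:
        # Each page is fetched only when the caller asks for the next item.
        for url in self.urls:
            yield Document(page_content=self.fetch(url), metadata={"source": url})

    def load(self) -> list:
        # Eager variant: simply drains the lazy iterator.
        return list(self.lazy_load())


def fake_fetch(url: str) -> str:
    # Placeholder for a real scrape call; returns canned markdown.
    return f"# Page at {url}"


loader = SketchLoader(["https://example.com/a", "https://example.com/b"], fake_fetch)
for doc in loader.lazy_load():
    print(doc.metadata["source"], "->", doc.page_content)
```

The generator keeps memory use proportional to one page at a time, which matters most for multi-page modes such as a crawl.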