docs: add CRW integration (provider + document loader pages) #3681

Open
Recep S (us) wants to merge 1 commit into langchain-ai:main from us:add-crw-integration

Conversation

@us

Summary

Adds documentation for the langchain-crw partner integration, per the guidance in langchain-ai/langchain#36273.

Changes

  • packages.yml: new entry for langchain-crw (follows the same pattern as langchain-scrapegraph / langchain-apify)
  • src/oss/python/integrations/providers/crw.mdx: provider page (modeled after the FireCrawl page)
  • src/oss/python/integrations/document_loaders/crw.mdx: document loader usage page (modeled after the FireCrawl page)

Context

CRW is an open-source, Firecrawl-compatible web scraper written in Rust. The CrwLoader in langchain-crw supports scrape, crawl, map, and search modes via BaseLoader with native lazy_load().
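
As a rough illustration of what "native lazy_load()" means for a loader, here is a minimal, self-contained sketch. It does not import langchain-crw or langchain-core; `Document`, `SketchLoader`, and `fake_fetch` are hypothetical stand-ins, not the CrwLoader API. It only shows the general pattern: documents are yielded one at a time, and the eager `load()` is built on top of the lazy iterator.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Document:
    """Simplified, hypothetical stand-in for a LangChain document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SketchLoader:
    """Hypothetical loader used for illustration; NOT the real CrwLoader."""

    def __init__(self, urls: list[str], fetch: Callable[[str], str]) -> None:
        self.urls = urls
        self.fetch = fetch  # injected so the sketch runs without network access

    def lazy_load(self) -> Iterator[Document]:
        # Nothing is fetched until the caller starts iterating.
        for url in self.urls:
            yield Document(page_content=self.fetch(url), metadata={"source": url})

    def load(self) -> list[Document]:
        # Eager variant: simply drains the lazy iterator.
        return list(self.lazy_load())


fetched: list[str] = []


def fake_fetch(url: str) -> str:
    """Records each call and returns placeholder markdown (assumption)."""
    fetched.append(url)
    return f"# content of {url}"


loader = SketchLoader(["https://example.com/a", "https://example.com/b"], fake_fetch)
first = next(loader.lazy_load())  # only the first URL is fetched here
```

Pulling a single item from `lazy_load()` touches only the first URL, which is the practical benefit for large crawls.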

Related: langchain-ai/langchain#36273

Copilot AI review requested due to automatic review settings April 22, 2026 20:43
@github-actions

Thanks for opening a docs PR, Recep S (@us)! When it's ready for review, please add the relevant reviewers:

  • @mdrxy (Python integrations)

@github-actions (bot) added labels on Apr 22, 2026: langchain (For docs changes to LangChain), oss, python (For content related to the Python version of LangChain projects)
@github-actions (bot) added the external label (User is not a member of langchain-ai) on Apr 22, 2026

Copilot AI left a comment

Pull request overview

Adds LangChain documentation pages for the langchain-crw partner integration (provider + Python document loader) and registers the package in the repo’s package registry so it can participate in generated integrations tables.

Changes:

  • Add a CRW provider integration page under Python providers.
  • Add a CRW document loader guide page with usage examples and supported modes.
  • Register langchain-crw in packages.yml.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| :--- | :--- |
| src/oss/python/integrations/providers/crw.mdx | New provider page that introduces CRW and links to the loader guide. |
| src/oss/python/integrations/document_loaders/crw.mdx | New loader usage guide including setup, examples, and supported modes. |
| packages.yml | Adds langchain-crw to the package registry for tooling/docs generation. |

Comment on lines +6 to +9
[CRW](https://github.com/us/crw) is an open-source, Firecrawl-compatible web
scraper written in Rust. It ships as a single binary, runs with zero config in
subprocess mode, and returns clean markdown, HTML, or JSON. Works self-hosted
or via the [fastcrw.com](https://fastcrw.com) cloud API.

Copilot AI Apr 22, 2026

This new document loader page doesn’t appear to be linked from the document loader landing page (src/oss/python/integrations/document_loaders/index.mdx) in either the “Webpages” table or the “All document loaders” card grid, so it will be hard to discover via browsing. Add CRW to the appropriate section(s) there (verified there are no existing .../document_loaders/crw links elsewhere).

Comment on lines +25 to +27
## Document loader

See a [usage example](/oss/integrations/document_loaders/crw).

Copilot AI Apr 22, 2026

This new provider page doesn’t appear to be referenced from the “All providers” index (src/oss/python/integrations/providers/all_providers.mdx), so it won’t be discoverable from the main providers browsing flow (search confirmed no /oss/integrations/providers/crw links). Add a corresponding Card entry to all_providers.mdx.

Comment thread packages.yml
js: "@oracle/langchain-oracledb"
downloads: 64000
downloads_updated_at: "2026-04-20T00:14:41.475493+00:00"
- name: langchain-crw

Copilot AI Apr 22, 2026

partner_pkg_table.py derives a display title from name when name_title is omitted, which would render langchain-crw as “Crw” (losing the intended all-caps acronym). Add name_title: CRW to preserve branding/capitalization in generated tables.

Suggested change
- - name: langchain-crw
+ - name: langchain-crw
+   name_title: CRW
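
The fallback described in this comment can be sketched roughly as follows. This is not the actual code of partner_pkg_table.py; `display_title` and its prefix-stripping logic are assumptions made only to show why an omitted `name_title` would produce "Crw".

```python
def display_title(entry: dict) -> str:
    """Hypothetical helper: prefer an explicit name_title, else derive one."""
    if "name_title" in entry:
        return entry["name_title"]
    # Assumed fallback: drop the "langchain-" prefix and title-case the rest,
    # which lowercases every letter after the first in each word.
    short_name = entry["name"].removeprefix("langchain-")
    return short_name.replace("-", " ").title()


without_override = display_title({"name": "langchain-crw"})
with_override = display_title({"name": "langchain-crw", "name_title": "CRW"})
```

Under these assumptions the entry without an override yields "Crw", while the entry with `name_title: CRW` keeps the acronym intact.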

Comment on lines +15 to +18
| Class | Package | Local | Serializable |
| :--- | :--- | :---: | :---: |
| `CrwLoader` | `langchain-crw` | ✅ | ❌ |

Copilot AI Apr 22, 2026

The integration details table is written with a double leading pipe (|| ...), which breaks Markdown table rendering in Mintlify. Use a single leading pipe for the header/alignment/data rows (matching other loader pages like firecrawl.mdx).

Comment on lines +21 to +23
| Source | Document Lazy Loading | Native Async Support |
| :---: | :---: | :---: |
| `CrwLoader` | ✅ | ❌ |

Copilot AI Apr 22, 2026

The loader features table also starts rows with ||, which will prevent it from rendering as a proper table. Switch to the standard Markdown table syntax with a single leading | per row.
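
One way to catch this class of problem before review is a small check over the page source. The sketch below is a hypothetical helper (`find_double_pipe_rows` is not part of the docs tooling) that reports the 1-based line numbers of rows starting with `||`.

```python
def find_double_pipe_rows(text: str) -> list[int]:
    """Return 1-based line numbers whose first non-space characters are '||'."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.lstrip().startswith("||")
    ]


# Sample input: the first table is malformed, the second is well-formed.
sample = "\n".join(
    [
        "|| Source | Document Lazy Loading | Native Async Support |",
        "|| :---: | :---: | :---: |",
        "|| `CrwLoader` | yes | no |",
        "",
        "| Class | Package |",
        "| :--- | :--- |",
    ]
)
bad_rows = find_double_pipe_rows(sample)
```

Any non-empty result points at rows that need the extra leading pipe removed.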

2 participants