feat: add fastCRW URL parser engine #20
Open
What
Adds fastCRW as a web scrape/search provider alongside the existing Firecrawl integration. The change is additive and mirrors the Firecrawl wiring; the Firecrawl provider itself is untouched.
Why
fastCRW is a fully open-source web engine (AGPL, single ~8 MB Rust binary) that outperforms Firecrawl on Firecrawl's own benchmark dataset and runs 100% locally with no cloud dependency.
Runs 100% locally, with anti-bot and JS rendering included in the open core.
Firecrawl's OSS self-host falls back to plain fetch/Playwright because its stealth engine (fire-engine) is gated behind a cloud-only flag, so a self-hosted Firecrawl can't reliably reach protected or JS-heavy sites. fastCRW ships Cloudflare JS-challenge handling, UA rotation, SPA rendering, BYO-proxy + rotation, and an HTTP→headless→proxy fallback ladder in the open core. One binary, no cloud, no hidden upsell.
Faster and higher recall on Firecrawl's own benchmark.
On Firecrawl's public benchmark dataset, fastCRW reaches a truth-recall of 63.74% vs Firecrawl's 56.04% (a gap of 7.70 percentage points), with lower median latency (p50 ~1.9 s vs ~2.3 s). fastCRW uses ~6 MB RAM at idle.
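For reviewers unfamiliar with the idea, the HTTP→headless→proxy fallback ladder mentioned above can be pictured roughly as follows. This is a minimal sketch in Python, not fastCRW's actual implementation: the names `fetch_with_ladder`, `Blocked`, `FetchResult`, and the three stub tiers are all hypothetical, and the stubs stand in for real network calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FetchResult:
    tier: str  # which rung of the ladder produced the page
    html: str


class Blocked(Exception):
    """Raised by a tier that could not retrieve usable content."""


# A tier is a name plus a function url -> html that raises Blocked on failure.
Tier = Tuple[str, Callable[[str], str]]


def fetch_with_ladder(url: str, tiers: Sequence[Tier]) -> FetchResult:
    """Try each tier in order and return the first one that succeeds."""
    errors: List[str] = []
    for name, fetch in tiers:
        try:
            return FetchResult(tier=name, html=fetch(url))
        except Blocked as exc:
            errors.append(f"{name}: {exc}")
    raise Blocked("all tiers failed: " + "; ".join(errors))


# Stub tiers for illustration only (no network access).
def plain_http(url: str) -> str:
    raise Blocked("challenge page")  # pretend the site served a JS challenge


def headless(url: str) -> str:
    return "<html>rendered</html>"  # pretend a headless browser rendered it


def via_proxy(url: str) -> str:
    return "<html>proxied</html>"


LADDER = [("http", plain_http), ("headless", headless), ("proxy", via_proxy)]

result = fetch_with_ladder("https://example.com", LADDER)
print(result.tier)  # prints "headless"
```

The cheap tier is tried first, so the more expensive headless and proxy rungs only run when a cheaper one reports a block.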
On web search: crw is built on top of SearXNG, not an alternative to it.
SearXNG is the metasearch aggregator underneath; crw adds a quality layer on top: query expansion (multi-variant rewrite), content-aware reranking (re-scoring by fetched content instead of SearXNG's content-blind ordering), and category routing (research queries fan out to arxiv / semantic scholar / google scholar, code queries to GitHub). The result is SearXNG's breadth plus a measurable accuracy layer, all open-source (AGPL) and self-hostable with configurable engines, rather than a bare passthrough.
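To make the three pieces of that quality layer concrete, here is a toy sketch in Python. It is an assumption-laden illustration, not crw's code: the keyword lists, the `route`, `expand`, and `rerank` functions, and the term-overlap score are all invented for this example; only the engine names come from the description above.

```python
from typing import Dict, List, Set

# Engine names are taken from the PR description; the keyword sets are invented.
ROUTES: Dict[str, List[str]] = {
    "research": ["arxiv", "semantic scholar", "google scholar"],
    "code": ["github"],
}
KEYWORDS: Dict[str, Set[str]] = {
    "research": {"paper", "study", "survey", "arxiv"},
    "code": {"library", "repo", "implementation", "github"},
}


def route(query: str) -> List[str]:
    """Category routing: pick extra engines by keyword match (toy classifier)."""
    words = set(query.lower().split())
    engines: List[str] = []
    for category, keys in KEYWORDS.items():
        if words & keys:
            engines.extend(ROUTES[category])
    return engines


def expand(query: str) -> List[str]:
    """Query expansion: the original query plus two simple rewrites."""
    return [query, f"{query} overview", f"what is {query}"]


def rerank(query: str, results: List[dict]) -> List[dict]:
    """Content-aware reranking: order by query-term overlap with fetched content."""
    terms = set(query.lower().split())

    def score(result: dict) -> float:
        content = set(result["content"].lower().split())
        return len(terms & content) / len(terms) if terms else 0.0

    return sorted(results, key=score, reverse=True)


query = "rust web crawlers survey"
results = [
    {"url": "https://example.com/a", "content": "cooking recipes for pasta"},
    {"url": "https://example.com/b", "content": "a survey paper on rust web crawlers"},
]
print(route(query))                      # ['arxiv', 'semantic scholar', 'google scholar']
print(rerank(query, results)[0]["url"])  # https://example.com/b
```

A real reranker would presumably use a stronger relevance signal than word overlap; the point here is only that the ordering depends on the fetched content rather than on the upstream engine's ranking.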
Flat, predictable pricing. 1 credit = 1 page; no 4× stealth surcharge and no charge for failed requests. A free tier is available at https://fastcrw.com/dashboard; a self-hosted base URL is supported via CRW_API_KEY + CRW_BASE_URL.
Because fastCRW is API-compatible with Firecrawl, the integration is a small additive diff, and the Firecrawl provider is untouched. I maintain the integration and can provide free credits to evaluate. Happy to adjust to your conventions.
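As a rough illustration of how the CRW_API_KEY / CRW_BASE_URL configuration mentioned above could be consumed, the Python sketch below builds (but does not send) a scrape request. Several things here are assumptions rather than facts from this PR: the `/v1/scrape` path and Bearer-token header are borrowed from Firecrawl's v1 API on the strength of the API-compatibility claim, `build_scrape_request` is a hypothetical helper, the default base URL is a placeholder, and `http://localhost:3000` is just an example self-host address.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional

# Placeholder only: the hosted endpoint is not stated in this PR.
DEFAULT_BASE_URL = "https://api.example.invalid"


def build_scrape_request(
    url: str, env: Optional[Mapping[str, str]] = None
) -> urllib.request.Request:
    """Build a Firecrawl-style scrape request from CRW_* settings."""
    env = os.environ if env is None else env
    base_url = env.get("CRW_BASE_URL", DEFAULT_BASE_URL).rstrip("/")

    headers = {"Content-Type": "application/json"}
    api_key = env.get("CRW_API_KEY")
    if api_key:  # a self-hosted instance may not need a key (assumption)
        headers["Authorization"] = f"Bearer {api_key}"

    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    # Assumed path, mirroring Firecrawl's v1 scrape endpoint.
    return urllib.request.Request(
        f"{base_url}/v1/scrape", data=body, headers=headers, method="POST"
    )


req = build_scrape_request(
    "https://example.com",
    {"CRW_BASE_URL": "http://localhost:3000/", "CRW_API_KEY": "test-key"},
)
print(req.get_method(), req.full_url)  # POST http://localhost:3000/v1/scrape
```

Passing the environment as a mapping keeps the helper easy to test; in normal use it would fall back to `os.environ`.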