Unified web page fetcher for Rust — HTTP, headless browser (Chrome / Chromium), and Cloudflare bypass (FlareSolverr), with optional Readability denoising.
- Three fetch channels (choose one per request, or let the fetcher degrade automatically):
  - `Http`: a plain `reqwest` GET with a custom User-Agent
  - `Browser`: calls a local headless Chrome / Chromium (`--headless=new --dump-dom`) so that JavaScript executes before the HTML is read
  - `CloudflareBypass`: routes the request through FlareSolverr to solve Cloudflare challenges
- Readability denoising: strips ads, navigation, and other boilerplate from HTML via `dom_smoothie`, returning a clean `DenoisedArticle { title, text_content, content, … }`
- SSRF protection: blocks fetches to private, loopback, and link-local address ranges
- Graceful fallback: if `Browser` mode is requested but no Chrome is available, the fetcher falls back to HTTP and emits a `WARN` log
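The subprocess step behind the `Browser` channel can be sketched with only the standard library. This is an illustrative sketch, not the crate's code: the helper names `chrome_args` and `dump_dom` are hypothetical, and the binary name `chromium` used in `main` is an assumption (it varies by OS and install). Only the `--headless=new` and `--dump-dom` flags come from the description above.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: builds the headless Chrome / Chromium arguments
/// for dumping a page's DOM after JavaScript has run.
fn chrome_args(url: &str) -> Vec<String> {
    vec![
        "--headless=new".to_string(),
        "--dump-dom".to_string(),
        url.to_string(),
    ]
}

/// Hypothetical helper: runs `chrome_binary` and returns the serialized
/// DOM it writes to stdout.
fn dump_dom(chrome_binary: &str, url: &str) -> io::Result<String> {
    // Fails with ErrorKind::NotFound if the binary is not installed.
    let output = Command::new(chrome_binary).args(chrome_args(url)).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("browser exited with {}", output.status),
        ));
    }
    // Lossy conversion tolerates pages that contain invalid UTF-8.
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    // "chromium" is an assumed binary name; adjust for your system.
    match dump_dom("chromium", "https://example.com") {
        Ok(html) => println!("rendered {} bytes of HTML", html.len()),
        Err(e) => eprintln!("browser fetch failed: {e}"),
    }
}
```

Because `--dump-dom` prints the DOM as it stands after the page loads, the captured stdout reflects JavaScript-rendered content rather than the raw server response.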
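An SSRF check of this kind can be written with `std::net` alone. This sketch shows the general technique and is an assumption, not the crate's implementation: the functions `is_blocked`, `is_blocked_v4`, `is_blocked_v6`, and `check_host` are hypothetical, and the exact ranges `tokimo_web_fetch` blocks may differ.

```rust
use std::io;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, ToSocketAddrs};

/// Private (RFC 1918), loopback, link-local (169.254.0.0/16), or unspecified.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // Judge IPv4-mapped addresses (::ffff:a.b.c.d) by their IPv4 form,
    // otherwise ::ffff:127.0.0.1 would slip through.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}

fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}

/// Resolves `host` and rejects it if ANY resolved address is blocked.
fn check_host(host: &str, port: u16) -> io::Result<()> {
    for addr in (host, port).to_socket_addrs()? {
        if is_blocked(addr.ip()) {
            return Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                format!("blocked address {}", addr.ip()),
            ));
        }
    }
    Ok(())
}

fn main() {
    for host in ["127.0.0.1", "10.1.2.3", "93.184.216.34"] {
        let verdict = if check_host(host, 443).is_ok() { "allowed" } else { "blocked" };
        println!("{host}: {verdict}");
    }
}
```

Resolving first and connecting later leaves a window for DNS rebinding; a stricter fetcher would connect to the exact address that passed the check.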
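The fallback rule can be sketched as a small mode-resolution step that runs before any fetch. The `Mode` enum, `find_on_path`, and `resolve_mode` are hypothetical stand-ins (not the crate's `FetchMode` API), the candidate binary names are assumptions, and `eprintln!` stands in for whatever logger emits the `WARN`.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the crate's fetch modes.
#[allow(dead_code)] // CloudflareBypass is listed for completeness.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Http,
    Browser,
    CloudflareBypass,
}

/// Returns the first file in PATH matching one of `names`. Unix-style lookup:
/// it checks only that the file exists, not permissions or `.exe` suffixes.
fn find_on_path(names: &[&str]) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    for dir in env::split_paths(&path) {
        for name in names {
            let candidate = dir.join(name);
            if candidate.is_file() {
                return Some(candidate);
            }
        }
    }
    None
}

/// Downgrades Browser to Http when no Chrome binary was found;
/// every other combination keeps the requested mode.
fn resolve_mode(requested: Mode, chrome: Option<&Path>) -> Mode {
    match (requested, chrome) {
        (Mode::Browser, None) => {
            eprintln!("WARN: Browser mode requested but no Chrome/Chromium found; falling back to HTTP");
            Mode::Http
        }
        (mode, _) => mode,
    }
}

fn main() {
    // Assumed candidate names; real installs vary by OS and distribution.
    let chrome = find_on_path(&["google-chrome", "chromium", "chromium-browser"]);
    let mode = resolve_mode(Mode::Browser, chrome.as_deref());
    println!("effective mode: {mode:?}");
}
```

Resolving the mode up front keeps the degradation decision in one place, so the fetch itself never has to handle a missing browser mid-request.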
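```rust
use tokimo_web_fetch::{Denoise, FetchMode, FetchOptions, WebFetcher};

// Assumes a Tokio runtime and that the crate's error type implements
// std::error::Error, so `?` can convert it into a boxed error.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fetcher = WebFetcher::builder().build();
    let response = fetcher
        .fetch_with(
            "https://example.com",
            FetchOptions {
                mode: FetchMode::Http,
                denoise: Denoise::Readability,
                ..Default::default()
            },
        )
        .await?;

    // `denoised` is an Option, so handle the None case instead of unwrapping.
    if let Some(article) = response.denoised {
        println!("{}", article.text_content);
    }
    Ok(())
}
```

License: MIT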