Skip to content
CoderHXL edited this page May 6, 2023 · 5 revisions

x-crawl · npm GitHub license

English | 简体中文

x-crawl is a flexible Node.js multifunctional crawler library. Flexible usage and numerous functions can help you quickly, safely, and stably crawl pages, interfaces, and files.

If you also like x-crawl, you can give x-crawl repository a star to support it, thank you for your support!

Features

  • 🔥 Asynchronous Synchronous - Just change the mode property to toggle asynchronous or synchronous crawling mode.
  • ⚙️ Multiple purposes - It can crawl pages, crawl interfaces, crawl files and poll crawls to meet the needs of various scenarios.
  • 🖋️ Flexible writing style - The same crawling API can be adapted to multiple configurations, and each configuration method is very unique.
  • ⏱️ Interval Crawling - No interval, fixed interval and random interval to generate or avoid high concurrent crawling.
  • 🔄 Failed Retry - Avoid crawling failure due to short-term problems, and customize the number of retries.
  • ➡️ Proxy Rotation - Auto-rotate proxies with failure retry, custom error times and HTTP status codes.
  • 👀 Device Fingerprinting - Zero configuration or custom configuration, avoid fingerprinting to identify and track us from different locations.
  • 🚀 Priority Queue - According to the priority of a single crawling target, it can be crawled ahead of other targets.
  • ☁️ Crawl SPA - Crawl SPA (Single Page Application) to generate pre-rendered content (aka "SSR" (Server Side Rendering)).
  • ⚒️ Control Page - You can submit form, keyboard input, event operation, generate screenshots of the page, etc.
  • 🧾 Capture Record - Capture and record crawling, and use colored strings to remind in the terminal.
  • 🦾 TypeScript - Own types, implement complete types through generics.
Clone this wiki locally