README.md: +22 -30
@@ -2,24 +2,24 @@

 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)

-x-crawl is a flexible nodejs crawler library. It can crawl pages, control pages, batch network requests, batch download file resources, polling and crawling, etc. Support asynchronous/synchronous mode crawling data. Running on nodejs, the usage is flexible and simple, friendly to JS/TS developers.
+x-crawl is a flexible nodejs crawler library. It can crawl pages in batches, network requests in batches, download file resources in batches, polling and crawling, etc. Supports asynchronous/synchronous mode crawling. Running on nodejs, the usage is flexible and simple, friendly to JS/TS developers.

 > If you feel good, you can give [x-crawl repository](https://github.com/coder-hxl/x-crawl) a Star to support it, your Star will be the motivation for my update.

 ## Features

-- Support asynchronous/synchronous way to crawl data.
-- Flexible writing, supporting multiple ways to write request configuration and obtain crawling results.
-- Flexible crawling interval, no interval/fixed interval/random interval, it is up to you to use/avoid high concurrent crawling.
-- Simple configuration can crawl pages, batch network requests, batch download file resources, polling and crawling, etc.
-- Crawl SPA (single-page application) to generate pre-rendered content (ie "SSR" (server-side rendering)), and use jsdom library to parse the content, and also supports self-parsing.
-- Form submissions, keystrokes, event actions, screenshots of generated pages, etc.
-- Capture and record the success and failure of crawling, and highlight the reminders.
-- Written in TypeScript, has types, provides generics.
+- **🔥 Asynchronous/Synchronous** - Support asynchronous/synchronous mode batch crawling.
+- **⚙️ Multiple functions** - Batch crawling of pages, batch network requests, batch download of file resources, polling crawling, etc.
+- **🖋️ Flexible writing style** - Multiple crawling configurations and ways to get crawling results.
+- **⏱️ Interval crawling** - no interval/fixed interval/random interval, you can use/avoid high concurrent crawling.
+- **☁️ Crawl SPA** - Batch crawl SPA (Single Page Application) to generate pre-rendered content (ie "SSR" (Server Side Rendering)).
+- **⚒️ Controlling Pages** - Headless browsers can submit forms, keystrokes, event actions, generate screenshots of pages, etc.
+- **🧾 Capture Record** - Capture and record the crawled results, and highlight the reminders.
+- **🦾TypeScript** - Own types, implement complete types through generics.

 ## Relationship with puppeteer

-The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to help us crawl pages and expose Brower instances and Page instances, making it more flexible.
+The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to help us crawl pages and expose Brower instances and Page instances.

 # Table of Contents
@@ -31,7 +31,6 @@ The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/p
 It is an instance object of [JSDOM](https://github.com/jsdom/jsdom), please refer to [jsdom](https://github.com/jsdom/jsdom) for specific usage.
-
-**Note:** The jsdom instance only parses the content of [page instance](#page-instance), if you use page instance for event operation, you may need to parse the latest by yourself For details, please refer to the self-parsing page of [page instance](#page-instance).
-
 #### browser instance

 It is an instance object of [Browser](https://pptr.dev/api/puppeteer.browser). For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).