README.md (+19 −30)

@@ -2,30 +2,24 @@
English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
-x-crawl is a flexible nodejs crawler library. You can crawl pages and control operations such as pages, batch network requests, and batch downloads of file resources. Support asynchronous/synchronous mode crawling data. Running on nodejs, the usage is flexible and simple, friendly to JS/TS developers.
+x-crawl is a flexible Node.js crawler library. It can crawl pages, control pages, make batch network requests, batch-download file resources, poll and crawl on a schedule, and more. Data can be crawled in asynchronous or synchronous mode. Running on Node.js, it is flexible and simple to use, and friendly to JS/TS developers.

> If you find x-crawl helpful, you can give the [x-crawl repository](https://github.com/coder-hxl/x-crawl) a Star to support it; your Star will be the motivation for my updates.

## Features

- Supports asynchronous/synchronous data crawling.
-- Flexible writing, support a variety of ways to write request configuration and obtain crawl results.
-- Flexible crawling interval, up to you to use/avoid high concurrent crawling.
-- With simple configuration, operations such as crawling pages, batch network requests, and batch download of file resources can be performed.
-- Possess polling function to crawl data regularly.
-- The built-in puppeteer crawls the page, and uses the jsdom library to analyze the content of the page, and also supports self-analysis.
-- Capture the success and failure of the climb and highlight the reminder.
+- Flexible writing, supporting multiple ways to write request configurations and obtain crawling results.
+- Flexible crawling interval: none, fixed, or random, so it is up to you to use or avoid highly concurrent crawling (see the sketch after this list).
+- Simple configuration is enough to crawl pages, make batch network requests, batch-download file resources, poll and crawl, and more.
+- Crawl SPAs (single-page applications) to generate pre-rendered content (i.e. "SSR", server-side rendering), parse the content with the jsdom library, or parse it yourself.
+- Form submission, keyboard input, event actions, screenshots of the generated page, and more.
+- Capture and record the success and failure of crawling, and highlight the reminders.
- Written in TypeScript, has types, provides generics.
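
The interval bullet above maps to configuration on the crawler instance. Here is a minimal sketch of the three interval modes and the synchronous mode; the option names (`intervalTime`, `mode`) follow this version's documentation and should be treated as assumptions:

```js
// Hedged sketch of the interval modes described above; the option names
// (intervalTime, mode) are assumptions based on this version's docs.
const xCrawl = require('x-crawl')

// No interval: crawl actions are fired as fast as the mode allows (default).
const fastCrawl = xCrawl({})

// Fixed interval: wait 2000 ms before each crawl action.
const fixedCrawl = xCrawl({ intervalTime: 2000 })

// Random interval: wait a random 1000-3000 ms before each crawl action.
const randomCrawl = xCrawl({ intervalTime: { max: 3000, min: 1000 } })

// Synchronous mode: actions run one at a time instead of concurrently.
const syncCrawl = xCrawl({ mode: 'sync' })
```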

-## Relationship with puppeteer
+## Relationship with puppeteer

-The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to help us crawl pages.
-
-The return value of the crawlPage API will be able to do the following:
-
-- Generate screenshots and PDFs of pages.
-- Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
-- Automate form submission, UI testing, keyboard input, etc.
+The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to help us crawl pages, and it exposes the Browser instance and the Page instance, making it more flexible.
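
Because crawlPage hands back the puppeteer instances, the page can be driven directly. A minimal sketch; the result shape (`browser` and `page` on `res.data`) is an assumption based on this version's documentation:

```js
// Hedged sketch: the { browser, page } shape on res.data is an assumption
// for this version of x-crawl; the puppeteer calls themselves are standard.
const xCrawl = require('x-crawl')

const myXCrawl = xCrawl({ timeout: 10000 })

myXCrawl.crawlPage('https://github.com/coder-hxl/x-crawl').then(async (res) => {
  const { browser, page } = res.data

  // Any puppeteer Page capability is available here: keyboard input, form
  // submission, screenshots, PDFs, and so on.
  await page.screenshot({ path: 'x-crawl.png' })
  console.log(await page.title())

  // The Browser instance stays open for reuse; close it when finished.
  await browser.close()
})
```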

# Table of Contents
@@ -91,7 +85,7 @@ npm install x-crawl
## Example

-Regular crawling: Get the recommended pictures of the youtube homepage every other day as an example:
+Timed crawling: taking the automatic crawling of the cover images of Airbnb Plus listings every day as an example:

```js
// 1.Import module ES/CJS
@@ -105,23 +99,18 @@ const myXCrawl = xCrawl({
// 3.Set the crawling task
// Call the startPolling API to start the polling function, and the callback function will be called every other day
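// A hedged sketch of how this polling task might continue. The startPolling
// config ({ d: 1 } = one day) and callback signature (count, stopPolling)
// are assumptions, as are the listings URL and the result shape.
myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
  // Crawl the listings page and take the puppeteer Page instance.
  const res = await myXCrawl.crawlPage('https://zh.airbnb.com/s/plus_homes')
  const { page } = res.data

  // Collect the cover image URLs from the rendered page...
  const srcs = await page.$$eval('img', (els) => els.map((el) => el.src))

  // ...then batch-download them; call stopPolling() to end the polling.
  myXCrawl.crawlFile({ requestConfig: srcs, fileConfig: { storeDir: './upload' } })
})
```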

publish/README.md (+19 −30): the same changes as README.md above.