**README.md** (+37 −11)
```diff
@@ -2,7 +2,7 @@
 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
 
-x-crawl is a flexible nodejs crawler library. Used to crawl pages, batch network requests, and batch download file resources. Crawl data in asynchronous or synchronous mode, 3 ways to get results, and 5 ways to write requestConfig. Runs on nodejs, friendly to JS/TS developers.
+x-crawl is a flexible nodejs crawler library. It can crawl pages, control page operations, batch network requests, and batch download file resources, and it supports crawling data in asynchronous or synchronous mode. It runs on nodejs, is flexible and simple to use, and is friendly to JS/TS developers.
 
 If you feel good, you can support [x-crawl repository](https://github.com/coder-hxl/x-crawl) with a Star.
 
@@ -11,8 +11,8 @@ If you feel good, you can support [x-crawl repository](https://github.com/coder-
 - Crawls data in asynchronous or synchronous mode.
 - Supports three ways of obtaining results: Promise, Callback, and Promise + Callback.
 - requestConfig supports 5 ways of writing.
-- Anthropomorphic request interval times.
-- With a simple configuration, you can crawl pages, JSON, file resources, and so on.
+- Flexible request intervals.
+- Crawling pages, batch network requests, and batch downloading of file resources can all be performed with simple configuration.
 - Polling function: crawl on a regular schedule.
 - The built-in Puppeteer crawls the page, and the JSDOM library is used to parse it, or you can parse it yourself.
 - Written in TypeScript, with type hints and generic types provided.
```
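The "requestConfig supports 5 ways of writing" bullet can be made concrete. The sketch below is illustrative only and is **not** x-crawl's internal code: it assumes the five shapes are a URL string, a single config object, an array of strings, an array of objects, or a mixed array, and shows how all of them reduce to one array of config objects.

```javascript
// Illustrative normalizer (NOT x-crawl internals): reduce the assumed
// five requestConfig shapes to a single array of { url, ... } objects.
function normalizeRequestConfig(config) {
  const toObject = (item) =>
    typeof item === 'string' ? { url: item } : { ...item }
  // Arrays (of strings, objects, or a mix) map element-wise;
  // a lone string or object becomes a one-element array.
  return Array.isArray(config) ? config.map(toObject) : [toObject(config)]
}

// 1. single URL string
console.log(normalizeRequestConfig('https://example.com'))
// 2.–5. object, array of strings, array of objects, or mixed array
console.log(
  normalizeRequestConfig([
    'https://a.com',
    { url: 'https://b.com', method: 'POST' }
  ])
)
```

Whatever shape is passed in, the crawler can then iterate one uniform list of request objects.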
```diff
-Refer to [jsdom](https://github.com/jsdom/jsdom) for specific usage.
+It is an instance object of [JSDOM](https://github.com/jsdom/jsdom); please refer to [jsdom](https://github.com/jsdom/jsdom) for specific usage.
+
+**Note:** The jsdom instance only parses the content of the [page instance](#page-instance). If you use the page instance for event operations, you may need to re-parse the latest content yourself. For details, see the self-parsing section of [page instance](#page-instance).
 
 #### browser instance
 
-The browser instance is a headless browser without a UI shell. What it does is bring **all modern web platform features** provided by the browser rendering engine to the code.
+It is an instance object of [Browser](https://pptr.dev/api/puppeteer.browser). For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).
 
-**Purpose of calling close:** The browser instance keeps running internally, which keeps the process from terminating. Do not call [crawlPage](#crawlPage) or [page](#page) afterwards if you still need to use them. When you modify the properties of a browser instance, it affects the browser instance inside the crawlPage API of the crawler instance, the page instance of the returned result, and the returned browser instance, because the browser instance is shared within the crawlPage API of the same crawler instance.
+The browser instance is a headless browser without a UI shell. What it does is bring **all modern web platform features** provided by the browser rendering engine to the code.
 
-Refer to [browser](https://pptr.dev/api/puppeteer.browser) for specific usage.
+**Note:** An event loop always runs inside the browser instance, which keeps the process from terminating. If you want to stop it, execute browser.close(). Do not call [crawlPage](#crawlPage) or [page](#page) after closing if you still need them. When you modify the properties of the browser instance, it affects the browser instance inside the crawlPage API of the crawler instance, the page instance of the returned result, and the returned browser instance, because the browser instance is shared within the crawlPage API of the same crawler instance.
 
 #### page instance
 
+It is an instance object of [Page](https://pptr.dev/api/puppeteer.page). The instance can also perform interactive operations such as events. For specific usage, please refer to [Page](https://pptr.dev/api/puppeteer.page).
```
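The note about the browser instance's event loop describes general Node.js behavior: any live handle keeps the process from exiting until it is released. A minimal stand-in using a timer (this is not Puppeteer or x-crawl code; the timer plays the role of the browser process, and releasing it is analogous to calling browser.close()):

```javascript
// An active handle (here a timer, standing in for Puppeteer's browser
// process) keeps Node's event loop alive until it is released.
const handle = setInterval(() => {}, 1000)

// hasRef() === true means this handle is holding the process open.
console.log('holding event loop open:', handle.hasRef())

// Releasing the handle (analogous to browser.close()) lets the process exit.
clearInterval(handle)
```

This is why a script that uses crawlPage but never closes the browser appears to hang after its work is done.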
**publish/README.md** (+37 −11)

The changes to publish/README.md are identical to the README.md changes above.