Commit 0744a15

Update: Docs
1 parent 7bea771 commit 0744a15

5 files changed: +89 -31 lines changed

README.md (+29 -9)
@@ -2,7 +2,7 @@
 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)

-X-Crawl is a flexible Nodejs reptile bank. Used to crawl pages, batch network requests, and download file resources in batches. There are 5 kinds of RequestConfig writing, 3 ways to obtain results, and crawl data asynchronous or synchronized mode. Run on Nodejs and be friendly to JS/TS developers.
+x-crawl is a flexible nodejs crawler library. It crawls pages, batches network requests, and batch-downloads file resources. It crawls data in asynchronous or synchronous mode, offers 3 ways to get results and 5 ways to write requestConfig, runs on nodejs, and is friendly to JS/TS developers.

 If you feel good, you can support [x-crawl repository](https://github.com/coder-hxl/x-crawl) with a Star.
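The counts in the new description ("3 ways to get results and 5 ways to write requestConfig") can be illustrated with a sketch of plausible request-target shapes. These shapes and the `normalizeRequestConfig` helper are assumptions made for illustration, not x-crawl's documented API:

```javascript
// Hypothetical sketch of five request-target shapes a crawl call might accept
// (assumed from the README's description; check x-crawl's docs for the real contract).
const asString = 'https://xxx.com'
const asObject = { url: 'https://xxx.com', method: 'GET' }
const asStringArray = ['https://xxx.com/a', 'https://xxx.com/b']
const asObjectArray = [{ url: 'https://xxx.com/a' }, { url: 'https://xxx.com/b' }]
const asMixedArray = ['https://xxx.com/a', { url: 'https://xxx.com/b' }]

// Normalize any of the five shapes into one array of { url, ... } objects,
// which is how a library could treat them uniformly internally.
function normalizeRequestConfig(config) {
  const list = Array.isArray(config) ? config : [config]
  return list.map((item) => (typeof item === 'string' ? { url: item } : item))
}
```

For example, `normalizeRequestConfig(asMixedArray)` and `normalizeRequestConfig(asObjectArray)` both yield the same array of `{ url }` objects, regardless of how the caller wrote the config.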

@@ -37,9 +37,9 @@ We can do the following:
 + [Choose crawling mode](#Choose-crawling-mode)
 + [Multiple crawler application instances](#Multiple-crawler-application-instances)
 * [Crawl page](#Crawl-page)
-  + [jsdom](#jsdom)
-  + [browser](#browser)
-  + [page](#page)
+  + [jsdom instance](#jsdom-instance)
+  + [browser instance](#browser-instance)
+  + [page instance](#page-instance)
 * [Crawl interface](#Crawl-interface)
 * [Crawl files](#Crawl-files)
 * [Start polling](#Start-polling)
@@ -212,19 +212,39 @@ myXCrawl.crawlPage('https://xxx.com').then(res => {
 })
 ```

-#### jsdom
+#### jsdom instance

 Refer to [jsdom](https://github.com/jsdom/jsdom) for specific usage.

-#### browser
+#### browser instance

-**Purpose of calling close: **browser will keep running, so the file will not be terminated. Do not call [crawlPage](#crawlPage) or [page](#page) if you need to use it later. When you modify the properties of the browser object, it will affect the browser inside the crawlPage of the crawler instance, the returned page, and the browser, because the browser is shared within the crawlPage API of the crawler instance.
+The browser instance is a headless browser without a UI shell. It brings **all modern web platform features** provided by the browser rendering engine to your code.
+
+**Purpose of calling close:** the browser instance keeps running internally, so the process will not exit. Do not call close if you still need [crawlPage](#crawlPage) or [page](#page) later. When you modify the properties of a browser instance, the change affects the browser instance inside the crawlPage API of that crawler instance, as well as the page and browser instances in returned results, because the browser instance is shared within the crawlPage API of the same crawler instance.

 Refer to [browser](https://pptr.dev/api/puppeteer.browser) for specific usage.

-#### page
+#### page instance
+
+**Take a screenshot**
+
+```js
+import xCrawl from 'x-crawl'
+
+const testXCrawl = xCrawl({ timeout: 10000 })
+
+testXCrawl
+  .crawlPage('https://xxx.com')
+  .then(async (res) => {
+    const { page } = res
+
+    await page.screenshot({ path: './upload/page.png' })
+
+    console.log('Screen capture is complete')
+  })
+```

-The page attribute can be used for interactive operations such as events. For details, refer to [page](https://pptr.dev/api/puppeteer.page).
+The page instance can also perform interactive operations such as events. For details, refer to [page](https://pptr.dev/api/puppeteer.page).

 ### Crawl interface
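The sharing behavior the new "Purpose of calling close" paragraph describes can be sketched with a minimal stand-in (hypothetical names, not x-crawl's implementation): one lazily created browser-like object is reused by every crawlPage call of the same crawler instance, so mutating it is visible through every result, and nothing from it is usable after close.

```javascript
// Minimal stand-in for the documented sharing behavior (not the real library):
// a crawler instance lazily creates one shared browser-like object, every
// crawlPage call hands back that same object, and close() marks it unusable.
function createCrawler() {
  let browser = null // shared across all crawlPage calls of this instance

  return {
    async crawlPage(url) {
      if (!browser) browser = { closed: false, props: {} } // "launched" once
      if (browser.closed) throw new Error('browser already closed')
      const page = { url, browser } // each result's page points at the shared browser
      return { page, browser }
    },
    async close() {
      browser.closed = true // after this, no further crawlPage/page usage
    }
  }
}
```

Two crawlPage results report the identical browser object, and a property set through one result is visible through the other, which is why close should only be called once no further crawlPage or page use is planned.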

docs/cn.md (+29 -11)
@@ -2,7 +2,7 @@
 [English](https://github.com/coder-hxl/x-crawl#x-crawl) | 简体中文

-x-crawl 是一个灵活的 nodejs 爬虫库。用来爬取页面、批量网络请求以及批量下载文件资源。有 5 种 requestConfig 的写法,3 种获取结果的写法,异步或同步模式爬取数据。跑在 nodejs 上,对 JS/TS 开发者友好。
+x-crawl 是一个灵活的 nodejs 爬虫库。用来爬取页面、批量网络请求以及批量下载文件资源。异步或同步模式爬取数据,3 种获取结果的写法,5 种 requestConfig 的写法。跑在 nodejs 上,对 JS/TS 开发者友好。

 如果感觉不错,可以给 [x-crawl 存储库](https://github.com/coder-hxl/x-crawl) 点个 Star 支持一下。

@@ -39,11 +39,9 @@ crawlPage API 内部使用 [puppeteer](https://github.com/puppeteer/puppeteer)
 + [选择爬取模式](#选择爬取模式)
 + [多个爬虫应用实例](#多个爬虫应用实例)
 * [爬取页面](#爬取页面)
-  + [jsdom](#jsdom)
-  + [browser](#browser)
-  + [page](#page)
+  + [jsdom 实例](#jsdom-实例)
+  + [browser 实例](#browser-实例)
+  + [page 实例](#page-实例)
 * [爬取接口](#爬取接口)
 * [爬取文件](#爬取文件)
 * [启动轮询](#启动轮询)
@@ -211,19 +209,39 @@ myXCrawl.crawlPage('https://xxx.com').then(res => {
 })
 ```

-#### jsdom
+#### jsdom 实例

 具体使用参考 [jsdom](https://github.com/jsdom/jsdom) 。

-#### browser
+#### browser 实例

-**调用 close 的目的:**browser 会一直保持运行,造成文件不会终止。如果后面还需要用到 [crawlPage](#crawlPage) 或者 [page](#page) 请勿调用。当您修改 browser 对象的属性时,会对该爬虫实例的 crawlPage 内部的 browser 和返回的 page 以及 browser 造成影响,因为 browser 在爬虫实例的 crawlPage API 内是共享的。
+browser 实例是一个无头浏览器,并无 UI 外壳,它把浏览器渲染引擎提供的**所有现代网络平台功能**带到代码中。
+
+**调用 close 的目的:** browser 实例内部会一直处于运行状态,造成进程不会终止。如果后面还需要用到 [crawlPage](#crawlPage) 或者 [page](#page),请勿调用 close。当您修改 browser 实例的属性时,会对该爬虫实例 crawlPage API 内部的 browser 实例,以及返回结果中的 page 实例和 browser 实例造成影响,因为 browser 实例在同一个爬虫实例的 crawlPage API 内是共享的。

 具体使用参考 [browser](https://pptr.dev/api/puppeteer.browser) 。

-#### page
+#### page 实例
+
+**获取屏幕截图**
+
+```js
+import xCrawl from 'x-crawl'
+
+const testXCrawl = xCrawl({ timeout: 10000 })
+
+testXCrawl
+  .crawlPage('https://xxx.com')
+  .then(async (res) => {
+    const { page } = res
+
+    await page.screenshot({ path: './upload/page.png' })
+
+    console.log('获取屏幕截图完毕')
+  })
+```

-page 属性可以做事件之类的交互操作,具体使用参考 [page](https://pptr.dev/api/puppeteer.page) 。
+page 实例还可以做事件之类的交互操作,具体使用参考 [page](https://pptr.dev/api/puppeteer.page) 。

 ### 爬取接口

package.json (+1 -1)
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "3.2.3",
+  "version": "3.2.4",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library. ",
   "license": "MIT",

publish/README.md (+29 -9)
@@ -2,7 +2,7 @@
 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)

-X-Crawl is a flexible Nodejs reptile bank. Used to crawl pages, batch network requests, and download file resources in batches. There are 5 kinds of RequestConfig writing, 3 ways to obtain results, and crawl data asynchronous or synchronized mode. Run on Nodejs and be friendly to JS/TS developers.
+x-crawl is a flexible nodejs crawler library. It crawls pages, batches network requests, and batch-downloads file resources. It crawls data in asynchronous or synchronous mode, offers 3 ways to get results and 5 ways to write requestConfig, runs on nodejs, and is friendly to JS/TS developers.

 If you feel good, you can support [x-crawl repository](https://github.com/coder-hxl/x-crawl) with a Star.

@@ -37,9 +37,9 @@ We can do the following:
 + [Choose crawling mode](#Choose-crawling-mode)
 + [Multiple crawler application instances](#Multiple-crawler-application-instances)
 * [Crawl page](#Crawl-page)
-  + [jsdom](#jsdom)
-  + [browser](#browser)
-  + [page](#page)
+  + [jsdom instance](#jsdom-instance)
+  + [browser instance](#browser-instance)
+  + [page instance](#page-instance)
 * [Crawl interface](#Crawl-interface)
 * [Crawl files](#Crawl-files)
 * [Start polling](#Start-polling)
@@ -212,19 +212,39 @@ myXCrawl.crawlPage('https://xxx.com').then(res => {
 })
 ```

-#### jsdom
+#### jsdom instance

 Refer to [jsdom](https://github.com/jsdom/jsdom) for specific usage.

-#### browser
+#### browser instance

-**Purpose of calling close: **browser will keep running, so the file will not be terminated. Do not call [crawlPage](#crawlPage) or [page](#page) if you need to use it later. When you modify the properties of the browser object, it will affect the browser inside the crawlPage of the crawler instance, the returned page, and the browser, because the browser is shared within the crawlPage API of the crawler instance.
+The browser instance is a headless browser without a UI shell. It brings **all modern web platform features** provided by the browser rendering engine to your code.
+
+**Purpose of calling close:** the browser instance keeps running internally, so the process will not exit. Do not call close if you still need [crawlPage](#crawlPage) or [page](#page) later. When you modify the properties of a browser instance, the change affects the browser instance inside the crawlPage API of that crawler instance, as well as the page and browser instances in returned results, because the browser instance is shared within the crawlPage API of the same crawler instance.

 Refer to [browser](https://pptr.dev/api/puppeteer.browser) for specific usage.

-#### page
+#### page instance
+
+**Take a screenshot**
+
+```js
+import xCrawl from 'x-crawl'
+
+const testXCrawl = xCrawl({ timeout: 10000 })
+
+testXCrawl
+  .crawlPage('https://xxx.com')
+  .then(async (res) => {
+    const { page } = res
+
+    await page.screenshot({ path: './upload/page.png' })
+
+    console.log('Screen capture is complete')
+  })
+```

-The page attribute can be used for interactive operations such as events. For details, refer to [page](https://pptr.dev/api/puppeteer.page).
+The page instance can also perform interactive operations such as events. For details, refer to [page](https://pptr.dev/api/puppeteer.page).

 ### Crawl interface

publish/package.json (+1 -1)
@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "3.2.3",
+  "version": "3.2.4",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library.",
   "license": "MIT",
