Commit b16cb70

Update: Docs
1 parent 00aed9b commit b16cb70

5 files changed (+87 −93 lines)

README.md (+32 −33)
@@ -41,20 +41,20 @@ The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer)
   * [Multiple ways of writing requestConfig options](#Multiple-ways-of-writing-requestConfig-options)
   * [Multiple ways to get results](#Multiple-ways-to-get-results)
 - [API](#API)
-  * [x-crawl](#x-crawl-2)
-    + [Type](#Type-1)
+  * [xCrawl](#xCrawl)
+    + [Type](#Type)
     + [Example](#Example-1)
   * [crawlPage](#crawlPage)
-    + [Type](#Type-2)
+    + [Type](#Type-1)
     + [Example](#Example-2)
   * [crawlData](#crawlData)
-    + [Type](#Type-3)
+    + [Type](#Type-2)
     + [Example](#Example-3)
   * [crawlFile](#crawlFile)
-    + [Type](#Type-4)
+    + [Type](#Type-3)
     + [Example](#Example-4)
   * [crawlPolling](#crawlPolling)
-    + [Type](#Type-5)
+    + [Type](#Type-4)
     + [Example](#Example-5)
 - [Types](#Types)
   * [AnyObject](#AnyObject)
@@ -64,14 +64,14 @@ The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer)
   * [RequestConfig](#RequestConfig)
   * [IntervalTime](#IntervalTime)
   * [XCrawlBaseConfig](#XCrawlBaseConfig)
-  * [CrawlPageConfig](#CrawlPageConfig )
+  * [CrawlPageConfig](#CrawlPageConfig)
   * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
   * [CrawlDataConfig](#CrawlDataConfig)
   * [CrawlFileConfig](#CrawlFileConfig)
   * [StartPollingConfig](#StartPollingConfig)
   * [CrawlResCommonV1](#CrawlResCommonV1)
   * [CrawlResCommonArrV1](#CrawlResCommonArrV1)
-  * [CrawlPage](#CrawlPage-2)
+  * [CrawlPage](#CrawlPage-1)
   * [FileInfo](#FileInfo)
 - [More](#More)

@@ -98,23 +98,25 @@ const myXCrawl = xCrawl({
 })

 // 3.Set the crawling task
-// Call the startPolling API to start the polling function, and the callback function will be called every other day
-myXCrawl.startPolling({ d: 1 }, (count, stopPolling) => {
-  myXCrawl.crawlPage('https://zh.airbnb.com/s/*/plus_homes').then((res) => {
-    const { jsdom } = res // By default, the JSDOM library is used to parse Page
-
-    // Get the cover image elements for Plus listings
-    const imgEls = jsdom.window.document
-      .querySelector('.a1stauiv')
-      ?.querySelectorAll('picture img')
-
-    // set request configuration
-    const requestConfig: string[] = []
-    imgEls?.forEach((item) => requestConfig.push(item.src))
-
-    // Call the crawlFile API to crawl pictures
-    myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
-  })
+/*
+  Call the startPolling API to start the polling function,
+  and the callback function will be called every other day
+*/
+myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
+  // Call crawlPage API to crawl Page
+  const { jsdom } = await myXCrawl.crawlPage('https://zh.airbnb.com/s/*/plus_homes')
+
+  // Get the cover image elements for Plus listings
+  const imgEls = jsdom.window.document
+    .querySelector('.a1stauiv')
+    ?.querySelectorAll('picture img')
+
+  // set request configuration
+  const requestConfig: string[] = []
+  imgEls?.forEach((item) => requestConfig.push(item.src))
+
+  // Call the crawlFile API to crawl pictures
+  myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
 })
 ```

@@ -136,7 +138,7 @@ running result:

 #### An example of a crawler application

-Create a new **application instance** via [xCrawl()](#x-crawl-2):
+Create a new **application instance** via [xCrawl()](#xCrawl):

 ```js
 import xCrawl from 'x-crawl'
@@ -321,13 +323,10 @@ const myXCrawl = xCrawl({
   intervalTime: { max: 3000, min: 1000 }
 })

-myXCrawl. startPolling({ h: 2, m: 30 }, (count, stopPolling) => {
+myXCrawl. startPolling({ h: 2, m: 30 }, async (count, stopPolling) => {
   // will be executed every two and a half hours
   // crawlPage/crawlData/crawlFile
-  myXCrawl.crawlPage('https://xxx.com').then(res => {
-    const { jsdom, browser, page } = res
-
-  })
+  const { jsdom, browser, page } = await myXCrawl.crawlPage('https://xxx.com')
 })
 ```

@@ -476,7 +475,7 @@ It can be selected according to the actual situation.

 ## API

-### x-crawl
+### xCrawl

 Create a crawler instance via call xCrawl. The request queue is maintained by the instance method itself, not by the instance itself.

@@ -515,7 +514,7 @@ crawlPage is the method of the crawler instance, usually used to crawl page.
 #### Type

 - Look at the [CrawlPageConfig](#CrawlPageConfig) type
-- Look at the [CrawlPage](#CrawlPage-2) type
+- Look at the [CrawlPage](#CrawlPage-1) type

 ```ts
 function crawlPage: (
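
The change running through these README hunks swaps `.then()` chaining for `async`/`await` inside the `startPolling` callback. The callback's `count` and `stopPolling` parameters stay unused in the docs' examples; the sketch below shows one way they could be used, assuming `count` is the current poll number and that calling `stopPolling()` halts further polling — both inferred from the names in this commit, not from the full docs:

```ts
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl({ intervalTime: { max: 3000, min: 1000 } })

// Poll every two and a half hours, as in the hunk above.
myXCrawl.startPolling({ h: 2, m: 30 }, async (count, stopPolling) => {
  const { jsdom } = await myXCrawl.crawlPage('https://xxx.com')

  // Inspect each run's result via the JSDOM-parsed page.
  console.log(`run ${count}:`, jsdom.window.document.title)

  // Hypothetical cutoff: stop polling after the third run.
  if (count >= 3) stopPolling()
})
```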

docs/cn.md (+21 −25)
@@ -42,36 +42,36 @@ crawlPage API 内部使用 [puppeteer](https://github.com/puppeteer/puppeteer)
   * [获取结果的多种方式](#获取结果的多种方式)
 - [API](#API)
   * [xCrawl](#xCrawl)
-    + [类型](#类型-1)
+    + [类型](#类型)
     + [示例](#示例-1)
   * [crawlPage](#crawlPage)
-    + [类型](#类型-2)
+    + [类型](#类型-1)
     + [示例](#示例-2)
   * [crawlData](#crawlData)
-    + [类型](#类型-3)
+    + [类型](#类型-2)
     + [示例](#示例-3)
   * [crawlFile](#crawlFile)
-    + [类型](#类型-4)
+    + [类型](#类型-3)
     + [示例](#示例-4)
   * [startPolling](#startPolling)
-    + [类型](#类型-5)
+    + [类型](#类型-4)
     + [示例](#示例-5)
-- [类型](#类型-6)
+- [类型](#类型-5)
   * [AnyObject](#AnyObject)
   * [Method](#Method)
   * [RequestConfigObjectV1](#RequestConfigObjectV1)
   * [RequestConfigObjectV2](#RequestConfigObjectV2)
   * [RequestConfig](#RequestConfig)
   * [IntervalTime](#IntervalTime)
   * [XCrawlBaseConfig](#XCrawlBaseConfig)
-  * [CrawlPageConfig](#CrawlPageConfig )
+  * [CrawlPageConfig](#CrawlPageConfig)
   * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
   * [CrawlDataConfig](#CrawlDataConfig)
   * [CrawlFileConfig](#CrawlFileConfig)
   * [StartPollingConfig](#StartPollingConfig)
   * [CrawlResCommonV1](#CrawlResCommonV1)
   * [CrawlResCommonArrV1](#CrawlResCommonArrV1)
-  * [CrawlPage](#CrawlPage-2)
+  * [CrawlPage](#CrawlPage-1)
   * [FileInfo](#FileInfo)
 - [更多](#更多)

@@ -85,7 +85,7 @@ npm install x-crawl

 ## 示例

-定时爬取: 每隔一天就获取 bilibili 国漫主页的轮播图片为例:
+每天自动获取 bilibili 国漫主页的轮播图片为例:

 ```js
 // 1.导入模块 ES/CJS
@@ -99,21 +99,19 @@ const myXCrawl = xCrawl({

 // 3.设置爬取任务
 // 调用 startPolling API 开始轮询功能,每隔一天会调用回调函数
-myXCrawl.startPolling({ d: 1 }, () => {
+myXCrawl.startPolling({ d: 1 }, async () => {
   // 调用 crawlPage API 爬取 Page
-  myXCrawl.crawlPage('https://www.bilibili.com/guochuang/').then((res) => {
-    const { jsdom } = res // 默认使用了 JSDOM 库解析 Page
+  const { jsdom } = await myXCrawl.crawlPage('https://www.bilibili.com/guochuang/')

-    // 获取轮播图片元素
-    const imgEls = jsdom.window.document.querySelectorAll('.chief-recom-item img')
+  // 获取轮播图片元素
+  const imgEls = jsdom.window.document.querySelectorAll('.chief-recom-item img')

-    // 设置请求配置
-    const requestConfig = []
-    imgEls.forEach((item) => requestConfig.push(`https:${item.src}`))
+  // 设置请求配置
+  const requestConfig = []
+  imgEls.forEach((item) => requestConfig.push(`https:${item.src}`))

-    // 调用 crawlFile API 爬取图片
-    myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
-  })
+  // 调用 crawlFile API 爬取图片
+  myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
 })
 ```

@@ -319,12 +317,10 @@ const myXCrawl = xCrawl({
   timeout: 10000
 })

-myXCrawl.startPolling({ h: 2, m: 30 }, (count, stopPolling) => {
+myXCrawl.startPolling({ h: 2, m: 30 }, async (count, stopPolling) => {
   // 每隔两个半小时会执行一次
   // crawlPage/crawlData/crawlFile
-  myXCrawl.crawlPage('https://xxx.com').then(res => {
-    const { jsdom, browser, page } = res
-  })
+  const { jsdom, browser, page } = await myXCrawl.crawlPage('https://xxx.com')
 })
 ```

@@ -511,7 +507,7 @@ crawlPage 是爬虫实例的方法,通常用于爬取页面。
 #### 类型

 - 查看 [CrawlPageConfig](#CrawlPageConfig) 类型
-- 查看 [CrawlPage](#CrawlPage-2) 类型
+- 查看 [CrawlPage](#CrawlPage-1) 类型

 ```ts
 function crawlPage: (
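
Both language versions now destructure `jsdom`, `browser`, and `page` straight from the awaited `crawlPage` result. The commit only renames the anchor for the `CrawlPage` type rather than showing it, so as a rough sketch, the result shape implied by the examples would look something like the interface below (types assumed from the docs' statements that crawlPage uses puppeteer internally and parses pages with JSDOM by default, not taken from the library's source):

```ts
// Hypothetical sketch of the crawlPage result shape, inferred from the
// destructuring in the diffs above; the authoritative definition is the
// library's own CrawlPage type, which this commit does not show.
import type { Browser, Page } from 'puppeteer'
import type { JSDOM } from 'jsdom'

interface CrawlPageResultSketch {
  jsdom: JSDOM     // the page parsed with JSDOM (the docs' stated default)
  browser: Browser // the puppeteer Browser instance
  page: Page       // the puppeteer Page for the crawled URL
}
```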

package.json (+1 −1)
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "3.2.11",
+  "version": "3.2.12",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library.",
   "license": "MIT",

publish/README.md (+32 −33)
The changes are identical to the README.md diff shown above.

publish/package.json (+1 −1)
@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "3.2.11",
+  "version": "3.2.12",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library.",
   "license": "MIT",
