Commit 24fd556

1 parent 46d9733 commit 24fd556

7 files changed, +73 -70 lines changed

README.md

+20 -15
@@ -6,13 +6,19 @@ x-crawl is a Nodejs multifunctional crawler library.
 
 ## Feature
 
-- Crawl HTML, JSON, file resources, etc. with simple configuration
-- Use puppeteer to crawl HTML, and use JSDOM library to parse HTML, or parse HTML by yourself
-- Support asynchronous/synchronous way to crawl data
-- Support Promise/Callback way to get the result
-- Polling function
-- Anthropomorphic request interval
-- Written in TypeScript, provides generics
+- Crawl HTML, JSON, file resources, etc. with simple configuration.
+- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
+- Support asynchronous/synchronous way to crawl data.
+- Support Promise/Callback way to get the result.
+- Polling function.
+- Anthropomorphic request interval.
+- Written in TypeScript, provides generics.
+
+## Benefits provided by using puppeter
+
+- Generate screenshots and PDFs of pages.
+- Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
+- Automate form submission, UI testing, keyboard input, etc.
 
 # Table of Contents
 
@@ -41,14 +47,15 @@ x-crawl is a Nodejs multifunctional crawler library.
 * [Method](#Method)
 * [RequestConfig](#RequestConfig)
 * [IntervalTime](#IntervalTime)
-* [FetchBaseConifg](#FetchBaseConifg)
 * [XCrawlBaseConifg](#XCrawlBaseConifg)
+* [FetchBaseConifgV1](#FetchBaseConifgV1)
+* [FetchBaseConifgV2](#FetchBaseConifgV2)
 * [FetchHTMLConfig](#FetchHTMLConfig )
-* [FetchDataConfig](#FetchDataConfig)
+* [FetchDataConfig](#FetchDataConfig)
 * [FetchFileConfig](#FetchFileConfig)
 * [StartPollingConfig](#StartPollingConfig)
-* [FetchCommon](#FetchCommon)
-* [FetchCommonArr](#FetchCommonArr)
+* [FetchResCommonV1](#FetchResCommonV1)
+* [FetchResCommonArrV1](#FetchResCommonArrV1)
 * [FileInfo](#FileInfo)
 * [FetchHTML](#FetchHTML)
 - [More](#More)
@@ -318,7 +325,6 @@ interface FetchBaseConifgV1 {
 ```ts
 interface FetchBaseConifgV2 {
   url: string
-  header?: AnyObject
   timeout?: number
   proxy?: string
 }
@@ -364,7 +370,7 @@ interface StartPollingConfig {
 interface FetchCommon<T> {
   id: number
   statusCode: number | undefined
-  headers: IncomingHttpHeaders // node: http type
+  headers: IncomingHttpHeaders // nodejs: http type
   data: T
 }
 ```
@@ -392,8 +398,7 @@ interface FileInfo {
 interface FetchHTML {
   httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
   data: {
-    page: Page
-    content: string
+    page: Page // The type of Page in the puppeteer library
     jsdom: JSDOM // The type of JSDOM in the jsdom library
   }
 }
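
Note on the `FetchHTML` hunk above: `content: string` is dropped from `data`, while `page` stays. The following is a minimal sketch of how a caller might still obtain the HTML string, assuming `page` is a puppeteer `Page`, whose `content()` method resolves to the page's full HTML. `PageLike`, `FetchHTMLLike`, and `getRenderedHtml` are hypothetical stand-ins written for this note, not x-crawl names, and the diff does not show whether the library keeps the page open long enough for this call.

```ts
// Local stand-ins for the puppeteer/jsdom types, so the sketch runs without those packages.
interface PageLike {
  content(): Promise<string> // mirrors puppeteer's Page.content()
}

// Assumed shape of the result after this commit (no `content` field).
interface FetchHTMLLike {
  httpResponse: unknown
  data: {
    page: PageLike
    jsdom: unknown
  }
}

// Hypothetical helper: ask the page for its HTML instead of reading `data.content`.
async function getRenderedHtml(res: FetchHTMLLike): Promise<string> {
  return res.data.page.content()
}

// Usage with a fake page object standing in for a real puppeteer Page.
const fakeRes: FetchHTMLLike = {
  httpResponse: null,
  data: {
    page: { content: async () => '<html><body>hi</body></html>' },
    jsdom: null
  }
}

getRenderedHtml(fakeRes).then((html) => console.log(html.length))
```

Code that only needs DOM queries can keep using `data.jsdom`, which this commit leaves in place.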

docs/cn.md

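+24 -19
@@ -6,13 +6,19 @@ x-crawl is a Nodejs multifunctional crawler library.
 
 ## Feature
 
-- Crawl HTML, JSON, file resources, etc. with simple configuration
-- Use puppeteer to crawl HTML, and use JSDOM library to parse HTML, or parse HTML by yourself
-- Support asynchronous/synchronous way to crawl data
-- Support Promise/Callback way to get the result
-- Polling function
-- Anthropomorphic request interval
-- Written in TypeScript, provides generics
+- Crawl HTML, JSON, file resources, etc. with simple configuration.
+- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
+- Support asynchronous/synchronous way to crawl data.
+- Support Promise/Callback way to get the result.
+- Polling function.
+- Anthropomorphic request interval.
+- Written in TypeScript, provides generics.
+
+## Benefits provided by using puppeter
+
+- Generate screenshots and PDFs of pages.
+- Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
+- Automate form submission, UI testing, keyboard input, etc.
 
 # Table of Contents
 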
@@ -41,14 +47,15 @@ x-crawl is a Nodejs multifunctional crawler library.
 * [Method](#Method)
 * [RequestConfig](#RequestConfig)
 * [IntervalTime](#IntervalTime)
-* [FetchBaseConifg](#FetchBaseConifg)
 * [XCrawlBaseConifg](#XCrawlBaseConifg)
+* [FetchBaseConifgV1](#FetchBaseConifgV1)
+* [FetchBaseConifgV2](#FetchBaseConifgV2)
 * [FetchHTMLConfig](#FetchHTMLConfig )
 * [FetchDataConfig](#FetchDataConfig)
 * [FetchFileConfig](#FetchFileConfig)
-* [FetchPollingConfig](#FetchPollingConfig)
-* [FetchCommon](#FetchCommon)
-* [FetchCommonArr](#FetchCommonArr)
+* [StartPollingConfig](#StartPollingConfig)
+* [FetchResCommonV1](#FetchResCommonV1)
+* [FetchResCommonArrV1](#FetchResCommonArrV1)
 * [FileInfo](#FileInfo)
 * [FetchHTML](#FetchHTML)
 - [More](#更多)
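@@ -63,7 +70,7 @@ npm install x-crawl
 
 ## Example
 
-As an example, fetch the recommended carousel images from the bilibili guochuang (Chinese animation) home page once a day:
+As an example, fetch the carousel images from the bilibili guochuang (Chinese animation) home page once a day:
 
 ```js
 // 1. Import module ES/CJS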
@@ -76,14 +83,14 @@ const myXCrawl = xCrawl({
 })
 
 // 3. Set the crawling task
-// Call the fetchPolling API to start polling; the callback function is called once a day
-myXCrawl.fetchPolling({ d: 1 }, () => {
+// Call the startPolling API to start polling; the callback function is called once a day
+myXCrawl.startPolling({ d: 1 }, () => {
   // Call the fetchHTML API to crawl HTML
   myXCrawl.fetchHTML('https://www.bilibili.com/guochuang/').then((res) => {
     const { jsdom } = res.data // By default, the JSDOM library is used to parse HTML
 
     // Get the carousel image elements
-    const imgEls = jsdom.window.document.querySelectorAll('.chief-recom-item img')
+    const imgEls = jsdom.window.document.querySelectorAll('.carousel-wrapper .chief-recom-item img')
 
     // Set the request configuration
     const requestConifg = []
@@ -342,7 +349,6 @@ interface FetchBaseConifgV1 {
 ```ts
 interface FetchBaseConifgV2 {
   url: string
-  header?: AnyObject
   timeout?: number
   proxy?: string
 }
@@ -388,7 +394,7 @@ interface StartPollingConfig {
 interface FetchCommon<T> {
   id: number
   statusCode: number | undefined
-  headers: IncomingHttpHeaders // node: http type
+  headers: IncomingHttpHeaders // nodejs: http type
   data: T
 }
 ```
@@ -416,8 +422,7 @@ interface FileInfo {
 interface FetchHTML {
   httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
   data: {
-    page: Page
-    content: string
+    page: Page // The type of Page in the puppeteer library
     jsdom: JSDOM
   }
 }
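
Note on the example hunk in this file: the polling entry point is renamed from `fetchPolling` to `startPolling`, and it still takes an interval object such as `{ d: 1 }` plus a callback. The sketch below shows one way such an interval could drive a timer. It is not x-crawl's implementation: the `h` and `m` fields, the immediate first call, and the names `PollingInterval`, `toMilliseconds`, and `startPollingSketch` are all assumptions made for illustration.

```ts
// Assumed shape of the interval argument; only `d` appears in the diff.
interface PollingInterval {
  d?: number // days
  h?: number // hours (assumption)
  m?: number // minutes (assumption)
}

// Convert the interval object to milliseconds.
function toMilliseconds({ d = 0, h = 0, m = 0 }: PollingInterval): number {
  const minutes = (d * 24 + h) * 60 + m
  return minutes * 60 * 1000
}

// Hypothetical stand-in for startPolling: run the callback once right away
// (an assumption), then again after every interval. Returns a stop function.
function startPollingSketch(
  interval: PollingInterval,
  callback: (count: number) => void
): () => void {
  let count = 0
  callback(++count)
  const timer = setInterval(() => callback(++count), toMilliseconds(interval))
  return () => clearInterval(timer)
}

// Usage: poll once a day, then stop immediately so this script can exit.
const stop = startPollingSketch({ d: 1 }, (n) => console.log(`poll #${n}`))
stop()
```

Timer delays in Node.js and browsers are capped at a 32-bit signed integer of milliseconds (roughly 24.8 days), so a sketch like this only suits intervals below that limit.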

package.json

+1 -1
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "2.0.0",
+  "version": "2.1.0",
   "author": "coderHXL",
   "description": "XCrawl is a Nodejs multifunctional crawler library.",
   "license": "MIT",

publish/README.md

+20 -15
@@ -6,13 +6,19 @@ x-crawl is a Nodejs multifunctional crawler library.
 
 ## Feature
 
-- Crawl HTML, JSON, file resources, etc. with simple configuration
-- Use puppeteer to crawl HTML, and use JSDOM library to parse HTML, or parse HTML by yourself
-- Support asynchronous/synchronous way to crawl data
-- Support Promise/Callback way to get the result
-- Polling function
-- Anthropomorphic request interval
-- Written in TypeScript, provides generics
+- Crawl HTML, JSON, file resources, etc. with simple configuration.
+- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
+- Support asynchronous/synchronous way to crawl data.
+- Support Promise/Callback way to get the result.
+- Polling function.
+- Anthropomorphic request interval.
+- Written in TypeScript, provides generics.
+
+## Benefits provided by using puppeter
+
+- Generate screenshots and PDFs of pages.
+- Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
+- Automate form submission, UI testing, keyboard input, etc.
 
 # Table of Contents
 
@@ -41,14 +47,15 @@ x-crawl is a Nodejs multifunctional crawler library.
 * [Method](#Method)
 * [RequestConfig](#RequestConfig)
 * [IntervalTime](#IntervalTime)
-* [FetchBaseConifg](#FetchBaseConifg)
 * [XCrawlBaseConifg](#XCrawlBaseConifg)
+* [FetchBaseConifgV1](#FetchBaseConifgV1)
+* [FetchBaseConifgV2](#FetchBaseConifgV2)
 * [FetchHTMLConfig](#FetchHTMLConfig )
-* [FetchDataConfig](#FetchDataConfig)
+* [FetchDataConfig](#FetchDataConfig)
 * [FetchFileConfig](#FetchFileConfig)
 * [StartPollingConfig](#StartPollingConfig)
-* [FetchCommon](#FetchCommon)
-* [FetchCommonArr](#FetchCommonArr)
+* [FetchResCommonV1](#FetchResCommonV1)
+* [FetchResCommonArrV1](#FetchResCommonArrV1)
 * [FileInfo](#FileInfo)
 * [FetchHTML](#FetchHTML)
 - [More](#More)
@@ -318,7 +325,6 @@ interface FetchBaseConifgV1 {
 ```ts
 interface FetchBaseConifgV2 {
   url: string
-  header?: AnyObject
   timeout?: number
   proxy?: string
 }
@@ -364,7 +370,7 @@ interface StartPollingConfig {
 interface FetchCommon<T> {
   id: number
   statusCode: number | undefined
-  headers: IncomingHttpHeaders // node: http type
+  headers: IncomingHttpHeaders // nodejs: http type
   data: T
 }
 ```
@@ -392,8 +398,7 @@ interface FileInfo {
 interface FetchHTML {
   httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
   data: {
-    page: Page
-    content: string
+    page: Page // The type of Page in the puppeteer library
     jsdom: JSDOM // The type of JSDOM in the jsdom library
   }
 }

publish/package.json

+1 -1
@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "2.0.0",
+  "version": "2.1.0",
   "author": "coderHXL",
   "description": "XCrawl is a Nodejs multifunctional crawler library.",
   "license": "MIT",
