
Commit bb46412

Update: Docs
1 parent 921e136 commit bb46412

5 files changed: +59 -62 lines changed

README.md

+19-20
@@ -1,27 +1,26 @@
-# x-crawl
+# x-crawl [![npm](https://img.shields.io/npm/v/x-crawl.svg)](https://www.npmjs.com/package/x-crawl) [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/coder-hxl/x-crawl/blob/main/LICENSE)
 
 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
 
 x-crawl is a flexible nodejs crawler library. You can crawl pages and control operations such as pages, batch network requests, and batch downloads of file resources. Support asynchronous/synchronous mode crawling data. Running on nodejs, the usage is flexible and simple, friendly to JS/TS developers.
 
-If you feel good, you can support [x-crawl repository](https://github.com/coder-hxl/x-crawl) with a Star.
+If you feel good, you can give [x-crawl repository](https://github.com/coder-hxl/x-crawl) a Star to support it, your Star will be the motivation for my update.
 
 ## Features
 
-- Cules data for asynchronous/synchronous ways.
-- In three ways to obtain the results of the three ways of supporting Promise, Callback, and Promise + Callback.
-- RquestConfig has 5 ways of writing.
-- Flexible request interval.
-- Operations such as crawling pages, batch network requests, and batch downloading of file resources can be performed with simple configuration.
-- The rotation function, crawl regularly.
-- The built -in Puppeteer crawl the page and uses the JSDOM library to analyze the page, or it can also be parsed by itself.
-- Chopening with TypeScript, possessing type prompts, and providing generic types.
+- Support asynchronous/synchronous way to crawl data.
+- The writing method is very flexible and supports multiple ways to write request configuration and obtain crawling results.
+- Flexible crawling interval, up to you to use/avoid high concurrent crawling.
+- With simple configuration, operations such as crawling pages, batch network requests, and batch download of file resources can be performed.
+- Possess polling function to crawl data regularly.
+- The built-in puppeteer crawls the page, and uses the jsdom library to analyze the content of the page, and also supports self-analysis.
+- Written in TypeScript, has types, provides generics.
 
 ## Relationship with puppeteer
 
 The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to help us crawl pages.
 
-We can do the following:
+The return value of the crawlPage API will be able to do the following:
 
 - Generate screenshots and PDFs of pages.
 - Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
@@ -43,7 +42,7 @@ We can do the following:
 * [Crawl interface](#Crawl-interface)
 * [Crawl files](#Crawl-files)
 * [Start polling](#Start-polling)
-* [Request interval time](#Request-interval-time)
+* [Crawl interval](#Crawl-interval)
 * [Multiple ways of writing requestConfig options](#Multiple-ways-of-writing-requestConfig-options)
 * [Multiple ways to get results](#Multiple-ways-to-get-results)
 - [API](#API)
@@ -101,7 +100,7 @@ import xCrawl from 'x-crawl'
 // 2.Create a crawler instance
 const myXCrawl = xCrawl({
   timeout: 10000, // overtime time
-  intervalTime: { max: 3000, min: 2000 } // control request frequency
+  intervalTime: { max: 3000, min: 2000 } // crawl interval
 })
 
 // 3.Set the crawling task
@@ -195,7 +194,7 @@ const myXCrawl2 = xCrawl({
 
 ### Crawl page
 
-Crawl a page via [crawlPage()](#crawlPage)
+Crawl a page via [crawlPage()](#crawlPage) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -274,7 +273,7 @@ myXCrawl
 
 ### Crawl interface
 
-Crawl interface data through [crawlData()](#crawlData)
+Crawl interface data through [crawlData()](#crawlData) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -297,7 +296,7 @@ myXCrawl.crawlData({ requestConfig }).then(res => {
 
 ### Crawl files
 
-Crawl file data via [crawlFile()](#crawlFile)
+Crawl file data via [crawlFile()](#crawlFile) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -323,7 +322,7 @@ myXCrawl
 
 ### Start polling
 
-Start a polling crawl with [startPolling](#startPolling)
+Start a polling crawl with [startPolling()](#startPolling) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -348,11 +347,11 @@ Callback function parameters:
 - The count attribute records the current number of polling operations.
 - stopPolling is a callback function, calling it can terminate subsequent polling operations.
 
-### Request interval time
+### Crawl interval
 
 Setting the requests interval time can prevent too much concurrency and avoid too much pressure on the server.
 
-It can be set when creating a crawler instance, or you can choose to set it separately for an API. The request interval time is controlled internally by the instance method, not by the instance to control the entire request interval time.
+It can be set when creating a crawler instance, or you can choose to set it separately for an API. The crawl interval is controlled internally by the instance method, not by the instance to control the entire crawl interval.
 
 ```js
 import xCrawl from 'x-crawl'
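To make the renamed section easier to follow, here is a minimal sketch of the polling flow described in the hunk above. It is illustrative only, not code from this commit: the instance options and the `{ h: 2, m: 30 }` / `(count, stopPolling)` signature are taken from this diff, while the stop-after-three-runs condition and the placeholder crawl task are assumptions.

```js
import xCrawl from 'x-crawl'

// Create a crawler instance (options mirror the hunks above)
const myXCrawl = xCrawl({
  timeout: 10000,
  intervalTime: { max: 3000, min: 2000 } // crawl interval
})

// Poll every 2 hours and 30 minutes
myXCrawl.startPolling({ h: 2, m: 30 }, (count, stopPolling) => {
  // count records which polling run this is
  console.log(`polling run #${count}`)

  // ...start a crawlPage / crawlData / crawlFile task here (placeholder)...

  // stopPolling terminates subsequent polling operations
  if (count >= 3) stopPolling()
})
```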
@@ -510,7 +509,7 @@ import xCrawl from 'x-crawl'
 const myXCrawl = xCrawl({
   baseUrl: 'https://xxx.com',
   timeout: 10000,
-  // The interval between requests, multiple requests are valid
+  // Crawling interval time, batch crawling is only valid
   intervalTime: {
     max: 2000,
     min: 1000
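For the crawl interval itself, the paragraph above says it can be set on the instance or separately for an API call. A minimal sketch of both placements follows; the instance options mirror this diff, but the per-call `intervalTime` option and the array form of `requestConfig` are assumptions about the x-crawl API, not code from this commit.

```js
import xCrawl from 'x-crawl'

// 1. Set the crawl interval on the crawler instance (shape shown in this diff)
const myXCrawl = xCrawl({
  baseUrl: 'https://xxx.com',
  timeout: 10000,
  // Crawling interval time, batch crawling is only valid
  intervalTime: { max: 2000, min: 1000 }
})

// 2. Or set it separately for a single API call
// (assumed option shape: per-call intervalTime and an array requestConfig)
myXCrawl
  .crawlData({
    requestConfig: ['/api/one', '/api/two'],
    intervalTime: { max: 3000, min: 2000 }
  })
  .then((res) => {
    // handle the crawl results here
  })
```

Random intervals between `min` and `max` keep batch crawls from hammering the target server, which is the motivation both READMEs give for the option.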

docs/cn.md

+19-20
@@ -1,27 +1,26 @@
-# x-crawl
+# x-crawl [![npm](https://img.shields.io/npm/v/x-crawl.svg)](https://www.npmjs.com/package/x-crawl) [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/coder-hxl/x-crawl/blob/main/LICENSE)
 
 [English](https://github.com/coder-hxl/x-crawl#x-crawl) | 简体中文
 
 x-crawl 是一个灵活的 nodejs 爬虫库。可以爬取页面并控制页面、批量网络请求以及批量下载文件资源等操作。支持 异步/同步 模式爬取数据。跑在 nodejs 上,用法灵活和简单,对 JS/TS 开发者友好。
 
-如果感觉不错,可以给 [x-crawl 存储库](https://github.com/coder-hxl/x-crawl) 点个 Star 支持一下。
+如果感觉不错,可以给 [x-crawl 存储库](https://github.com/coder-hxl/x-crawl) 点个 Star 支持一下,您的 Star 将是我更新的动力
 
 ## 特征
 
 - 支持 异步/同步 方式爬取数据。
-- 支持 Promise、Callback 以及 Promise + Callback 这 3 种方式获取结果。
-- requestConfig 拥有 5 种写法。
-- 灵活的请求间隔时间。
-- 只需简单的配置即可抓取页面、批量网络请求以及批量下载文件资源等操作。
-- 轮询功能,定时爬取。
-- 内置 puppeteer 爬取页面 ,并用采用 jsdom 库对页面解析,也可自行解析。
-- 使用 TypeScript 编写,拥有类型提示,提供泛型。
+- 写法非常灵活,支持多种方式写请求配置和获取爬取结果。
+- 灵活的爬取间隔时间,由你决定 使用/避免 高并发爬取。
+- 简单的配置即可抓取页面、批量网络请求以及批量下载文件资源等操作。
+- 拥有轮询功能,定时爬取数据。
+- 内置 puppeteer 爬取页面,并用采用 jsdom 库对页面内容解析,也支持自行解析。
+- 使用 TypeScript 编写,拥有类型,提供泛型。
 
 ## 跟 puppeteer 的关系
 
 crawlPage API 内部使用 [puppeteer](https://github.com/puppeteer/puppeteer) 库来帮助我们爬取页面。
 
-我们可以做以下操作:
+crawlPage API 的返回值将可以做以下操作:
 
 - 生成页面的屏幕截图和 PDF。
 - 抓取 SPA(单页应用程序)并生成预渲染内容(即“SSR”(服务器端渲染))。
@@ -43,7 +42,7 @@ crawlPage API 内部使用 [puppeteer](https://github.com/puppeteer/puppeteer)
 * [爬取接口](#爬取接口)
 * [爬取文件](#爬取文件)
 * [启动轮询](#启动轮询)
-* [请求间隔时间](#请求间隔时间)
+* [爬取间隔时间](#爬取间隔时间)
 * [requestConfig 选项的多种写法](#requestConfig-选项的多种写法)
 * [获取结果的多种方式](#获取结果的多种方式)
 - [API](#API)
@@ -100,7 +99,7 @@ import xCrawl from 'x-crawl'
 // 2.创建一个爬虫实例
 const myXCrawl = xCrawl({
   timeout: 10000, // 请求超时时间
-  intervalTime: { max: 3000, min: 2000 } // 控制请求频率
+  intervalTime: { max: 3000, min: 2000 } // 爬取间隔时间
 })
 
 // 3.设置爬取任务
@@ -189,7 +188,7 @@ const myXCrawl2 = xCrawl({
 
 ### 爬取页面
 
-通过 [crawlPage()](#crawlPage) 爬取一个页面
+通过 [crawlPage()](#crawlPage) 爬取一个页面
 
 ```js
 import xCrawl from 'x-crawl'
@@ -266,7 +265,7 @@ myXCrawl
 
 ### 爬取接口
 
-通过 [crawlData()](#crawlData) 爬取接口数据
+通过 [crawlData()](#crawlData) 爬取接口数据
 
 ```js
 import xCrawl from 'x-crawl'
@@ -289,7 +288,7 @@ myXCrawl.crawlData({ requestConfig }).then(res => {
 
 ### 爬取文件
 
-通过 [crawlFile()](#crawlFile) 爬取文件数据
+通过 [crawlFile()](#crawlFile) 爬取文件数据
 
 ```js
 import xCrawl from 'x-crawl'
@@ -316,7 +315,7 @@ myXCrawl
 
 ### 启动轮询
 
-通过 [startPolling](#startPolling) 启动一个轮询爬取
+通过 [startPolling()](#startPolling) 启动一个轮询爬取
 
 ```js
 import xCrawl from 'x-crawl'
@@ -339,11 +338,11 @@ myXCrawl.startPolling({ h: 2, m: 30 }, (count, stopPolling) => {
 - count 属性记录当前是第几次轮询操作。
 - stopPolling 是一个回调函数,调用其可以终止后面的轮询操作。
 
-### 请求间隔时间
+### 爬取间隔时间
 
-设置请求间隔时间可以防止并发量太大,避免给服务器造成太大的压力。
+设置爬取间隔时间可以防止并发量太大,避免给服务器造成太大的压力。
 
-可以在创建爬虫实例的时候设置,也可选择给某个 API 单独设置。请求的间隔时间是由实例方法内部控制的,并非由实例控制整个请求的间隔时间
+可以在创建爬虫实例的时候设置,也可选择给某个 API 单独设置。爬取间隔时间是由实例方法内部控制的,并非由实例控制整个爬取间隔时间
 
 ```js
 import xCrawl from 'x-crawl'
@@ -502,7 +501,7 @@ import xCrawl from 'x-crawl'
 const myXCrawl = xCrawl({
   baseUrl: 'https://xxx.com',
   timeout: 10000,
-  // 请求的间隔时间, 多个请求才有效
+  // 爬取间隔时间, 批量爬取才有效
   intervalTime: {
     max: 2000,
     min: 1000

package.json

+1-1
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "3.2.5",
+  "version": "3.2.6",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library.",
   "license": "MIT",

publish/README.md

+19-20
@@ -1,27 +1,26 @@
-# x-crawl
+# x-crawl [![npm](https://img.shields.io/npm/v/x-crawl.svg)](https://www.npmjs.com/package/x-crawl) [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/coder-hxl/x-crawl/blob/main/LICENSE)
 
 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
 
 x-crawl is a flexible nodejs crawler library. You can crawl pages and control operations such as pages, batch network requests, and batch downloads of file resources. Support asynchronous/synchronous mode crawling data. Running on nodejs, the usage is flexible and simple, friendly to JS/TS developers.
 
-If you feel good, you can support [x-crawl repository](https://github.com/coder-hxl/x-crawl) with a Star.
+If you feel good, you can give [x-crawl repository](https://github.com/coder-hxl/x-crawl) a Star to support it, your Star will be the motivation for my update.
 
 ## Features
 
-- Cules data for asynchronous/synchronous ways.
-- In three ways to obtain the results of the three ways of supporting Promise, Callback, and Promise + Callback.
-- RquestConfig has 5 ways of writing.
-- Flexible request interval.
-- Operations such as crawling pages, batch network requests, and batch downloading of file resources can be performed with simple configuration.
-- The rotation function, crawl regularly.
-- The built -in Puppeteer crawl the page and uses the JSDOM library to analyze the page, or it can also be parsed by itself.
-- Chopening with TypeScript, possessing type prompts, and providing generic types.
+- Support asynchronous/synchronous way to crawl data.
+- The writing method is very flexible and supports multiple ways to write request configuration and obtain crawling results.
+- Flexible crawling interval, up to you to use/avoid high concurrent crawling.
+- With simple configuration, operations such as crawling pages, batch network requests, and batch download of file resources can be performed.
+- Possess polling function to crawl data regularly.
+- The built-in puppeteer crawls the page, and uses the jsdom library to analyze the content of the page, and also supports self-analysis.
+- Written in TypeScript, has types, provides generics.
 
 ## Relationship with puppeteer
 
 The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to help us crawl pages.
 
-We can do the following:
+The return value of the crawlPage API will be able to do the following:
 
 - Generate screenshots and PDFs of pages.
 - Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
@@ -43,7 +42,7 @@ We can do the following:
 * [Crawl interface](#Crawl-interface)
 * [Crawl files](#Crawl-files)
 * [Start polling](#Start-polling)
-* [Request interval time](#Request-interval-time)
+* [Crawl interval](#Crawl-interval)
 * [Multiple ways of writing requestConfig options](#Multiple-ways-of-writing-requestConfig-options)
 * [Multiple ways to get results](#Multiple-ways-to-get-results)
 - [API](#API)
@@ -101,7 +100,7 @@ import xCrawl from 'x-crawl'
 // 2.Create a crawler instance
 const myXCrawl = xCrawl({
   timeout: 10000, // overtime time
-  intervalTime: { max: 3000, min: 2000 } // control request frequency
+  intervalTime: { max: 3000, min: 2000 } // crawl interval
 })
 
 // 3.Set the crawling task
@@ -195,7 +194,7 @@ const myXCrawl2 = xCrawl({
 
 ### Crawl page
 
-Crawl a page via [crawlPage()](#crawlPage)
+Crawl a page via [crawlPage()](#crawlPage) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -274,7 +273,7 @@ myXCrawl
 
 ### Crawl interface
 
-Crawl interface data through [crawlData()](#crawlData)
+Crawl interface data through [crawlData()](#crawlData) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -297,7 +296,7 @@ myXCrawl.crawlData({ requestConfig }).then(res => {
 
 ### Crawl files
 
-Crawl file data via [crawlFile()](#crawlFile)
+Crawl file data via [crawlFile()](#crawlFile) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -323,7 +322,7 @@ myXCrawl
 
 ### Start polling
 
-Start a polling crawl with [startPolling](#startPolling)
+Start a polling crawl with [startPolling()](#startPolling) .
 
 ```js
 import xCrawl from 'x-crawl'
@@ -348,11 +347,11 @@ Callback function parameters:
 - The count attribute records the current number of polling operations.
 - stopPolling is a callback function, calling it can terminate subsequent polling operations.
 
-### Request interval time
+### Crawl interval
 
 Setting the requests interval time can prevent too much concurrency and avoid too much pressure on the server.
 
-It can be set when creating a crawler instance, or you can choose to set it separately for an API. The request interval time is controlled internally by the instance method, not by the instance to control the entire request interval time.
+It can be set when creating a crawler instance, or you can choose to set it separately for an API. The crawl interval is controlled internally by the instance method, not by the instance to control the entire crawl interval.
 
 ```js
 import xCrawl from 'x-crawl'
@@ -510,7 +509,7 @@ import xCrawl from 'x-crawl'
 const myXCrawl = xCrawl({
   baseUrl: 'https://xxx.com',
   timeout: 10000,
-  // The interval between requests, multiple requests are valid
+  // Crawling interval time, batch crawling is only valid
   intervalTime: {
     max: 2000,
     min: 1000

publish/package.json

+1-1
@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "3.2.5",
+  "version": "3.2.6",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library.",
   "license": "MIT",
