
Commit 7bea771

Update: Docs

1 parent a7d0a7c commit 7bea771

File tree

5 files changed: +31 −38 lines changed

README.md

+9 −8
@@ -2,19 +2,20 @@
 
 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
 
-x-crawl is a flexible nodejs crawler library. It is used to batch crawl data, network requests and download file resources. Support crawling data asynchronously or synchronously. Since it runs on nodejs, it is friendly to JS/TS developers.
+x-crawl is a flexible nodejs crawler library. It is used to crawl pages, make batch network requests, and batch-download file resources. It offers 5 ways of writing requestConfig, 3 ways of getting results, and asynchronous or synchronous crawling modes. Since it runs on nodejs, it is friendly to JS/TS developers.
 
 If you feel good, you can support [x-crawl repository](https://github.com/coder-hxl/x-crawl) with a Star.
 
 ## Features
 
-- Support asynchronous/synchronous way to crawl data.
-- Support Promise/Callback method to get the result.
-- Anthropomorphic request interval.
-- Crawl pages, JSON, file resources, etc. with simple configuration.
-- Polling function, timing crawling.
-- The built-in puppeteer crawls the page and uses the jsdom library to parse the page.
-- Written in TypeScript, has type hints, and provides generics.
+- Support asynchronous/synchronous ways to crawl data.
+- Support 3 ways to get results: Promise, Callback, and Promise + Callback.
+- requestConfig supports 5 ways of writing.
+- Anthropomorphic request interval.
+- Crawl pages, JSON, file resources, etc. with simple configuration.
+- Polling function, timed crawling.
+- The built-in puppeteer crawls the page and uses the jsdom library to parse it, or you can parse the page yourself.
+- Written in TypeScript, has type hints, and provides generics.
 
 ## Relationship with puppeteer
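For reference, the three result modes named in the new feature list combine with crawlFile as in the minimal sketch below, assembled from the crawlFile snippets later in this commit. The URLs are placeholders, and constructing the instance with no base config is an assumption:

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()
const requestConfig = ['https://xxx.com/xxxx', 'https://xxx.com/xxxx']

// 1. Promise: resolves once with all results
myXCrawl
  .crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
  .then((fileInfos) => console.log('Promise: ', fileInfos))

// 2. Callback: fires once per crawled resource
myXCrawl.crawlFile(
  { requestConfig, fileConfig: { storeDir: './upload' } },
  (fileInfo) => console.log('Callback: ', fileInfo)
)

// 3. Promise + Callback: both of the above on one call
myXCrawl
  .crawlFile(
    { requestConfig, fileConfig: { storeDir: './upload' } },
    (fileInfo) => console.log('Callback: ', fileInfo)
  )
  .then((fileInfos) => console.log('Promise: ', fileInfos))
```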

docs/cn.md

+4 −3

@@ -2,18 +2,19 @@
 
 [English](https://github.com/coder-hxl/x-crawl#x-crawl) | 简体中文
 
-x-crawl is a flexible nodejs crawler library. It is used to batch crawl data, make network requests, and download file resources. It supports crawling data asynchronously or synchronously. Since it runs on nodejs, it is friendly to JS/TS developers.
+x-crawl is a flexible nodejs crawler library. It is used to crawl pages, make batch network requests, and batch-download file resources. It offers 5 ways of writing requestConfig, 3 ways of getting results, and asynchronous or synchronous crawling modes. It runs on nodejs and is friendly to JS/TS developers.
 
 If you think it is good, you can give the [x-crawl repository](https://github.com/coder-hxl/x-crawl) a Star to show support.
 
 ## Features
 
 - Support asynchronous/synchronous ways to crawl data.
-- Support Promise/Callback ways to get results.
+- Support 3 ways to get results: Promise, Callback, and Promise + Callback.
+- requestConfig supports 5 ways of writing.
 - Anthropomorphic request interval.
 - Crawl pages, JSON, file resources, etc. with simple configuration.
 - Polling function, timed crawling.
-- The built-in puppeteer crawls the page, and the jsdom library is used to parse it.
+- The built-in puppeteer crawls the page, and the jsdom library is used to parse it; you can also parse it yourself.
 - Written in TypeScript, has type hints, and provides generics.
 
 ## Relationship with puppeteer

package.json

+1 −1

@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "3.2.2",
+  "version": "3.2.3",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library. ",
   "license": "MIT",

publish/README.md

+16 −25
@@ -2,19 +2,20 @@
 
 English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
 
-x-crawl is a flexible nodejs crawler library. It is used to batch crawl data, network requests and download file resources. Support crawling data asynchronously or synchronously. Since it runs on nodejs, it is friendly to JS/TS developers.
+x-crawl is a flexible nodejs crawler library. It is used to crawl pages, make batch network requests, and batch-download file resources. It offers 5 ways of writing requestConfig, 3 ways of getting results, and asynchronous or synchronous crawling modes. Since it runs on nodejs, it is friendly to JS/TS developers.
 
 If you feel good, you can support [x-crawl repository](https://github.com/coder-hxl/x-crawl) with a Star.
 
 ## Features
 
-- Support asynchronous/synchronous way to crawl data.
-- Support Promise/Callback method to get the result.
-- Anthropomorphic request interval.
-- Crawl pages, JSON, file resources, etc. with simple configuration.
-- Polling function, timing crawling.
-- The built-in puppeteer crawls the page and uses the jsdom library to parse the page.
-- Written in TypeScript, has type hints, and provides generics.
+- Support asynchronous/synchronous ways to crawl data.
+- Support 3 ways to get results: Promise, Callback, and Promise + Callback.
+- requestConfig supports 5 ways of writing.
+- Anthropomorphic request interval.
+- Crawl pages, JSON, file resources, etc. with simple configuration.
+- Polling function, timed crawling.
+- The built-in puppeteer crawls the page and uses the jsdom library to parse it, or you can parse the page yourself.
+- Written in TypeScript, has type hints, and provides generics.
 
 ## Relationship with puppeteer

@@ -95,7 +96,6 @@ Regular crawling: Get the recommended pictures of the youtube homepage every oth
 
 ```js
 // 1.Import module ES/CJS
-import path from 'node:path'
 import xCrawl from 'x-crawl'
 
 // 2.Create a crawler instance
@@ -125,13 +125,7 @@ myXCrawl.startPolling({ d: 1 }, () => {
 })
 
 // Call the crawlFile API to crawl pictures
-myXCrawl.crawlFile({
-  requestConfig,
-  fileConfig: { storeDir: path.resolve(__dirname, './upload') }
-})
-
-// Close the browser
-browser.close()
+myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
 })
 })
 ```
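The hunk above swaps path.resolve(__dirname, './upload') for a bare relative path. Assuming x-crawl hands storeDir straight to Node's fs APIs, a relative path resolves against process.cwd() rather than the script's directory, so the two spellings only coincide when the process is started from the script's own folder. A hedged comparison:

```js
import path from 'node:path'

// Illustrative only; __dirname assumes a CommonJS context.
path.resolve('./upload')            // -> <current working directory>/upload
path.resolve(__dirname, './upload') // -> <directory of this script>/upload
```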
@@ -260,7 +254,6 @@ myXCrawl.crawlData({ requestConfig }).then(res => {
 Crawl file data via [crawlFile()](#crawlFile)
 
 ```js
-import path from 'node:path'
 import xCrawl from 'x-crawl'
 
 const myXCrawl = xCrawl({
@@ -274,7 +267,7 @@ myXCrawl
   .crawlFile({
     requestConfig,
     fileConfig: {
-      storeDir: path.resolve(__dirname, './upload') // storage folder
+      storeDir: './upload' // storage folder
     }
   })
   .then((fileInfos) => {
@@ -299,9 +292,7 @@ myXCrawl.startPolling({ h: 2, m: 30 }, (count, stopPolling) => {
 // crawlPage/crawlData/crawlFile
 myXCrawl.crawlPage('https://xxx.com').then(res => {
   const { jsdom, browser, page } = res
-
-  // Close the browser
-  browser.close()
+
 })
 })
 ```
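The remaining hunks all revolve around the requestConfig parameter, which the new feature list says supports 5 ways of writing. Only the single-string and string-array forms actually appear in this diff; the object-based forms below, and their url/method fields, are assumptions sketched for illustration:

```js
// 1. A single URL string (seen above with crawlPage)
const single = 'https://xxx.com/xxxx'

// 2. An array of URL strings (seen in the next hunk)
const list = ['https://xxx.com/xxxx', 'https://xxx.com/xxxx']

// 3. A config object (assumed shape)
const detailed = { url: 'https://xxx.com/xxxx', method: 'GET' }

// 4. An array of config objects (assumed shape)
const detailedList = [
  { url: 'https://xxx.com/xxxx' },
  { url: 'https://xxx.com/xxxx', method: 'POST' }
]

// 5. A mixed array of strings and objects (assumed shape)
const mixed = ['https://xxx.com/xxxx', { url: 'https://xxx.com/xxxx' }]
```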
@@ -414,7 +405,7 @@ const requestConfig = [ 'https://xxx.com/xxxx', 'https://xxx.com/xxxx', 'https:/
 myXCrawl
   .crawlFile({
     requestConfig,
-    fileConfig: { storeDir: path.resolve(__dirname, './upload') }
+    fileConfig: { storeDir: './upload' }
   })
   .then((fileInfos) => {
     console.log('Promise: ', fileInfos)
@@ -424,7 +415,7 @@ myXCrawl
 myXCrawl.crawlFile(
   {
     requestConfig,
-    fileConfig: { storeDir: path.resolve(__dirname, './upload') }
+    fileConfig: { storeDir: './upload' }
   },
   (fileInfo) => {
     console.log('Callback: ', fileInfo)
@@ -436,7 +427,7 @@ myXCrawl
   .crawlFile(
     {
       requestConfig,
-      fileConfig: { storeDir: path.resolve(__dirname, './upload') }
+      fileConfig: { storeDir: './upload' }
     },
     (fileInfo) => {
       console.log('Callback: ', fileInfo)
@@ -589,7 +580,7 @@ myXCrawl
   .crawlFile({
     requestConfig,
     fileConfig: {
-      storeDir: path.resolve(__dirname, './upload') // storage folder
+      storeDir: './upload' // storage folder
     }
   })
   .then((fileInfos) => {

publish/package.json

+1 −1

@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "3.2.2",
+  "version": "3.2.3",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library.",
   "license": "MIT",
