
Commit 2316bb4

Docs: Update
1 parent ea453fa commit 2316bb4

6 files changed, +64 −41 lines changed

README.md (+21 −12)

````diff
@@ -4,17 +4,21 @@ English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
 
 x-crawl is a Nodejs multifunctional crawler library.
 
-## Feature
+## Features
 
-- Crawl HTML, JSON, file resources, etc. with simple configuration.
-- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
+- Crawl pages, JSON, file resources, etc. with simple configuration.
+- The built-in puppeteer crawls the page and parses it with the jsdom library.
 - Support asynchronous/synchronous way to crawl data.
-- Support Promise/Callback way to get the result.
-- Polling function.
+- Support Promise/Callback method to get the result.
+- Polling function, fixed-point crawling.
 - Anthropomorphic request interval.
-- Written in TypeScript, provides generics.
+- Written in TypeScript, providing generics.
 
-## Benefits provided by using puppeteer
+## Relationship with puppeteer
+
+The fetchHTML API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
+
+The following can be done:
 
 - Generate screenshots and PDFs of pages.
 - Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
@@ -33,6 +37,7 @@ x-crawl is a Nodejs multifunctional crawler library.
 * [fetchHTML](#fetchHTML)
   + [Type](#Type-2)
   + [Example](#Example-2)
+  + [About page](#About-page)
 * [fetchData](#fetchData)
   + [Type](#Type-3)
   + [Example](#Example-3)
@@ -173,12 +178,12 @@ The first request is not to trigger the interval.
 
 ### fetchHTML
 
-fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl HTML.
+fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl pages.
 
 #### Type
 
 - Look at the [FetchHTMLConfig](#FetchHTMLConfig) type
-- Look at the [FetchHTML](#FetchHTML) type
+- Look at the [FetchHTML](#FetchHTML-2) type
 
 ```ts
 function fetchHTML: (
@@ -196,6 +201,10 @@ myXCrawl.fetchHTML('/xxx').then((res) => {
 })
 ```
 
+#### About page
+
+Get the page instance from res.data.page, which can perform interactive operations such as events. For specific usage, refer to [page](https://pptr.dev/api/puppeteer.page).
+
 ### fetchData
 
 fetchData is the method of the above [myXCrawl](#Example-1) instance, which is usually used to crawl APIs to obtain JSON data and so on.
@@ -224,7 +233,7 @@ const requestConfig = [
 
 myXCrawl.fetchData({
   requestConfig, // Request configuration, can be RequestConfig | RequestConfig[]
-  intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when not using myXCrawl
+  intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when creating myXCrawl is not used
 }).then(res => {
   console.log(res)
 })
@@ -380,7 +389,7 @@ interface FetchDataConfig extends FetchBaseConfigV1 {
 interface FetchFileConfig extends FetchBaseConfigV1 {
   fileConfig: {
     storeDir: string // Store folder
-    extension?: string // filename extension
+    extension?: string // Filename extension
   }
 }
 ```
@@ -409,7 +418,7 @@ interface FetchCommon<T> {
 ### FetchResCommonArrV1
 
 ```ts
-type FetchCommonArr<T> = FetchCommon<T>[]
+type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]
 ```
 
 ### FileInfo
````
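The type renames in this diff (FetchCommon to FetchResCommonV1, FetchCommonArr to FetchResCommonArrV1) can be sketched as a dependency-free snippet. Note the headers field is simplified here to a plain record instead of Node's IncomingHttpHeaders, and the sample values are made up for illustration:

```typescript
// Sketch of the renamed result types from this commit.
interface FetchResCommonV1<T> {
  id: number
  statusCode: number | undefined
  headers: Record<string, string | string[] | undefined> // simplified stand-in
  data: T
}

// An array of results, one entry per request in a batch.
type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]

// Hypothetical batch result, e.g. what fetchData might resolve with
// for two requests (values are illustrative, not real crawl output).
const results: FetchResCommonArrV1<{ name: string }> = [
  { id: 1, statusCode: 200, headers: {}, data: { name: 'a' } },
  { id: 2, statusCode: 200, headers: {}, data: { name: 'b' } }
]

console.log(results.map((res) => res.data.name).join(',')) // → a,b
```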

docs/cn.md (+19 −10)

````diff
@@ -6,15 +6,19 @@ x-crawl is a Nodejs multifunctional crawler library.
 
 ## Features
 
-- Crawl HTML, JSON, file resources, etc. with simple configuration.
-- The built-in puppeteer crawls HTML and parses it with the JSDOM library.
+- Crawl pages, JSON, file resources, etc. with simple configuration.
+- The built-in puppeteer crawls the page and parses it with the jsdom library.
 - Support asynchronous/synchronous way to crawl data.
 - Support Promise/Callback method to get the result.
-- Polling function.
+- Polling function, fixed-point crawling.
 - Anthropomorphic request interval.
 - Written in TypeScript, providing generics.
 
-## Benefits provided by using puppeteer
+## Relationship with puppeteer
+
+The fetchHTML API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
+
+The following can be done:
 
 - Generate screenshots and PDFs of pages.
 - Crawl SPAs (single-page applications) and generate pre-rendered content (i.e. "SSR" (server-side rendering)).
@@ -36,6 +40,7 @@ x-crawl is a Nodejs multifunctional crawler library.
 * [fetchData](#fetchData)
   + [Type](#类型-3)
   + [Example](#示例-3)
+  + [About page](#关于-page)
 * [fetchFile](#fetchFile)
   + [Type](#类型-4)
   + [Example](#示例-4)
@@ -166,12 +171,12 @@ The intervalTime option defaults to undefined. If a value is set, then before the request
 
 ### fetchHTML
 
-fetchHTML is a method of the [myXCrawl](https://github.com/coder-hxl/x-crawl/blob/main/document/cn.md#%E7%A4%BA%E4%BE%8B-1) instance, usually used to crawl HTML.
+fetchHTML is a method of the [myXCrawl](https://github.com/coder-hxl/x-crawl/blob/main/document/cn.md#%E7%A4%BA%E4%BE%8B-1) instance, usually used to crawl pages.
 
 #### Type
 
 - See the [FetchHTMLConfig](#FetchHTMLConfig) type
-- See the [FetchHTML](#FetchHTML) type
+- See the [FetchHTML](#FetchHTML-2) type
 
 ```ts
 function fetchHTML: (
@@ -189,6 +194,10 @@ myXCrawl.fetchHTML('/xxx').then((res) => {
 })
 ```
 
+#### About page
+
+Get the page instance from res.data.page; it can perform interactive operations such as events. For specific usage, refer to [page](https://pptr.dev/api/puppeteer.page).
+
 ### fetchData
 
 fetchData is a method of the [myXCrawl](#示例-1) instance, usually used to crawl APIs, to obtain JSON data and so on.
@@ -217,7 +226,7 @@ const requestConfig = [
 
 myXCrawl.fetchData({
   requestConfig, // Request configuration, can be RequestConfig | RequestConfig[]
-  intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when not using myXCrawl
+  intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when creating myXCrawl is not used
 }).then(res => {
   console.log(res)
 })
@@ -391,7 +400,7 @@ interface StartPollingConfig {
 ### FetchResCommonV1
 
 ```ts
-interface FetchCommon<T> {
+interface FetchResCommonV1<T> {
   id: number
   statusCode: number | undefined
   headers: IncomingHttpHeaders // http type from nodejs
@@ -402,7 +411,7 @@ interface FetchCommon<T> {
 ### FetchResCommonArrV1
 
 ```ts
-type FetchCommonArr<T> = FetchCommon<T>[]
+type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]
 ```
 
 ### FileInfo
@@ -423,7 +432,7 @@ interface FetchHTML {
   httpResponse: HTTPResponse | null // HTTPResponse type from the puppeteer library
   data: {
     page: Page // Page type from the puppeteer library
-    jsdom: JSDOM
+    jsdom: JSDOM // JSDOM type from the jsdom library
   }
 }
 ```
````
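The FetchHTML result shape annotated in this diff can be sketched without installing puppeteer or jsdom by using minimal stand-in types for HTTPResponse, Page, and JSDOM. The real library types are far richer; these placeholders exist only so the snippet runs, and the sample values are hypothetical:

```typescript
// Stand-in placeholder types (NOT the real puppeteer/jsdom types).
type HTTPResponse = { status: () => number }
type Page = { url: () => string }
type JSDOM = { serialize: () => string }

// Shape of the fetchHTML result as documented in the diff above.
interface FetchHTML {
  httpResponse: HTTPResponse | null
  data: {
    page: Page
    jsdom: JSDOM
  }
}

// Hypothetical result: the page handle lives at res.data.page,
// the parsed document at res.data.jsdom.
const res: FetchHTML = {
  httpResponse: { status: () => 200 },
  data: {
    page: { url: () => 'https://example.com' },
    jsdom: { serialize: () => '<!DOCTYPE html><html></html>' }
  }
}

console.log(res.data.page.url()) // → https://example.com
```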

package.json (+1 −1)

```diff
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "2.2.0",
+  "version": "2.2.1",
   "author": "coderHXL",
   "description": "XCrawl is a Nodejs multifunctional crawler library.",
   "license": "MIT",
```

publish/README.md (+21 −12)

Identical changes to the README.md diff above, applied to the copy of the README kept under publish/.

publish/package.json (+1 −1)

```diff
@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "2.2.0",
+  "version": "2.2.1",
   "author": "coderHXL",
   "description": "XCrawl is a Nodejs multifunctional crawler library.",
   "license": "MIT",
```

src/types/index.ts (+1 −5)

```diff
@@ -9,7 +9,6 @@ import {
   StartPollingConfig,
   IntervalTime
 } from './api'
-import { MapTypeObject } from './common'
 
 export interface XCrawlBaseConfig {
   baseUrl?: string
@@ -19,13 +18,10 @@ export interface XCrawlBaseConfig {
   proxy?: string
 }
 
-interface LoaderXCrawlBaseConfigValue {
+export type LoaderXCrawlBaseConfig = XCrawlBaseConfig & {
   mode: 'async' | 'sync'
 }
 
-export type LoaderXCrawlBaseConfig = XCrawlBaseConfig &
-  MapTypeObject<LoaderXCrawlBaseConfigValue>
-
 export interface XCrawlInstance {
   fetchHTML: (
     config: FetchHTMLConfig,
```
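The simplified LoaderXCrawlBaseConfig is a plain intersection type: it reuses XCrawlBaseConfig but makes mode required. A minimal sketch with a trimmed field list and a hypothetical loadConfig helper; the 'async' default in the helper is an assumption for illustration, not taken from the library:

```typescript
// Trimmed version of the base config (field list abbreviated).
interface XCrawlBaseConfig {
  baseUrl?: string
  timeout?: number
  proxy?: string
  mode?: 'async' | 'sync'
}

// After loading, mode is guaranteed to be present: the intersection
// narrows the optional mode into a required one.
type LoaderXCrawlBaseConfig = XCrawlBaseConfig & {
  mode: 'async' | 'sync'
}

// Hypothetical loader: fills in mode when the caller omitted it.
function loadConfig(base: XCrawlBaseConfig): LoaderXCrawlBaseConfig {
  return { ...base, mode: base.mode ?? 'async' }
}

const loaded = loadConfig({ baseUrl: 'https://example.com' })
console.log(loaded.mode) // → async
```

This removes the MapTypeObject indirection while keeping the same resulting type, which is why the import from './common' could be dropped.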
