Commit b9e0b87

Update: Internal type adjustments to catch errors in crawlPage API
1 parent 7ce4d12 commit b9e0b87

14 files changed, +226 -223 lines changed


.vscode/settings.json (+3)
@@ -0,0 +1,3 @@
+{
+  "typescript.tsdk": "node_modules\\typescript\\lib"
+}

README.md (+38 -38)
@@ -9,12 +9,12 @@ If you feel good, you can give [x-crawl repository](https://github.com/coder-hxl
 ## Features
 
 - Support asynchronous/synchronous way to crawl data.
-- The writing method is very flexible and supports multiple ways to write request configuration and obtain crawling results.
+- Flexible writing, support a variety of ways to write request configuration and obtain crawl results.
 - Flexible crawling interval, up to you to use/avoid high concurrent crawling.
 - With simple configuration, operations such as crawling pages, batch network requests, and batch download of file resources can be performed.
 - Possess polling function to crawl data regularly.
 - The built-in puppeteer crawls the page, and uses the jsdom library to analyze the content of the page, and also supports self-analysis.
-- Capture and record the success and failure of batch crawling, and highlight the reminders.
+- Capture the success and failure of the climb and highlight the reminder.
 - Written in TypeScript, has types, provides generics.
 
 ## Relationship with puppeteer
@@ -65,21 +65,20 @@ The return value of the crawlPage API will be able to do the following:
 - [Types](#Types)
   * [AnyObject](#AnyObject)
   * [Method](#Method)
-  * [RequestConfigObject](#RequestConfigObject)
+  * [RequestConfigObjectV1](#RequestConfigObjectV1)
+  * [RequestConfigObjectV2](#RequestConfigObjectV2)
   * [RequestConfig](#RequestConfig)
-  * [MergeRequestConfigObject](#MergeRequestConfigObject)
   * [IntervalTime](#IntervalTime)
   * [XCrawlBaseConfig](#XCrawlBaseConfig)
-  * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
   * [CrawlPageConfig](#CrawlPageConfig )
+  * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
   * [CrawlDataConfig](#CrawlDataConfig)
   * [CrawlFileConfig](#CrawlFileConfig)
   * [StartPollingConfig](#StartPollingConfig)
-  * [XCrawlInstance](#XCrawlInstance)
   * [CrawlResCommonV1](#CrawlResCommonV1)
   * [CrawlResCommonArrV1](#CrawlResCommonArrV1)
+  * [CrawlPage](#CrawlPage-2)
   * [FileInfo](#FileInfo)
-  * [CrawlPage](#CrawlPage)
 - [More](#More)
 
 ## Install
@@ -682,10 +681,21 @@ interface AnyObject extends Object {
 type Method = 'get' | 'GET' | 'delete' | 'DELETE' | 'head' | 'HEAD' | 'options' | 'OPTONS' | 'post' | 'POST' | 'put' | 'PUT' | 'patch' | 'PATCH' | 'purge' | 'PURGE' | 'link' | 'LINK' | 'unlink' | 'UNLINK'
 ```
 
-### RequestConfigObject
+### RequestConfigObjectV1
+
+```ts
+interface RequestConfigObjectV1 {
+  url: string
+  headers?: AnyObject
+  timeout?: number
+  proxy?: string
+}
+```
+
+### RequestConfigObjectV2
 
 ```ts
-interface RequestConfigObject {
+interface RequestConfigObjectV2 {
   url: string
   method?: Method
   headers?: AnyObject
@@ -699,17 +709,7 @@ interface RequestConfigObject {
 ### RequestConfig
 
 ```ts
-type RequestConfig = string | RequestConfigObject
-```
-
-### MergeRequestConfigObject
-
-```ts
-interface MergeRequestConfigObject {
-  url: string
-  timeout?: number
-  proxy?: string
-}
+type RequestConfig = string | RequestConfigObjectV2
 ```
 
 ### IntervalTime
@@ -733,6 +733,12 @@ interface XCrawlBaseConfig {
 }
 ```
 
+### CrawlPageConfig
+
+```ts
+type CrawlPageConfig = string | RequestConfigObjectV1
+```
+
 ### CrawlBaseConfigV1
 
 ```ts
@@ -742,12 +748,6 @@ interface CrawlBaseConfigV1 {
 }
 ```
 
-### CrawlPageConfig
-
-```ts
-type CrawlPageConfig = string | MergeRequestConfigObject
-```
-
 ### CrawlDataConfig
 
 ```ts
@@ -805,7 +805,7 @@ interface XCrawlInstance {
 ### CrawlResCommonV1
 
 ```ts
-interface CrawlCommon<T> {
+interface CrawlResCommonV1<T> {
   id: number
   statusCode: number | undefined
   headers: IncomingHttpHeaders // nodejs: http type
@@ -819,6 +819,17 @@ interface CrawlCommon<T> {
 type CrawlResCommonArrV1<T> = CrawlResCommonV1<T>[]
 ```
 
+### CrawlPage
+
+```ts
+interface CrawlPage {
+  httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
+  browser: Browser // The Browser type of the puppeteer library
+  page: Page // The Page type of the puppeteer library
+  jsdom: JSDOM // jsdom type of the JSDOM library
+}
+```
+
 ### FileInfo
 
 ```ts
@@ -830,17 +841,6 @@ interface FileInfo {
 }
 ```
 
-### CrawlPage
-
-```ts
-interface CrawlPage {
-  httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
-  browser // The type of Browser in the puppeteer library
-  page: Page // The type of Page in the puppeteer library
-  jsdom: JSDOM // The type of JSDOM in the jsdom library
-}
-```
-
 ## More
 
 If you have any **questions** or **needs** , please submit **Issues in** https://github.com/coder-hxl/x-crawl/issues .
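
The README changes above split the request object into `RequestConfigObjectV1` (no `method`) and `RequestConfigObjectV2` (with `method`), and point `CrawlPageConfig` at the V1 shape. One plausible reading of the commit title is that this lets the compiler reject request-only fields passed to a page crawl. The sketch below illustrates that idea under stated assumptions: `AnyObject` is a local stand-in (the diff shows only the first line of the real interface), and `normalizePageConfig` is a hypothetical helper, not part of x-crawl.

```typescript
// Local stand-in for the library's AnyObject; the real definition is not shown in the diff.
type AnyObject = Record<string, unknown>

// Shape documented in the README diff.
interface RequestConfigObjectV1 {
  url: string
  headers?: AnyObject
  timeout?: number
  proxy?: string
}

// Union documented in the README diff: a bare URL or a V1 object.
type CrawlPageConfig = string | RequestConfigObjectV1

// Hypothetical helper (not an x-crawl API): turn either form into an object.
function normalizePageConfig(config: CrawlPageConfig): RequestConfigObjectV1 {
  return typeof config === 'string' ? { url: config } : config
}

const fromString = normalizePageConfig('https://example.com')
const fromObject = normalizePageConfig({ url: 'https://example.com', timeout: 10000 })

// `method` is not a property of RequestConfigObjectV1, so the excess-property
// check on this object literal reports a compile-time error, suppressed here
// only to keep the example compilable.
// @ts-expect-error 'method' does not exist in type 'RequestConfigObjectV1'
const withMethod = normalizePageConfig({ url: 'https://example.com', method: 'POST' })

console.log(fromString, fromObject, withMethod)
```

The check is purely static: at runtime the object with `method` still passes through unchanged, which is why the type-level split is what does the catching.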

docs/cn.md (+37 -36)
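@@ -9,12 +9,12 @@ x-crawl is a flexible nodejs crawler library. It can crawl pages and control pages
 ## Features
 
 - Supports asynchronous/synchronous crawling of data.
-- The writing style is very flexible, supporting multiple ways to write request configuration and obtain crawl results.
+- Flexible writing style, supporting multiple ways to write request configuration and obtain crawl results.
 - Flexible crawl interval; you decide whether to use or avoid high-concurrency crawling.
 - Simple configuration is enough to crawl pages, make batch network requests, and batch-download file resources.
 - Has a polling feature for crawling data on a schedule.
 - The built-in puppeteer crawls pages and the jsdom library parses page content; custom parsing is also supported.
-- Captures and records the successes and failures of batch crawling, with highlighted reminders.
+- Captures and records the successes and failures of crawling, with highlighted reminders.
 - Written in TypeScript, with types and generics.
 
 ## Relationship with puppeteer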
@@ -65,20 +65,20 @@ The return value of the crawlPage API can do the following:
 - [Types](#类型-6)
   * [AnyObject](#AnyObject)
   * [Method](#Method)
-  * [RequestConfigObject](#RequestConfigObject)
+  * [RequestConfigObjectV1](#RequestConfigObjectV1)
+  * [RequestConfigObjectV2](#RequestConfigObjectV2)
   * [RequestConfig](#RequestConfig)
-  * [MergeRequestConfigObject](#MergeRequestConfigObject)
   * [IntervalTime](#IntervalTime)
   * [XCrawlBaseConfig](#XCrawlBaseConfig)
-  * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
   * [CrawlPageConfig](#CrawlPageConfig )
+  * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
   * [CrawlDataConfig](#CrawlDataConfig)
   * [CrawlFileConfig](#CrawlFileConfig)
   * [StartPollingConfig](#StartPollingConfig)
   * [CrawlResCommonV1](#CrawlResCommonV1)
   * [CrawlResCommonArrV1](#CrawlResCommonArrV1)
+  * [CrawlPage](#CrawlPage-2)
   * [FileInfo](#FileInfo)
-  * [CrawlPage](#CrawlPage)
 - [More](#更多)
 
 ## Install
@@ -673,10 +673,21 @@ interface AnyObject extends Object {
 type Method = 'get' | 'GET' | 'delete' | 'DELETE' | 'head' | 'HEAD' | 'options' | 'OPTONS' | 'post' | 'POST' | 'put' | 'PUT' | 'patch' | 'PATCH' | 'purge' | 'PURGE' | 'link' | 'LINK' | 'unlink' | 'UNLINK'
 ```
 
-### RequestConfigObject
+### RequestConfigObjectV1
 
 ```ts
-interface RequestConfigObject {
+interface RequestConfigObjectV1 {
+  url: string
+  headers?: AnyObject
+  timeout?: number
+  proxy?: string
+}
+```
+
+### RequestConfigObjectV2
+
+```ts
+interface RequestConfigObjectV2 {
   url: string
   method?: Method
   headers?: AnyObject
@@ -690,17 +701,7 @@ interface RequestConfigObject {
 ### RequestConfig
 
 ```ts
-type RequestConfig = string | RequestConfigObject
-```
-
-### MergeRequestConfigObject
-
-```ts
-interface MergeRequestConfigObject {
-  url: string
-  timeout?: number
-  proxy?: string
-}
+type RequestConfig = string | RequestConfigObjectV2
 ```
 
 ### IntervalTime
@@ -724,6 +725,12 @@ interface XCrawlBaseConfig {
 }
 ```
 
+### CrawlPageConfig
+
+```ts
+type CrawlPageConfig = string | RequestConfigObjectV1
+```
+
 ### CrawlBaseConfigV1
 
 ```ts
@@ -733,12 +740,6 @@ interface CrawlBaseConfigV1 {
 }
 ```
 
-### CrawlPageConfig
-
-```ts
-type CrawlPageConfig = string | MergeRequestConfigObject
-```
-
 ### CrawlDataConfig
 
 ```ts
@@ -810,17 +811,6 @@ interface CrawlResCommonV1<T> {
 type CrawlResCommonArrV1<T> = CrawlResCommonV1<T>[]
 ```
 
-### FileInfo
-
-```ts
-interface FileInfo {
-  fileName: string
-  mimeType: string
-  size: number
-  filePath: string
-}
-```
-
 ### CrawlPage
 
 ```ts
@@ -832,6 +822,17 @@ interface CrawlPage {
 }
 ```
 
+### FileInfo
+
+```ts
+interface FileInfo {
+  fileName: string
+  mimeType: string
+  size: number
+  filePath: string
+}
+```
+
 ## More
 
 If you have any **questions** or **needs**, please submit **Issues** at https://github.com/coder-hxl/x-crawl/issues .

package.json (+2 -2)
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "3.2.7",
+  "version": "3.2.8",
   "author": "coderHXL",
   "description": "x-crawl is a flexible nodejs crawler library.",
   "license": "MIT",
@@ -39,6 +39,6 @@
     "rollup": "^3.10.1",
     "rollup-plugin-typescript2": "^0.34.1",
     "ts-jest": "^29.0.5",
-    "typescript": "^4.9.4"
+    "typescript": "5.0.2"
   }
 }
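
The `typescript` entry above moves from a caret range (`^4.9.4`) to an exact version (`5.0.2`). Under npm's semver rules, a caret range with a non-zero major accepts any version at or above the base with the same major, while a bare version matches only itself. The sketch below is a simplified illustration of that difference; `matchesRange` and its helpers are hypothetical names, and it deliberately ignores pre-release tags, `0.x` carets, and every other range syntax.

```typescript
type Version = [major: number, minor: number, patch: number]

// Parse a plain "x.y.z" string; anything else is rejected in this sketch.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text)
  if (!match) throw new Error(`Not a plain x.y.z version: ${text}`)
  return [Number(match[1]), Number(match[2]), Number(match[3])]
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return 0
}

// Hypothetical helper: supports only "^x.y.z" (x >= 1) and exact "x.y.z".
function matchesRange(range: string, version: string): boolean {
  const candidate = parseVersion(version)
  if (range.startsWith('^')) {
    const base = parseVersion(range.slice(1))
    if (base[0] === 0) throw new Error('0.x caret ranges are not modelled in this sketch')
    // Same major version, and not older than the base.
    return candidate[0] === base[0] && compareVersions(candidate, base) >= 0
  }
  // No prefix: exact match only.
  return compareVersions(candidate, parseVersion(range)) === 0
}

console.log(matchesRange('^4.9.4', '4.9.5')) // true: same major, newer patch
console.log(matchesRange('^4.9.4', '5.0.2')) // false: major version differs
console.log(matchesRange('5.0.2', '5.0.2')) // true: exact pin matches itself
console.log(matchesRange('5.0.2', '5.0.3')) // false: exact pin rejects newer patches
```

With the exact pin, a fresh install resolves the same compiler version every time, which fits alongside the `.vscode/settings.json` change that points the editor at the workspace copy in `node_modules`.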
