@@ -9,12 +9,12 @@ If you feel good, you can give [x-crawl repository](https://github.com/coder-hxl
## Features

- Support asynchronous/synchronous way to crawl data.
- - The writing method is very flexible and supports multiple ways to write request configuration and obtain crawling results.
+ - Flexible writing style, supporting multiple ways to write request configurations and obtain crawl results.
- Flexible crawling interval; it is up to you to use or avoid high-concurrency crawling.
- With simple configuration you can crawl pages, send batch network requests, and batch-download file resources (see the sketch below).
- Built-in polling to crawl data at regular intervals.
- Uses the built-in puppeteer to crawl pages, parses page content with the jsdom library, and also supports parsing it yourself.
- - Capture and record the success and failure of batch crawling, and highlight the reminders.
+ - Captures the success and failure of the crawl and highlights the reminders.
- Written in TypeScript, has types, provides generics.
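As a quick illustration of these features, here is a minimal sketch (not part of this diff) of creating an instance and crawling a page; it assumes the `xCrawl` factory export and the `IntervalTime` and `CrawlPage` types listed in the contents below:

```ts
import xCrawl from 'x-crawl'

// Create a crawler instance; wait a random 2-3s between batch requests
const myXCrawl = xCrawl({ intervalTime: { max: 3000, min: 2000 } })

// crawlPage resolves to a CrawlPage object (httpResponse, browser, page, jsdom)
myXCrawl.crawlPage('https://www.example.com').then(async ({ jsdom, browser }) => {
  console.log(jsdom.window.document.title) // parse the page statically via jsdom
  await browser.close() // release the puppeteer browser when done
})
```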
## Relationship with puppeteer
@@ -65,21 +65,20 @@ The return value of the crawlPage API will be able to do the following:
- [Types](#Types)
  * [AnyObject](#AnyObject)
  * [Method](#Method)
-   * [RequestConfigObject](#RequestConfigObject)
+   * [RequestConfigObjectV1](#RequestConfigObjectV1)
+   * [RequestConfigObjectV2](#RequestConfigObjectV2)
  * [RequestConfig](#RequestConfig)
-   * [MergeRequestConfigObject](#MergeRequestConfigObject)
  * [IntervalTime](#IntervalTime)
  * [XCrawlBaseConfig](#XCrawlBaseConfig)
-   * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
  * [CrawlPageConfig](#CrawlPageConfig)
+   * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
  * [CrawlDataConfig](#CrawlDataConfig)
  * [CrawlFileConfig](#CrawlFileConfig)
  * [StartPollingConfig](#StartPollingConfig)
-   * [XCrawlInstance](#XCrawlInstance)
  * [CrawlResCommonV1](#CrawlResCommonV1)
  * [CrawlResCommonArrV1](#CrawlResCommonArrV1)
+   * [CrawlPage](#CrawlPage-2)
  * [FileInfo](#FileInfo)
-   * [CrawlPage](#CrawlPage)
- [More](#More)
## Install
@@ -682,10 +681,21 @@ interface AnyObject extends Object {
type Method = 'get' | 'GET' | 'delete' | 'DELETE' | 'head' | 'HEAD' | 'options' | 'OPTIONS' | 'post' | 'POST' | 'put' | 'PUT' | 'patch' | 'PATCH' | 'purge' | 'PURGE' | 'link' | 'LINK' | 'unlink' | 'UNLINK'
```

- ### RequestConfigObject
+ ### RequestConfigObjectV1
+
+ ```ts
+ interface RequestConfigObjectV1 {
+   url: string
+   headers?: AnyObject
+   timeout?: number
+   proxy?: string
+ }
+ ```
+
+ ### RequestConfigObjectV2

```ts
- interface RequestConfigObject {
+ interface RequestConfigObjectV2 {
  url: string
  method?: Method
  headers?: AnyObject
@@ -699,17 +709,7 @@ interface RequestConfigObject {
### RequestConfig

```ts
- type RequestConfig = string | RequestConfigObject
- ```
-
- ### MergeRequestConfigObject
-
- ```ts
- interface MergeRequestConfigObject {
-   url: string
-   timeout?: number
-   proxy?: string
- }
+ type RequestConfig = string | RequestConfigObjectV2
```
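Since `RequestConfig` now accepts either a bare URL string or a `RequestConfigObjectV2`, both forms can be mixed in one batch. A hedged sketch, using only the fields shown in this hunk (URLs and header values are placeholders):

```ts
const requestConfigs: RequestConfig[] = [
  // shorthand string form
  'https://www.example.com/api/1',
  // object form with a per-request method and headers
  { url: 'https://www.example.com/api/2', method: 'POST', headers: { 'x-demo': '1' } }
]
```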
### IntervalTime
@@ -733,6 +733,12 @@ interface XCrawlBaseConfig {
}
```

+ ### CrawlPageConfig
+
+ ```ts
+ type CrawlPageConfig = string | RequestConfigObjectV1
+ ```
+
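`CrawlPageConfig` accepts either a URL string or a `RequestConfigObjectV1` (which, unlike V2, has no `method` field, consistent with page crawling being a plain navigation). A hedged sketch reusing the `myXCrawl` instance from the earlier example; the proxy address is a placeholder:

```ts
// string form
myXCrawl.crawlPage('https://www.example.com')

// object form (RequestConfigObjectV1 fields)
myXCrawl.crawlPage({
  url: 'https://www.example.com',
  timeout: 10000,
  proxy: 'http://localhost:7890' // placeholder proxy address
})
```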
### CrawlBaseConfigV1

```ts
@@ -742,12 +748,6 @@ interface CrawlBaseConfigV1 {
}
```

- ### CrawlPageConfig
-
- ```ts
- type CrawlPageConfig = string | MergeRequestConfigObject
- ```
-
### CrawlDataConfig

```ts
@@ -805,7 +805,7 @@ interface XCrawlInstance {
### CrawlResCommonV1

```ts
- interface CrawlCommon<T> {
+ interface CrawlResCommonV1<T> {
  id: number
  statusCode: number | undefined
  headers: IncomingHttpHeaders // nodejs: http type
@@ -819,6 +819,17 @@ interface CrawlCommon<T> {
type CrawlResCommonArrV1<T> = CrawlResCommonV1<T>[]
```
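The batch APIs resolve to this array type. A hedged sketch of consuming it; note that the `requestConfig` field is an assumption (the `CrawlBaseConfigV1` body is not shown in this diff), and the URLs are placeholders:

```ts
myXCrawl
  .crawlData({ requestConfig: ['https://www.example.com/api/1', 'https://www.example.com/api/2'] })
  .then((res) => {
    // res: CrawlResCommonArrV1<any> - one entry per request
    res.forEach((item) => console.log(item.id, item.statusCode))
  })
```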
+ ### CrawlPage
+
+ ```ts
+ interface CrawlPage {
+   httpResponse: HTTPResponse | null // The HTTPResponse type of the puppeteer library
+   browser: Browser // The Browser type of the puppeteer library
+   page: Page // The Page type of the puppeteer library
+   jsdom: JSDOM // The JSDOM type of the jsdom library
+ }
+ ```
+
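Every `crawlPage` call resolves to this object, so the caller can choose between the live puppeteer handles and the static jsdom view. A hedged sketch of consuming it (the screenshot path is a placeholder):

```ts
myXCrawl.crawlPage('https://www.example.com').then(async ({ httpResponse, page, jsdom, browser }) => {
  console.log(httpResponse?.status()) // may be null
  console.log(jsdom.window.document.title) // static parsing via jsdom
  await page.screenshot({ path: 'example.png' }) // or drive the live puppeteer Page
  await browser.close()
})
```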
### FileInfo

```ts
@@ -830,17 +841,6 @@ interface FileInfo {
}
```

- ### CrawlPage
-
- ```ts
- interface CrawlPage {
-   httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
-   browser // The type of Browser in the puppeteer library
-   page: Page // The type of Page in the puppeteer library
-   jsdom: JSDOM // The type of JSDOM in the jsdom library
- }
- ```
-
## More

If you have any **questions** or **needs**, please submit **Issues in** https://github.com/coder-hxl/x-crawl/issues.