Commit 6e2947d

fetchHTML API parameter can be Object type
1 parent 53bf5d5 commit 6e2947d

8 files changed, +36 -17 lines changed

README.md

+18 -6
@@ -2,7 +2,7 @@
 
 English | <a href="#cn" style="text-decoration: none">简体中文</a>
 
-XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch fetch HTML, JSON, images, etc.
+XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.
 
 ## Install
 
@@ -47,7 +47,7 @@ class XCrawl {
   constructor(baseConfig?: IXCrawlBaseConifg)
   fetchData<T = any>(config: IFetchDataConfig): Promise<IFetchCommon<T>>
   fetchFile(config: IFetchFileConfig): Promise<IFetchCommon<IFileInfo>>
-  fetchHTML(url: string): Promise<JSDOM>
+  fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
 }
 ```
 
@@ -130,7 +130,7 @@ fetchHTML is the method of the above <a href="#myXCrawl" style="text-decoration
 - Type
 
 ```ts
-function fetchHTML(url: string): Promise<JSDOM>
+function fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
 ```
 
 - Example
@@ -237,6 +237,12 @@ interface IFetchFileConfig extends IFetchBaseConifg {
 }
 ```
 
+- IFetchHTMLConfig
+
+```ts
+interface IFetchHTMLConfig extends IRequestConfig {}
+```
+
 ## More
 
 If you have any **questions** or **needs** , please submit **Issues in** https://github.com/coder-hxl/x-crawl/issues .
@@ -249,7 +255,7 @@ If you have any **questions** or **needs** , please submit **Issues in** https:/
 
 <a href="#en" style="text-decoration: none">English</a> | 简体中文
 
-XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch crawl HTML, JSON, images, etc.
+XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. with just simple configuration.
 
 ## Install
 
@@ -294,7 +300,7 @@ class XCrawl {
   constructor(baseConfig?: IXCrawlBaseConifg)
   fetchData<T = any>(config: IFetchDataConfig): Promise<IFetchCommon<T>>
   fetchFile(config: IFetchFileConfig): Promise<IFetchCommon<IFileInfo>>
-  fetchHTML(url: string): Promise<JSDOM>
+  fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
 }
 ```
 
@@ -377,7 +383,7 @@ fetchHTML is the method of the above <a href="#cn-myXCrawl" style="text-decoration: none">myXCra
 - Type
 
 ```ts
-function fetchHTML(url: string): Promise<JSDOM>
+function fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
 ```
 
 - Example
@@ -484,6 +490,12 @@ interface IFetchFileConfig extends IFetchBaseConifg {
 }
 ```
 
+- IFetchHTMLConfig
+
+```ts
+interface IFetchHTMLConfig extends IRequestConfig {}
+```
+
 ## More
 
 If you have any **questions** or **needs**, please submit **Issues** at https://github.com/coder-hxl/x-crawl/issues .

package.json

+2 -2
@@ -1,9 +1,9 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "0.1.0",
+  "version": "0.1.1",
   "author": "CoderHxl",
-  "description": "XCrawl is a Nodejs multifunctional crawler library.",
+  "description": "XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.",
   "license": "MIT",
   "main": "src/index.ts",
   "scripts": {

publish/README.md

+2 -2
@@ -2,7 +2,7 @@
 
 English | <a href="#cn" style="text-decoration: none">简体中文</a>
 
-XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch fetch HTML, JSON, images, etc.
+XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.
 
 ## Install
 
@@ -249,7 +249,7 @@ If you have any **questions** or **needs** , please submit **Issues in** https:/
 
 <a href="#en" style="text-decoration: none">English</a> | 简体中文
 
-XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch crawl HTML, JSON, images, etc.
+XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. with just simple configuration.
 
 ## Install
 

publish/package.json

+2 -2
@@ -1,8 +1,8 @@
 {
   "name": "x-crawl",
-  "version": "0.1.0",
+  "version": "0.1.1",
   "author": "CoderHxl",
-  "description": "XCrawl is a Nodejs multifunctional crawler library.",
+  "description": "XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.",
   "license": "MIT",
   "keywords": [
     "nodejs",

src/index.ts

+8 -3
@@ -3,12 +3,13 @@ import path from 'node:path'
 import { JSDOM } from 'jsdom'
 
 import { batchRequest, request } from './request'
-import { isArray, isUndefined } from './utils'
+import { isArray, isString, isUndefined } from './utils'
 
 import {
   IXCrawlBaseConifg,
   IFetchDataConfig,
   IFetchFileConfig,
+  IFetchHTMLConfig,
   IFetchBaseConifg,
   IFileInfo,
   IFetchCommon,
@@ -145,9 +146,13 @@ export default class XCrawl {
     })
   }
 
-  async fetchHTML(url: string): Promise<JSDOM> {
+  async fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM> {
+    const rawRequestConifg: IFetchHTMLConfig = isString(config)
+      ? { url: config }
+      : config
+
     const { requestConifg } = mergeConfig(this.baseConfig, {
-      requestConifg: { url }
+      requestConifg: rawRequestConifg
     })
 
     const requestResItem = await request(requestConifg)
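
The change to `fetchHTML` in src/index.ts comes down to normalizing a `string | object` parameter into a single config object before merging it with the base config. A minimal standalone sketch of that pattern follows. The `Sketch` type names and the `timeout` field are hypothetical stand-ins, since the diff does not show the real shape of `IRequestConfig` beyond `url`, and the local `isString` only approximates the helper imported from `./utils`.

```typescript
// Hypothetical stand-in for the library's IRequestConfig.
// Only `url` is confirmed by the diff; `timeout` is an assumed field for illustration.
interface RequestConfigSketch {
  url: string
  timeout?: number
}

// Mirrors `interface IFetchHTMLConfig extends IRequestConfig {}` from the commit.
interface FetchHTMLConfigSketch extends RequestConfigSketch {}

// Type guard playing the role of the `isString` helper (assumed implementation).
function isString(value: unknown): value is string {
  return typeof value === 'string'
}

// Accept either a bare URL or a config object and always return a config object.
function normalizeFetchHTMLConfig(
  config: string | FetchHTMLConfigSketch
): FetchHTMLConfigSketch {
  return isString(config) ? { url: config } : config
}

const fromString = normalizeFetchHTMLConfig('https://example.com/')
const fromObject = normalizeFetchHTMLConfig({
  url: 'https://example.com/',
  timeout: 5000
})

// Both call styles resolve to the same URL.
console.log(fromString.url === fromObject.url) // prints true
```

Because the string form is wrapped rather than removed, existing callers that pass a bare URL keep working, while new callers can pass any other request option through the object form.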

src/types.ts

+2
@@ -74,6 +74,8 @@ export interface IFetchFileConfig extends IFetchBaseConifg {
   }
 }
 
+export interface IFetchHTMLConfig extends IRequestConfig {}
+
 export interface IFileInfo {
   fileName: string
   mimeType: string

test/start/index.js

+1 -1
Some generated files are not rendered by default.

test/start/index.ts

+1 -1
@@ -22,7 +22,7 @@ const testXCrawl = new XCrawl({
 // console.log(res)
 // })
 
-testXCrawl.fetchHTML('https://www.bilibili.com/').then((jsdom) => {
+testXCrawl.fetchHTML({ url: 'https://www.bilibili.com/' }).then((jsdom) => {
   const document = jsdom.window.document
   const imgBoxEl = document.querySelectorAll('.bili-video-card__cover')
 
