Commit d46b0c0

committed
Add polling function
1 parent 54a574a commit d46b0c0

11 files changed (+172 -69 lines changed)


.eslintrc.js

+2 -1
@@ -14,6 +14,7 @@ module.exports = {
   rules: {
     '@typescript-eslint/no-explicit-any': 'off',
     '@typescript-eslint/no-empty-interface': 'off',
-    '@typescript-eslint/no-var-requires': 'off'
+    '@typescript-eslint/no-var-requires': 'off',
+    '@typescript-eslint/no-non-null-assertion': 'off'
   }
 }

README.md

+37 -1
@@ -6,8 +6,9 @@ XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resou

 ## highlights

-- Simple configuration to grab HTML, JSON, file resources, etc.
+- Simple configuration to grab HTML, JSON, file resources, etc
 - Batch requests can choose mode asynchronous or synchronous
+- polling function
 - Anthropomorphic request interval

 ## Install
@@ -54,6 +55,7 @@ class XCrawl {
   fetchHTML(config: IFetchHTMLConfig): Promise<IFetchHTML>
   fetchData<T = any>(config: IFetchDataConfig): Promise<IFetchCommon<T>>
   fetchFile(config: IFetchFileConfig): Promise<IFetchCommon<IFileInfo>>
+  fetchPolling(config: IFetchPollingConfig, callback: (count: number) => void): void
 }
 ```

@@ -168,6 +170,28 @@ myXCrawl.fetchFile({
 })
 ```

+### fetchPolling
+
+fetchPolling is a method of the [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, typically used to perform polling operations, such as getting news every once in a while.
+
+#### 类型
+
+```ts
+function fetchPolling(
+  config: IFetchPollingConfig,
+  callback: (count: number) => void
+): void
+```
+
+#### 示例
+
+```js
+myXCrawl.fetchPolling({ h: 1, m: 30 }, () => {
+  // will be executed every one and a half hours
+  // fetchHTML/fetchData/fetchFile
+})
+```
+
 ## Types

 #### IAnyObject
@@ -249,6 +273,18 @@ interface IFetchFileConfig extends IFetchBaseConifg {
 }
 ```

+#### IFetchPollingConfig
+
+```ts
+interface IFetchPollingConfig {
+  Y?: number // Year (365 days per year)
+  M?: number // Month (30 days per month)
+  d?: number // day
+  h?: number // hour
+  m?: number // minute
+}
+```
+
 #### IFetchCommon

 ```ts
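
Review note on the README.md change: the `IFetchPollingConfig` comments suggest that all fields are summed into one interval, with a year counted as 365 days and a month as 30 days. The sketch below spells out that arithmetic so the documented example (`{ h: 1, m: 30 }`, one and a half hours) can be checked by hand. `PollingConfig` and `pollingIntervalMs` are hypothetical names used only for illustration; they are not part of x-crawl.

```typescript
// Hypothetical mirror of the documented config shape (not imported from x-crawl).
interface PollingConfig {
  Y?: number // years, counted as 365 days each
  M?: number // months, counted as 30 days each
  d?: number // days
  h?: number // hours
  m?: number // minutes
}

const MINUTE_MS = 60 * 1000
const HOUR_MS = 60 * MINUTE_MS
const DAY_MS = 24 * HOUR_MS

// Sum every field into a single interval in milliseconds.
// Missing fields default to 0, matching the "undefined means 0" rule in the commit.
function pollingIntervalMs(config: PollingConfig): number {
  const { Y = 0, M = 0, d = 0, h = 0, m = 0 } = config
  return (
    Y * 365 * DAY_MS +
    M * 30 * DAY_MS +
    d * DAY_MS +
    h * HOUR_MS +
    m * MINUTE_MS
  )
}

console.log(pollingIntervalMs({ h: 1, m: 30 })) // 5400000 (1.5 hours)
console.log(pollingIntervalMs({ d: 1 })) // 86400000 (1 day)
```

Because the fields are additive, `{ h: 1, m: 30 }` and `{ m: 90 }` describe the same interval under this reading.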

document/cn.md

+51 -12
@@ -8,6 +8,7 @@ XCrawl 是 Nodejs 多功能爬虫库。只需简单的配置即可抓取 HTML

 - 简单的配置即可抓取 HTML 、JSON 、文件资源等等
 - 批量请求可选择模式 异步 或 同步
+- 轮询功能
 - 拟人化的请求间隔时间

 ## 安装
@@ -20,7 +21,7 @@ npm install x-crawl

 ## 示例

-获取 bilibili 国漫主页的推荐轮播图片为例:
+每隔一天就获取 bilibili 国漫主页的推荐轮播图片为例:

 ```js
 // 1.导入模块 ES/CJS
@@ -32,18 +33,21 @@ const myXCrawl = new XCrawl({
   intervalTime: { max: 6000, min: 2000 } // 控制请求频率
 })

-// 3.调用 fetchHTML API 爬取 HTML
-myXCrawl.fetchHTML('https://www.bilibili.com/guochuang/').then((res) => {
-  const { jsdom } = res.data // 默认使用了 JSDOM 库解析 HTML
+// 3.调用 fetchPolling API 开始轮询功能,每隔一天会调用回调函数
+myXCrawl.fetchPolling({ d: 1 }, () => {
+  // 3.1.调用 fetchHTML API 爬取 HTML
+  myXCrawl.fetchHTML('https://www.bilibili.com/guochuang/').then((res) => {
+    const { jsdom } = res.data // 默认使用了 JSDOM 库解析 HTML

-  // 3.1.获取轮播图片的 src
-  const imgSrc = []
-  const recomEls = jsdom.window.document.querySelectorAll('.chief-recom-item')
-  recomEls.forEach((item) => imgSrc.push(item.querySelector('img').src))
+    // 3.2.获取轮播图片的 src
+    const imgSrc = []
+    const recomEls = jsdom.window.document.querySelectorAll('.chief-recom-item')
+    recomEls.forEach((item) => imgSrc.push(item.querySelector('img').src))

-  // 3.2.调用 fetchFile API 爬取图片
-  const requestConifg = imgSrc.map((src) => ({ url: `https:${src}` }))
-  myXCrawl.fetchFile({ requestConifg, fileConfig: { storeDir: './upload' } })
+    // 3.3.调用 fetchFile API 爬取图片
+    const requestConifg = imgSrc.map((src) => ({ url: `https:${src}` }))
+    myXCrawl.fetchFile({ requestConifg, fileConfig: { storeDir: './upload' } })
+  })
 })
 ```

@@ -63,6 +67,7 @@ class XCrawl {
   fetchHTML(config: IFetchHTMLConfig): Promise<IFetchHTML>
   fetchData<T = any>(config: IFetchDataConfig): Promise<IFetchCommon<T>>
   fetchFile(config: IFetchFileConfig): Promise<IFetchCommon<IFileInfo>>
+  fetchPolling(config: IFetchPollingConfig, callback: (count: number) => void): void
 }
 ```

@@ -177,6 +182,28 @@ myXCrawl.fetchFile({
 })
 ```

+### fetchPolling
+
+fetchPolling 是 [myXCrawl](https://github.com/coder-hxl/x-crawl/blob/main/document/cn.md#%E7%A4%BA%E4%BE%8B-1) 实例的方法,通常用于进行轮询操作,比如每隔一段时间获取新闻之类的。
+
+#### 类型
+
+```ts
+function fetchPolling(
+  config: IFetchPollingConfig,
+  callback: (count: number) => void
+): void
+```
+
+#### 示例
+
+```js
+myXCrawl.fetchPolling({ h: 1, m: 30 }, () => {
+  // 每隔一个半小时会执行一次
+  // fetchHTML/fetchData/fetchFile
+})
+```
+
 ## 类型

 #### IAnyObject
@@ -258,13 +285,25 @@ interface IFetchFileConfig extends IFetchBaseConifg {
 }
 ```

+#### IFetchPollingConfig
+
+```ts
+interface IFetchPollingConfig {
+  Y?: number // 年 (按每年365天)
+  M?: number // 月 (按每月30天)
+  d?: number // 日
+  h?: number // 小时
+  m?: number // 分钟
+}
+```
+
 #### IFetchCommon

 ```ts
 type IFetchCommon<T> = {
   id: number
   statusCode: number | undefined
-  headers: IncomingHttpHeaders // node:http type
+  headers: IncomingHttpHeaders // node:http 类型
   data: T
 }[]
 ```

package.json

+1 -1
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "0.1.6",
+  "version": "0.2.0",
   "author": "CoderHxl",
   "description": "XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.",
   "license": "MIT",

publish/README.md

+37 -2
@@ -6,8 +6,9 @@ XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resou

 ## highlights

-- Simple configuration to grab HTML, JSON, file resources, etc.
+- Simple configuration to grab HTML, JSON, file resources, etc
 - Batch requests can choose mode asynchronous or synchronous
+- polling function
 - Anthropomorphic request interval

 ## Install
@@ -50,11 +51,11 @@ Create a crawler instance via new XCrawl. The request queue is maintained by the

 ```ts
 class XCrawl {
-  private readonly baseConfig
   constructor(baseConfig?: IXCrawlBaseConifg)
   fetchHTML(config: IFetchHTMLConfig): Promise<IFetchHTML>
   fetchData<T = any>(config: IFetchDataConfig): Promise<IFetchCommon<T>>
   fetchFile(config: IFetchFileConfig): Promise<IFetchCommon<IFileInfo>>
+  fetchPolling(config: IFetchPollingConfig, callback: (count: number) => void): void
 }
 ```

@@ -169,6 +170,28 @@ myXCrawl.fetchFile({
 })
 ```

+### fetchPolling
+
+fetchPolling is a method of the [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, typically used to perform polling operations, such as getting news every once in a while.
+
+#### 类型
+
+```ts
+function fetchPolling(
+  config: IFetchPollingConfig,
+  callback: (count: number) => void
+): void
+```
+
+#### 示例
+
+```js
+myXCrawl.fetchPolling({ h: 1, m: 30 }, () => {
+  // will be executed every one and a half hours
+  // fetchHTML/fetchData/fetchFile
+})
+```
+
 ## Types

 #### IAnyObject
@@ -250,6 +273,18 @@ interface IFetchFileConfig extends IFetchBaseConifg {
 }
 ```

+#### IFetchPollingConfig
+
+```ts
+interface IFetchPollingConfig {
+  Y?: number // Year (365 days per year)
+  M?: number // Month (30 days per month)
+  d?: number // day
+  h?: number // hour
+  m?: number // minute
+}
+```
+
 #### IFetchCommon

 ```ts

publish/package.json

+1 -1
@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "0.1.6",
+  "version": "0.2.0",
   "author": "CoderHxl",
   "description": "XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.",
   "license": "MIT",

src/index.ts

+23 -1
@@ -10,14 +10,16 @@ import {
   log,
   logError,
   logNumber,
-  logSuccess
+  logSuccess,
+  logWarn
 } from './utils'

 import {
   IXCrawlBaseConifg,
   IFetchHTMLConfig,
   IFetchDataConfig,
   IFetchFileConfig,
+  IFetchPollingConfig,
   IFetchBaseConifg,
   IFetchCommon,
   IFileInfo,
@@ -167,4 +169,24 @@ export default class XCrawl {

     return container
   }
+
+  fetchPolling(config: IFetchPollingConfig, callback: (count: number) => void) {
+    const { Y, M, d, h, m } = config
+
+    const year = !isUndefined(Y) ? Y * 1000 * 60 * 60 * 24 * 365 : 0
+    const month = !isUndefined(M) ? M * 1000 * 60 * 60 * 24 * 30 : 0
+    const day = !isUndefined(d) ? d * 1000 * 60 * 60 * 24 : 0
+    const hour = !isUndefined(h) ? h * 1000 * 60 * 60 : 0
+    const minute = !isUndefined(m) ? m * 1000 * 60 : 0
+    const total = year + month + day + hour + minute
+
+    let count = 0
+    function cb() {
+      console.log(logWarn(`Start the ${logWarn.bold(++count)} polling`))
+      callback(count)
+    }
+
+    cb()
+    setInterval(cb, total)
+  }
 }
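
Review note on src/index.ts: `setInterval(cb, total)` hands the summed interval straight to the Node.js timer. According to the Node.js timers documentation, a delay larger than 2147483647 ms (about 24.8 days) or smaller than 1 is set to 1. Configs such as `{ M: 1 }`, `{ Y: 1 }`, or `{}` would therefore likely fire almost continuously instead of at the documented interval. One possible workaround, sketched here under that assumption, is to split a long wait into chained `setTimeout` calls. `runAfter`, `Schedule`, and `MAX_TIMER_DELAY_MS` are hypothetical names and not part of this commit. The `schedule` parameter exists only so the chaining can be exercised without real timers.

```typescript
// Largest delay Node.js timers honor: 2^31 - 1 ms, roughly 24.8 days.
const MAX_TIMER_DELAY_MS = 2 ** 31 - 1

// Hypothetical injection point: anything that runs `fn` after `ms` milliseconds.
type Schedule = (fn: () => void, ms: number) => unknown

// Run `fn` once after `totalMs`, chaining timeouts when the wait exceeds the timer limit.
function runAfter(
  fn: () => void,
  totalMs: number,
  schedule: Schedule = (cb, ms) => setTimeout(cb, ms)
): void {
  const step = Math.min(totalMs, MAX_TIMER_DELAY_MS)
  schedule(() => {
    const remaining = totalMs - step
    if (remaining > 0) {
      // Still time left: wait for the rest in another chunk.
      runAfter(fn, remaining, schedule)
    } else {
      fn()
    }
  }, step)
}

// Possible use inside a polling loop (assumption, shown as a comment only):
//   const tick = () => { callback(++count); runAfter(tick, total) }
//   tick()
```

A loop built this way re-arms itself after each tick, so it could also be stopped by simply not scheduling the next `runAfter`, which the `void`-returning `fetchPolling` in this commit does not allow.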

src/types.ts

+8
@@ -78,6 +78,14 @@ export interface IFetchFileConfig extends IFetchBaseConifg {
   }
 }

+export interface IFetchPollingConfig {
+  Y?: number
+  M?: number
+  d?: number
+  h?: number
+  m?: number
+}
+
 export type IFetchCommon<T> = {
   id: number
   statusCode: number | undefined
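
Review note on src/types.ts: every field of `IFetchPollingConfig` is optional and typed as a plain `number`, so the type alone admits an empty config, negative values, and `NaN`. A small runtime check could catch these before a timer is started. The sketch below is one way to do it. `PollingConfig` and `validatePollingConfig` are hypothetical names, and the rules (finite, non-negative, at least one positive field) are assumptions rather than behavior defined by this commit.

```typescript
// Hypothetical mirror of the interface added in this commit.
interface PollingConfig {
  Y?: number
  M?: number
  d?: number
  h?: number
  m?: number
}

const POLLING_KEYS = ['Y', 'M', 'd', 'h', 'm'] as const

// Return a list of human-readable problems; an empty list means the config looks usable.
function validatePollingConfig(config: PollingConfig): string[] {
  const problems: string[] = []
  let hasPositive = false

  for (const key of POLLING_KEYS) {
    const value = config[key]
    if (value === undefined) continue
    if (!Number.isFinite(value) || value < 0) {
      problems.push(`${key} must be a finite, non-negative number`)
    } else if (value > 0) {
      hasPositive = true
    }
  }

  // Only report the "nothing set" case when no field-level problem was found.
  if (!hasPositive && problems.length === 0) {
    problems.push('at least one field must be greater than 0')
  }
  return problems
}

console.log(validatePollingConfig({ h: 1, m: 30 })) // []
console.log(validatePollingConfig({})) // one problem: nothing is set
```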

src/utils.ts

+1
@@ -18,6 +18,7 @@ export const log = console.log
 export const logNumber = chalk.hex('#a57fff')
 export const logSuccess = chalk.green
 export const logError = chalk.red
+export const logWarn = chalk.yellow

 export function isUndefined(value: any): value is undefined {
   return typeof value === 'undefined'

test/start/index.js

+1 -1
Generated file; diff not rendered by default.
