Commit ac35805 (1 parent: a965849)
docs: update

8 files changed: +75 −16 lines changed

README.md (+42 −11)

````diff
@@ -70,26 +70,57 @@ npm install x-crawl
 
 ## Example
 
-Get the title of https://docs.github.com/zh/get-started as an example:
+Example of fetching the featured video cover images from the YouTube homepage every other day:
 
 ```js
-// Import module ES/CJS
+// 1. Import module ES/CJS
 import xCrawl from 'x-crawl'
 
-// Create a crawler instance
-const docsXCrawl = xCrawl({
-  baseUrl: 'https://docs.github.com',
-  timeout: 10000,
-  intervalTime: { max: 2000, min: 1000 }
+// 2. Create a crawler instance
+const myXCrawl = xCrawl({
+  timeout: 10000, // request timeout
+  intervalTime: { max: 3000, min: 2000 } // control the request frequency
 })
 
-// Call fetchHTML API to crawl
-docsXCrawl.fetchHTML('/zh/get-started').then((res) => {
-  const { jsdom } = res.data
-  console.log(jsdom.window.document.querySelector('title')?.textContent)
+// 3. Set the crawling task
+// Call the startPolling API to start polling; the callback runs every other day
+myXCrawl.startPolling({ d: 1 }, () => {
+  // Call the fetchHTML API to crawl the HTML
+  myXCrawl.fetchHTML('https://www.youtube.com/').then((res) => {
+    const { jsdom } = res.data // By default, the JSDOM library is used to parse the HTML
+
+    // Get the cover image elements of the promoted videos
+    const imgEls = jsdom.window.document.querySelectorAll(
+      '.yt-core-image--fill-parent-width'
+    )
+
+    // Set the request configuration
+    const requestConfig = []
+    imgEls.forEach((item) => {
+      if (item.src) {
+        requestConfig.push({ url: item.src })
+      }
+    })
+
+    // Call the fetchFile API to crawl the images
+    myXCrawl.fetchFile({ requestConfig, fileConfig: { storeDir: './upload' } })
+  })
 })
 ```
 
+Running result:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler.png" />
+</div>
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler-result.png" />
+</div>
+
+**Note:** Do not crawl sites at random; this example only demonstrates how to use x-crawl, and it keeps the request interval between 2000ms and 3000ms.
+
 ## Core concepts
 
 ### x-crawl
````
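The `startPolling({ d: 1 }, …)` call in the example expresses the polling interval in days. A minimal sketch of how such a unit-based config can be flattened into a single millisecond interval (`toMilliseconds` is a hypothetical helper for illustration only, not part of the x-crawl API):

```javascript
// Hypothetical helper (NOT part of x-crawl): flatten a polling config
// such as { d: 1, h: 2, m: 30 } into one millisecond interval.
function toMilliseconds({ d = 0, h = 0, m = 0 } = {}) {
  return ((d * 24 + h) * 60 + m) * 60 * 1000
}

console.log(toMilliseconds({ d: 1 })) // → 86400000 (one day)
console.log(toMilliseconds({ h: 2, m: 30 })) // → 9000000
```

With an interval computed this way, a plain `setInterval(callback, toMilliseconds({ d: 1 }))` would fire the callback once per day, which is the behavior the polling config describes.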
2 files renamed without changes.

assets/en/crawler-result.png (binary, 134 KB)

assets/en/crawler.png (binary, 39.3 KB)

docs/cn.md (+4 −4)

```diff
@@ -78,8 +78,8 @@ import xCrawl from 'x-crawl'
 
 // 2.创建一个爬虫实例
 const myXCrawl = xCrawl({
-  timeout: 10000, // 超时时间
-  intervalTime: { max: 3000, min: 2000 } // 控制请求频率
+  timeout: 10000, // overtime time
+  intervalTime: { max: 3000, min: 2000 } // control request frequency
 })
@@ -105,11 +105,11 @@ myXCrawl.startPolling({ d: 1 }, () => {
 运行效果:
 
 <div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/crawler.png" />
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/cn/crawler.png" />
 </div>
 
 <div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/crawler-result.png" />
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/cn/crawler-result.png" />
 </div>
 
 **注意:** 请勿随意爬取,这里只是为了演示如何使用 XCrawl ,并将请求频率控制在 3000ms 到 2000ms 内。
```

test/start/index.js (+1 −1)

Generated file; diff not rendered.

test/start/index.ts (+28)

```diff
@@ -28,3 +28,31 @@ import xCrawl from '../../src'
 //   fileConfig: { storeDir: path.resolve(__dirname, 'upload') }
 // })
 // })
+
+const myXCrawl = xCrawl({
+  timeout: 10000,
+  intervalTime: { max: 3000, min: 2000 },
+  proxy: 'http://127.0.0.1:14892'
+})
+
+myXCrawl.startPolling({ d: 1 }, () => {
+  myXCrawl.fetchHTML('https://www.youtube.com/').then((res) => {
+    const { jsdom } = res.data
+
+    const imgEls = jsdom.window.document.querySelectorAll<HTMLImageElement>(
+      '.yt-core-image--fill-parent-width'
+    )
+
+    const requestConfig: any[] = []
+    imgEls.forEach((item) => {
+      if (item.src) {
+        requestConfig.push({ url: item.src })
+      }
+    })
+
+    myXCrawl.fetchFile({
+      requestConfig,
+      fileConfig: { storeDir: path.resolve(__dirname, './upload') }
+    })
+  })
+})
```
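The diff above collects the `src` of each matched image element into a request list, skipping elements without one. That filtering step can be sketched independently of the DOM; the sample input below is made-up data standing in for image elements:

```javascript
// Build a request config list from image-like elements, keeping only
// those that actually carry a src (the same pattern as in the diff above).
function buildRequestConfig(imgEls) {
  const requestConfig = []
  imgEls.forEach((item) => {
    if (item.src) {
      requestConfig.push({ url: item.src })
    }
  })
  return requestConfig
}

// Made-up sample data standing in for DOM elements:
const els = [{ src: 'https://example.com/a.png' }, { src: '' }, {}]
console.log(buildRequestConfig(els)) // → [ { url: 'https://example.com/a.png' } ]
```

The `if (item.src)` guard matters because lazily loaded images often have no `src` yet, and an empty string is falsy, so both cases are dropped before any download is attempted.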
