Commit a063fd1

Update: Docs
1 parent 18adb3d commit a063fd1

3 files changed (+28 -4 lines)

README.md (+9 -1)
@@ -104,7 +104,7 @@ const myXCrawl = xCrawl({
 */
 myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
 // Call crawlPage API to crawl Page
-const { jsdom } = await myXCrawl.crawlPage('https://zh.airbnb.com/s/*/plus_homes')
+const { jsdom, page } = await myXCrawl.crawlPage('https://zh.airbnb.com/s/*/plus_homes')
 
 // Get the cover image elements for Plus listings
 const imgEls = jsdom.window.document.querySelector('.a1stauiv')?.querySelectorAll('picture img')
@@ -115,6 +115,9 @@ myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
 
 // Call the crawlFile API to crawl pictures
 myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
+
+// Close page
+page.close()
 })
 ```
 
@@ -218,6 +221,8 @@ The browser instance is a headless browser without a UI shell. What it does is t
 
 It is an instance of [Page](https://pptr.dev/api/puppeteer.page). The instance can also perform interactive operations such as handling events. For details, see [page](https://pptr.dev/api/puppeteer.page).
 
+The browser instance keeps a reference to the page instance. If you no longer need the page, close it yourself; otherwise it will cause a memory leak.
+
 **Parse the page yourself**
 
 Take the jsdom library as an example:
@@ -323,9 +328,12 @@ myXCrawl.startPolling({ h: 2, m: 30 }, async (count, stopPolling) => {
 // Executed every two and a half hours
 // crawlPage/crawlData/crawlFile
 const { jsdom, browser, page } = await myXCrawl.crawlPage('https://xxx.com')
+page.close()
 })
 ```
 
+**Note on using crawlPage in polling:** Call page.close() once you are done with a page. The browser instance keeps a reference to every page instance it opens, so a page that is never closed is never released and causes a memory leak.
+
 Callback function parameters:
 
 - The count attribute records the current polling round.
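The commit's point is that the browser keeps every page returned by crawlPage alive until it is closed, so a polling callback that never closes its page holds on to one more page each round. The sketch below simulates this with a hypothetical `FakeBrowser` and `fakeCrawlPage` (illustrative stand-ins, not x-crawl or Puppeteer APIs) and contrasts a round that never closes its page with one that closes it in a `finally` block, so the page is released even when parsing throws. In Puppeteer, `page.close()` returns a Promise, so awaiting it, as done here, is the safer habit.

```javascript
// Hypothetical stand-in for the headless browser behind crawlPage. Like the
// docs describe, it keeps a reference to every page it opens until that page
// is closed. This is NOT x-crawl or Puppeteer code.
class FakeBrowser {
  constructor() {
    this.openPages = new Set()
  }

  async newPage(url) {
    const page = {
      url,
      // Closing a page drops the browser's reference to it.
      close: async () => {
        this.openPages.delete(page)
      }
    }
    this.openPages.add(page)
    return page
  }
}

// Hypothetical crawlPage stand-in that resolves to { browser, page }, the same
// shape the README examples destructure.
async function fakeCrawlPage(browser, url) {
  return { browser, page: await browser.newPage(url) }
}

// One polling round without cleanup: the browser keeps the page forever.
async function leakyRound(browser, url) {
  const { page } = await fakeCrawlPage(browser, url)
  return page.url
}

// One polling round with cleanup: the page is closed even if parsing throws.
async function closingRound(browser, url, parse) {
  const { page } = await fakeCrawlPage(browser, url)
  try {
    return parse(page)
  } finally {
    await page.close()
  }
}

// Runs several simulated polling rounds with each strategy and reports how
// many pages each browser still holds afterwards.
async function runDemo(rounds = 3) {
  const leakyBrowser = new FakeBrowser()
  const closingBrowser = new FakeBrowser()

  for (let count = 1; count <= rounds; count++) {
    await leakyRound(leakyBrowser, 'https://xxx.com')
    await closingRound(closingBrowser, 'https://xxx.com', (page) => page.url)
  }

  // A round whose parser throws still releases its page; the simulated
  // error is swallowed here because finally has already closed the page.
  await closingRound(closingBrowser, 'https://xxx.com', () => {
    throw new Error('parse failed')
  }).catch(() => {})

  return {
    leaked: leakyBrowser.openPages.size,
    stillOpen: closingBrowser.openPages.size
  }
}

runDemo().then((result) => console.log(result)) // → { leaked: 3, stillOpen: 0 }
```

With x-crawl itself, the same shape would be `const { jsdom, page } = await myXCrawl.crawlPage(url)` followed by `try { ... } finally { await page.close() }` inside the `startPolling` callback; this is a suggested pattern, not something the commit adds.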

docs/cn.md (+10 -2)
@@ -99,9 +99,9 @@ const myXCrawl = xCrawl({
 
 // 3. Set up the crawl task
 // Call the startPolling API to start polling; the callback is invoked once a day
-myXCrawl.startPolling({ d: 1 }, async () => {
+myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
 // Call the crawlPage API to crawl the page
-const { jsdom } = await myXCrawl.crawlPage('https://www.bilibili.com/guochuang/')
+const { jsdom, page } = await myXCrawl.crawlPage('https://www.bilibili.com/guochuang/')
 
 // Get the carousel image elements
 const imgEls = jsdom.window.document.querySelectorAll('.chief-recom-item img')
@@ -112,6 +112,9 @@ myXCrawl.startPolling({ d: 1 }, async () => {
 
 // Call the crawlFile API to crawl the images
 myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
+
+// Close the page
+page.close()
 })
 ```
 
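@@ -213,6 +216,8 @@ The browser instance is a headless browser without a UI shell. What it does is
 
 It is an instance of [Page](https://pptr.dev/api/puppeteer.page). The instance also supports interactive operations such as handling events; for details, see [page](https://pptr.dev/api/puppeteer.page).
 
+The browser instance keeps an internal reference to the page instance. If you no longer need the page, close it yourself; otherwise it will cause a memory leak.
+
 **Parse the page yourself**
 
 Take the jsdom library as an example: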
@@ -317,9 +322,12 @@ myXCrawl.startPolling({ h: 2, m: 30 }, async (count, stopPolling) => {
 // Executed every two and a half hours
 // crawlPage/crawlData/crawlFile
 const { jsdom, browser, page } = await myXCrawl.crawlPage('https://xxx.com')
+page.close()
 })
 ```
 
+**Note on using crawlPage in polling:** page.close() is called because the browser instance keeps an internal reference to the page instance. If you no longer need the page, close it yourself; otherwise it will cause a memory leak.
+
 Callback function parameters:
 
 - The count attribute records the current polling round.

publish/README.md (+9 -1)
@@ -104,7 +104,7 @@ const myXCrawl = xCrawl({
 */
 myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
 // Call crawlPage API to crawl Page
-const { jsdom } = await myXCrawl.crawlPage('https://zh.airbnb.com/s/*/plus_homes')
+const { jsdom, page } = await myXCrawl.crawlPage('https://zh.airbnb.com/s/*/plus_homes')
 
 // Get the cover image elements for Plus listings
 const imgEls = jsdom.window.document.querySelector('.a1stauiv')?.querySelectorAll('picture img')
@@ -115,6 +115,9 @@ myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
 
 // Call the crawlFile API to crawl pictures
 myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
+
+// Close page
+page.close()
 })
 ```
 
@@ -218,6 +221,8 @@ The browser instance is a headless browser without a UI shell. What it does is t
 
 It is an instance of [Page](https://pptr.dev/api/puppeteer.page). The instance can also perform interactive operations such as handling events. For details, see [page](https://pptr.dev/api/puppeteer.page).
 
+The browser instance keeps a reference to the page instance. If you no longer need the page, close it yourself; otherwise it will cause a memory leak.
+
 **Parse the page yourself**
 
 Take the jsdom library as an example:
@@ -323,9 +328,12 @@ myXCrawl.startPolling({ h: 2, m: 30 }, async (count, stopPolling) => {
 // Executed every two and a half hours
 // crawlPage/crawlData/crawlFile
 const { jsdom, browser, page } = await myXCrawl.crawlPage('https://xxx.com')
+page.close()
 })
 ```
 
+**Note on using crawlPage in polling:** Call page.close() once you are done with a page. The browser instance keeps a reference to every page instance it opens, so a page that is never closed is never released and causes a memory leak.
+
 Callback function parameters:
 
 - The count attribute records the current polling round.
