|
1 |
| -# [v6.0.0](https://github.com/coder-hxl/x-crawl/compare/v5.1.0...v6.0.0) (2023-04-18) |
| 1 | +# [v6.0.0](https://github.com/coder-hxl/x-crawl/compare/v5.1.0...v6.0.0) (2023-04-19) |
| 2 | + |
| 3 | +### 🚨 Breaking Changes |
| 4 | + |
| 5 | +- Result processing for each crawl target: processing now starts as soon as a single target completes, which saves time and improves performance. Previously, processing waited until all targets had completed, leaving idle time during the crawl. |
| 6 | +- Execution timing of the callback passed as the second parameter to the crawlPage, crawlData, and crawlFile APIs: the callback now runs last, and it receives the same result as the Promise form. |
| 7 | +- Types: PageRequestConfig, DataRequestConfig, and FileRequestConfig are renamed to CrawlPageDetailTargetConfig, CrawlDataDetailTargetConfig, and CrawlFileDetailTargetConfig respectively, so that they can carry more than request configuration; this is called detailed target usage. CrawlPageConfigObject, CrawlDataConfigObject, and CrawlFileConfigObject are renamed to CrawlPageAdvancedConfig, CrawlDataAdvancedConfig, and CrawlFileAdvancedConfig respectively; this is called advanced usage. |
| 8 | +- Configuration options in fileConfig of crawlFile: they can now be set directly on the root configuration object. The beforeSave lifecycle function is renamed to onBeforeSaveItemFile. |
| 9 | +- Result objects of crawlPage, crawlData, and crawlFile: the crawlCount attribute is removed; the number of crawl attempts is now retryCount + 1. errorQueue is renamed to crawlErrorQueue. |
| 10 | + |
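The result-object change above can be sketched as follows. This is a minimal illustration, not x-crawl's own type: `CrawlResultLike` and `crawlAttempts` are hypothetical names, and the `Error[]` element type of `crawlErrorQueue` is an assumption. Only the field names `retryCount` and `crawlErrorQueue` come from the notes above.

```typescript
// Hypothetical, minimal shape of a v6 crawl result. Only the two fields
// named in the changelog are modelled; the Error[] type is an assumption.
interface CrawlResultLike {
  retryCount: number
  crawlErrorQueue: Error[]
}

// v5 exposed crawlCount directly. In v6 the same number is derived as
// retryCount + 1: the first attempt plus every retry.
function crawlAttempts(res: CrawlResultLike): number {
  return res.retryCount + 1
}

const sample: CrawlResultLike = {
  retryCount: 2,
  crawlErrorQueue: [new Error('timeout'), new Error('timeout')]
}

console.log(crawlAttempts(sample)) // 3
```

Code that read `crawlCount` or `errorQueue` in v5 would need the equivalent substitution when migrating.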
| 11 | +### 🚀 Features |
| 12 | + |
| 13 | +- Added device fingerprints, which help prevent fingerprint recognition from identifying and tracking us across different locations. A switch enables the default; to specify one, set it uniformly for all crawl targets in the advanced usage, or set it per target through the detailed target usage. |
| 14 | +- Each advanced usage gains several attributes that can be set once on the advanced configuration object instead of being repeated for each target configuration. Added the onCrawlItemComplete lifecycle function, which runs after each crawl target completes and receives the crawl result (similar to CrawlSingleRes) in its callback. |
| 15 | +- Added a crawlPage option to the configuration for creating a crawler application; browser launch settings go in the crawlPage.launchBrowser option (type PuppeteerLaunchOptions, from Puppeteer). |
| 16 | +- crawlPage adds a viewport option, which sets the viewport of the page. |
| 17 | + |
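The idea of setting attributes once in the advanced usage while still allowing per-target settings can be sketched as below. This is a hedged illustration, not x-crawl's implementation: `TargetOptions`, `DetailTarget`, `AdvancedConfig`, `resolveTargets`, and the `headers`/`timeout` option names are all hypothetical, and the precedence rule (a target's own value wins over the shared one) is an assumption.

```typescript
// Hypothetical option names; not taken from x-crawl's type definitions.
interface TargetOptions {
  headers?: Record<string, string>
  timeout?: number
}

// A detailed target: a URL plus optional per-target overrides.
interface DetailTarget extends TargetOptions {
  url: string
}

// An advanced config: shared options plus a list of targets.
interface AdvancedConfig extends TargetOptions {
  targets: Array<string | DetailTarget>
}

// Apply the shared options to every target, letting a target's own
// settings take precedence (assumed rule). The merge is shallow, so a
// target's headers object replaces the shared headers object as a whole.
function resolveTargets(config: AdvancedConfig): DetailTarget[] {
  const { targets, ...shared } = config
  return targets.map((t) => {
    const detail: DetailTarget = typeof t === 'string' ? { url: t } : t
    return { ...shared, ...detail }
  })
}

const resolved = resolveTargets({
  timeout: 10000,
  targets: ['https://example.com/a', { url: 'https://example.com/b', timeout: 3000 }]
})

console.log(resolved.map((t) => t.timeout)) // [ 10000, 3000 ]
```

The first target inherits the shared timeout, while the second keeps its own value, so shared settings only need to be written once.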
| 18 | +--- |
2 | 19 |
|
3 | 20 | ### 🚨 Breaking Changes
|
4 | 21 |
|
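5 | 22 | - Result processing for each crawl target: processing now starts as soon as a single target completes, which saves time and improves performance. Previously, processing waited until all targets had completed, leaving idle time during the crawl.
|
6 | 23 | - Execution timing of the callback passed as the second parameter to the crawlPage, crawlData, and crawlFile APIs: the callback now runs last, and it receives the same result as the Promise form.
|
7 |
| -- Types: PageRequestConfig, DataRequestConfig, and FileRequestConfig are renamed to CrawlPageDetailTargetConfig, CrawlDataDetailTargetConfig, and CrawlFileDetailTargetConfig respectively, so that they can carry more than request configuration. CrawlPageConfigObject, CrawlDataConfigObject, and CrawlFileConfigObject are renamed to CrawlPageAdvancedConfig, CrawlDataAdvancedConfig, and CrawlFileAdvancedConfig respectively. |
8 |
| -- Configuration options in fileConfig of crawlFile: they can now be set directly on the root configuration object. The beforeSave lifecycle function is renamed to onBeforeSaveFile. |
| 24 | +- Types: PageRequestConfig, DataRequestConfig, and FileRequestConfig are renamed to CrawlPageDetailTargetConfig, CrawlDataDetailTargetConfig, and CrawlFileDetailTargetConfig respectively, so that they can carry more than request configuration; this is called detailed target usage. CrawlPageConfigObject, CrawlDataConfigObject, and CrawlFileConfigObject are renamed to CrawlPageAdvancedConfig, CrawlDataAdvancedConfig, and CrawlFileAdvancedConfig respectively; this is called advanced usage. |
| 25 | +- Configuration options in fileConfig of crawlFile: they can now be set directly on the root configuration object. The beforeSave lifecycle function is renamed to onBeforeSaveItemFile. |
9 | 26 | - Result objects of crawlPage, crawlData, and crawlFile: the crawlCount attribute is removed; the number of crawl attempts is now retryCount + 1. errorQueue is renamed to crawlErrorQueue.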
|
10 | 27 |
|
11 | 28 | ### 🚀 Features
|
12 | 29 |
|
13 |
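| -- Added device fingerprints to keep browsers from identifying and tracking our online behavior. A switch enables the default; to specify one, set it uniformly for all crawl targets in the advanced usage, or set it per target through the detailed target usage. |
| 30 | +- Added device fingerprints, which help prevent fingerprint recognition from identifying and tracking us across different locations. A switch enables the default; to specify one, set it uniformly for all crawl targets in the advanced usage, or set it per target through the detailed target usage. |
| 31 | +- Each advanced usage gains several attributes that can be set once on the advanced configuration object instead of being repeated for each target configuration. Added the onCrawlItemComplete lifecycle function, which runs after each crawl target completes and receives the crawl result (similar to CrawlSingleRes) in its callback. |
14 | 32 | - Added a crawlPage option to the configuration for creating a crawler application; browser launch settings go in the crawlPage.launchBrowser option (type PuppeteerLaunchOptions, from Puppeteer).
|
15 |
| -- In the CrawlPageAdvancedConfig, CrawlDataAdvancedConfig, and CrawlFileAdvancedConfig advanced usages, the header of each crawl request can be set once on the advanced configuration object instead of being repeated for each crawl configuration. |
16 | 33 | - crawlPage adds a viewport option, which sets the viewport of the page.
|
17 |
| -- Added the onCrawlItemComplete lifecycle function, which runs after each crawl target completes and receives the crawl result (similar to CrawlSingleRes) in its callback. It can be set in the advanced usage. |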
18 | 34 |
|
19 | 35 | # [v5.1.0](https://github.com/coder-hxl/x-crawl/compare/v5.0.2...v5.1.0) (2023-04-12)
|
20 | 36 |
|
|