| id | upgrading-to-v4 |
|---|---|
| title | Upgrading to v4 |
import ApiLink from '@site/src/components/ApiLink';
This page summarizes most of the breaking changes in Crawlee v4.
Crawlee v4 is now a native ESM package. It can still be consumed from a CJS project, as long as you use a TypeScript and Node.js version that supports `require(esm)`.
Support for older Node.js versions was dropped.
Support for older TypeScript versions was dropped. Older versions might work too, but only if your project is also ESM.
Previously, we kept the dependency on `cheerio` locked to the latest RC version, since many breaking changes were introduced in v1.0. This release bumps `cheerio` to the stable v1. We also now use the default `parse5` parser internally.
The following crawler options are removed:

- `handleRequestFunction` -> `requestHandler`
- `handlePageFunction` -> `requestHandler`
- `handleRequestTimeoutSecs` -> `requestHandlerTimeoutSecs`
- `handleFailedRequestFunction` -> `failedRequestHandler`
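
If you build crawler options dynamically, the renames above can be applied mechanically. The following is a minimal sketch, not part of Crawlee: `migrateCrawlerOptions` and `RENAMED_OPTIONS` are hypothetical names, and the helper assumes your options are a plain object. In most projects a simple search-and-replace in the source is enough.

```typescript
// Removed option name -> its replacement, as listed above.
const RENAMED_OPTIONS = new Map<string, string>([
    ['handleRequestFunction', 'requestHandler'],
    ['handlePageFunction', 'requestHandler'],
    ['handleRequestTimeoutSecs', 'requestHandlerTimeoutSecs'],
    ['handleFailedRequestFunction', 'failedRequestHandler'],
]);

// Hypothetical helper: returns a copy of `options` with removed names replaced.
// A value already stored under the new name takes precedence over the old one.
function migrateCrawlerOptions(options: Record<string, unknown>): Record<string, unknown> {
    const migrated: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(options)) {
        const newKey = RENAMED_OPTIONS.get(key);
        if (newKey === undefined) {
            // Not a removed option, copy it unchanged.
            migrated[key] = value;
        } else if (!(newKey in options)) {
            // Removed option without an explicit replacement, move it over.
            migrated[newKey] = value;
        }
    }
    return migrated;
}
```

For example, `migrateCrawlerOptions({ handlePageFunction: handler })` returns an object whose only key is `requestHandler`.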
The crawling context no longer includes the `Error` object for failed requests. Use the second parameter of the `errorHandler` or `failedRequestHandler` callbacks to access the error.
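
As an illustration, a handler that reads the error from its second parameter might look like the sketch below. `RequestLike`, `ContextLike`, and the `failures` array are local stand-ins invented for this example so that it compiles on its own; in a real project the context type comes from Crawlee and the function is passed as the `failedRequestHandler` crawler option.

```typescript
// Local stand-ins for the real Crawlee types (assumptions for this sketch).
interface RequestLike {
    url: string;
    retryCount: number;
}

interface ContextLike {
    request: RequestLike;
}

// Hypothetical sink for failure messages; replace with your own logging.
const failures: string[] = [];

// The error arrives as the second parameter, not as a property of the context.
const failedRequestHandler = ({ request }: ContextLike, error: Error): void => {
    failures.push(`${request.url} failed after ${request.retryCount} retries: ${error.message}`);
};
```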
Previously, the crawling context extended a `Record` type, which allowed access to any property. This was changed to a strict type, which means that you can only access properties that are defined in the context.
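
The difference can be sketched with simplified types. `LooseContext`, `StrictContext`, `MyContext`, and `myCustomValue` are hypothetical names used only for illustration and are not Crawlee's actual definitions.

```typescript
// Old behavior (simplified): the index signature accepts any property name.
interface LooseContext extends Record<string, unknown> {
    request: { url: string };
}

// New behavior (simplified): only the declared properties exist.
interface StrictContext {
    request: { url: string };
}

// Custom data now has to be declared explicitly on your own type.
interface MyContext extends StrictContext {
    myCustomValue: number;
}

function readLoose(ctx: LooseContext): unknown {
    // Compiles for any key, even one that was never set.
    return ctx['myCustomValue'];
}

function readStrict(ctx: StrictContext): string {
    // Accessing ctx.myCustomValue here would be a compile-time error.
    return ctx.request.url;
}

function readCustom(ctx: MyContext): number {
    // Allowed, because MyContext declares the property.
    return ctx.myCustomValue;
}
```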
The `additionalBlockedStatusCodes` parameter of the `Session.retireOnBlockedStatusCodes` method is removed. Use the `blockedStatusCodes` crawler option instead.
This experimental option relied on an outdated manifest version for browser extensions, and the same behavior cannot be achieved with the currently supported versions.
In v3, we introduced a new way to detect the resources available to the crawler, enabled via the `systemInfoV2` flag. In v4, this is the default detection method. The old method is removed completely, together with the `systemInfoV2` flag.