id upgrading-to-v4
title Upgrading to v4

import ApiLink from '@site/src/components/ApiLink';

This page summarizes most of the breaking changes in Crawlee v4.

ECMAScript modules

Crawlee v4 is now a native ESM package. It can still be consumed from a CJS project, as long as you use TypeScript and a Node.js version that supports require(esm).

Node 22+ required

Support for older Node.js versions was dropped.
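
If you want to fail fast on an unsupported runtime, a guard along these lines could run at startup. This is only a sketch: `isSupportedNodeVersion` and `MIN_NODE_MAJOR` are hypothetical names, not part of Crawlee, and the check only looks at the major version.

```typescript
// Minimum Node.js major version stated in this guide.
const MIN_NODE_MAJOR = 22;

// Hypothetical helper (not part of Crawlee): returns true when a version
// string such as "22.3.0" or "v20.11.1" meets the minimum major version.
function isSupportedNodeVersion(version: string): boolean {
    const [majorPart = ''] = version.replace(/^v/, '').split('.');
    const major = parseInt(majorPart, 10);
    // NaN (unparseable input) compares as false, so it is rejected too.
    return major >= MIN_NODE_MAJOR;
}
```

In a Node.js project you would typically pass `process.versions.node` to the helper and exit with a clear message when it returns `false`.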

TypeScript 5.8+ required

Support for older TypeScript versions was dropped. Older versions might still work, but only if your project is also ESM.

Cheerio v1

Previously, we kept the dependency on cheerio locked to the latest RC version, since there were many breaking changes introduced in v1.0. This release bumps cheerio to the stable v1. We also now use cheerio's default parser, parse5, internally.

Deprecated crawler options are removed

The following crawler options are removed. Use the replacement shown after each arrow:

  • handleRequestFunction -> requestHandler
  • handlePageFunction -> requestHandler
  • handleRequestTimeoutSecs -> requestHandlerTimeoutSecs
  • handleFailedRequestFunction -> failedRequestHandler
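
For a codebase that builds crawler options dynamically, the renames above could be applied with a small helper like the one below. This is a sketch, not something Crawlee ships: `migrateCrawlerOptions` and `RENAMED_OPTIONS` are hypothetical names, and the options are treated as a plain object rather than Crawlee's real option types.

```typescript
// Removed option name -> replacement, as listed in this guide.
const RENAMED_OPTIONS = new Map<string, string>([
    ['handleRequestFunction', 'requestHandler'],
    ['handlePageFunction', 'requestHandler'],
    ['handleRequestTimeoutSecs', 'requestHandlerTimeoutSecs'],
    ['handleFailedRequestFunction', 'failedRequestHandler'],
]);

// Hypothetical helper (not part of Crawlee): returns a copy of `options`
// in which removed names are replaced by their current equivalents.
function migrateCrawlerOptions(options: Record<string, unknown>): Record<string, unknown> {
    const migrated: Record<string, unknown> = {};
    // First copy every option that already uses a current name.
    for (const key of Object.keys(options)) {
        if (!RENAMED_OPTIONS.has(key)) migrated[key] = options[key];
    }
    // Then map removed names, never overwriting a value that is already set.
    // If two removed names map to the same new name, the first one wins.
    for (const key of Object.keys(options)) {
        const newKey = RENAMED_OPTIONS.get(key);
        if (newKey !== undefined && !(newKey in migrated)) {
            migrated[newKey] = options[key];
        }
    }
    return migrated;
}
```

For hand-written options, a plain search-and-replace of the four names is usually simpler than a runtime shim.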

Crawling context no longer includes Error for failed requests

The crawling context no longer includes the Error object for failed requests. Use the second parameter of the errorHandler or failedRequestHandler callbacks to access the error.
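
The shape of such a callback can be sketched as follows. The `SketchContext` and `SketchFailedRequestHandler` types are simplified stand-ins written for this example, not Crawlee's real types, and the real handlers may be asynchronous; the point is only that the error arrives as the second argument.

```typescript
// Simplified stand-ins for illustration only; the real context and handler
// types come from Crawlee and carry many more properties.
interface SketchContext {
    request: { url: string };
}
type SketchFailedRequestHandler = (context: SketchContext, error: Error) => void;

// Collected failure messages, used here instead of real logging.
const failures: string[] = [];

// The error is read from the second parameter, not from the context.
const failedRequestHandler: SketchFailedRequestHandler = ({ request }, error) => {
    failures.push(`${request.url}: ${error.message}`);
};

// Simulate the crawler reporting a failed request.
failedRequestHandler({ request: { url: 'https://example.com' } }, new Error('timed out'));
```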

Crawling context is strictly typed

Previously, the crawling context extended a Record type, which allowed access to any property. This was changed to a strict type, which means that you can only access properties that are defined in the context.
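
The practical difference can be illustrated with two simplified types. `LooseContext`, `StrictContext`, and `MyContext` are hypothetical stand-ins for this sketch, not Crawlee's actual context types, and `pageTitle` is an invented property.

```typescript
// Record-based type: any property name is accepted by the compiler.
type LooseContext = Record<string, unknown> & { request: { url: string } };

// Strict type: only declared properties exist.
interface StrictContext {
    request: { url: string };
}

function readLoose(ctx: LooseContext): unknown {
    // Compiles even though `anythingGoes` is declared nowhere.
    return ctx['anythingGoes'];
}

// With a strict type, extra data has to be declared explicitly.
interface MyContext extends StrictContext {
    // Hypothetical extra property, declared so the compiler knows about it.
    pageTitle: string;
}

function readStrict(ctx: MyContext): string {
    // Accessing an undeclared property such as ctx.anythingGoes here
    // would be rejected at compile time.
    return `${ctx.request.url} -> ${ctx.pageTitle}`;
}
```

Code that relied on reading arbitrary properties from the context will therefore surface as compile errors after the upgrade, which is usually where typos in property names show up as well.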

additionalBlockedStatusCodes parameter is removed

The additionalBlockedStatusCodes parameter of the Session.retireOnBlockedStatusCodes method is removed. Use the blockedStatusCodes crawler option instead.

Remove experimentalContainers option

This experimental option relied on an outdated manifest version for browser extensions. The same behavior cannot be achieved with the currently supported versions, so the option is removed.

Available resource detection

In v3, we introduced a new way to detect the resources available to the crawler, enabled via the systemInfoV2 flag. In v4, this is the default detection method. The old method is removed completely, together with the systemInfoV2 flag.