Added a minimum delay between reuses of any given :ref:`plugin-managed session <session>`.
It is :setting:`DOWNLOAD_DELAY` by default. Use :setting:`ZYTE_API_SESSION_DELAY` to change that or :setting:`ZYTE_API_SESSION_POOLS` to override it for specific :ref:`session pools <session-pools>`.
:setting:`ZYTE_API_SESSION_RANDOMIZE_DELAY` controls whether that minimum delay is randomized by multiplying it by a random factor between 0.5 and 1.5. It defaults to :setting:`RANDOMIZE_DOWNLOAD_DELAY`.
The value of the :reqmeta:`zyte_api_session_pool` request metadata key and the return value of the :meth:`SessionConfig.pool() <scrapy_zyte_api.SessionConfig.pool>` method can now be a dictionary instead of a string, which allows overriding :setting:`ZYTE_API_SESSION_DELAY` and :setting:`ZYTE_API_SESSION_POOL_SIZE` for the corresponding pool.
However, they cannot override values defined in :setting:`ZYTE_API_SESSION_POOLS`.
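The randomization rule described above can be sketched in plain Python. This is only an illustration of the documented behavior (base delay multiplied by a random factor between 0.5 and 1.5); the helper name ``session_reuse_delay`` is hypothetical and is not part of scrapy-zyte-api.

```python
import random


def session_reuse_delay(base_delay: float, randomize: bool = True) -> float:
    """Return the minimum wait, in seconds, before a session may be reused.

    Hypothetical helper, not part of scrapy-zyte-api: it only mirrors the
    documented rule (base delay times a random factor in [0.5, 1.5]).
    """
    if not randomize:
        return base_delay
    return base_delay * random.uniform(0.5, 1.5)


# With a 2-second base delay, the randomized value stays between 1 and 3 seconds.
delay = session_reuse_delay(2.0)
print(f"next reuse allowed after {delay:.2f}s")
```

With randomization disabled, the helper returns the base delay unchanged, which would correspond to setting :setting:`ZYTE_API_SESSION_RANDOMIZE_DELAY` to ``False``.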
Deprecated the ``ZYTE_API_SESSION_POOL_SIZES`` setting in favor of the new :setting:`ZYTE_API_SESSION_POOLS` setting, where you can set ``"size"``.
Changed the terminology around :ref:`session management <session>` to try to make it clearer and more consistent:

- client-managed sessions → user-managed sessions
- server-managed sessions → Zyte-managed sessions
- scrapy-zyte-api session management → plugin-managed sessions

Added a :ref:`session-troubleshooting` section to the :ref:`session` page.
- Dropped support for Python 3.9.
- Added support for Scrapy 2.14+.
- Added web-poet test fixture support for :class:`~scrapy_zyte_api.Actions`, :class:`~scrapy_zyte_api.Screenshot`, and :class:`~scrapy_zyte_api.Geolocation`.
- Improved typing and added ``py.typed`` to indicate typing support.
- Added :ref:`x402 support <x402>`.
- Request fingerprinting no longer tries to take scrapy-poet into account if scrapy-poet is installed but is not enabled.
- Extended :doc:`Scrapy <scrapy:index>` support to :ref:`2.13.0+ <scrapy:release-2.13.0>`.
- Switched the minimum required version of :doc:`python-zyte-api <python-zyte-api:index>` from ``0.5.1`` to ``0.6.0``.
- Fixed the removal of default request headers (``Accept``, ``Accept-Encoding``, ``Accept-Language``, and ``User-Agent``) not working for request copies (e.g. redirects or retries).
- The default value of the :setting:`ZYTE_API_FALLBACK_HTTP_HANDLER` and :setting:`ZYTE_API_FALLBACK_HTTPS_HANDLER` settings is as expected even when not using the add-on.
- The scrapy-zyte-api download handlers now support fallback download handlers that do not define a ``close()`` method.
- Improve the removal and mapping of proxy headers accidentally included in requests:

  - Remove or map :ref:`Zyte API proxy mode headers <zapi-proxy-headers>` (``Zyte-…``), not only :ref:`Smart Proxy Manager headers <spm-request-headers>` (``X-Crawlera-…``).
  - Remove or map headers defined through :http:`request:customHttpRequestHeaders`, not only those defined in :attr:`Request.headers <scrapy.http.Request.headers>`.

- Support :meth:`~scrapy.Spider.start_requests` yielding items, which is possible since Scrapy 2.12.
Added :ref:`automatic mapping <automap>` support for new Zyte API request fields: :http:`request:customAttributes`, :http:`request:customAttributesOptions`, :http:`request:ipType`, :http:`request:followRedirect`, :http:`request:forumThread`, :http:`request:forumThreadOptions`, :http:`request:jobPostingNavigation`, :http:`request:jobPostingNavigationOptions`, :http:`request:networkCapture`, :http:`request:serp`, :http:`request:serpOptions`, :http:`request:session`, :http:`request:tags`.
- You will now be warned when using their default values unnecessarily.
- By default, the following fields no longer affect request fingerprinting (i.e. 2 requests identical except for the value of that field are now considered duplicate requests): :http:`request:ipType`, :http:`request:session`.
- When enabling :http:`request:serp`, :http:`request:httpResponseBody` and :http:`request:httpResponseHeaders` will no longer be enabled by default, and :ref:`request header mapping <request-header-mapping>` is disabled.
Session pool IDs, of Zyte-managed sessions (:http:`request:sessionContext`) or :ref:`plugin-managed sessions <session-pools>`, now affect request fingerprinting: 2 requests identical except for their session pool ID are not considered duplicate requests any longer.
When it is not clear whether a request will use browser rendering or not, e.g. an :ref:`automatic extraction request <zapi-extract>` without an :http:`extractFrom <request:productOptions.extractFrom>` value, the URL fragment is now taken into account for request fingerprinting, i.e. ``https://example.com#a`` and ``https://example.com#b`` are not considered duplicate requests anymore in those scenarios.
New setting: :setting:`ZYTE_API_SESSION_MAX_CHECK_FAILURES`.
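As a rough illustration of the URL-fragment fingerprinting rule described above, the sketch below keeps the fragment only when a browser might render the page. The helper ``url_for_fingerprint`` and its ``may_use_browser`` flag are hypothetical, and dropping the fragment for requests known not to use a browser is an assumption; the real fingerprinter takes more of the request into account.

```python
from urllib.parse import urldefrag


def url_for_fingerprint(url: str, may_use_browser: bool) -> str:
    """Hypothetical helper: keep the fragment only when a browser may see it."""
    if may_use_browser:
        return url
    # Assumption: without browser rendering the fragment is not significant.
    return urldefrag(url).url


# Rendering is uncertain here, so the fragments keep the two URLs distinct.
a = url_for_fingerprint("https://example.com#a", may_use_browser=True)
b = url_for_fingerprint("https://example.com#b", may_use_browser=True)
print(a != b)
```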
The :reqmeta:`download_latency` request metadata key is now set for Zyte API requests if it can be done without causing the :ref:`AutoThrottle extension <topics-autothrottle>` to delay Zyte API requests, e.g. if :setting:`AUTOTHROTTLE_ENABLED` is ``False`` (default) or you are using Scrapy 2.12+.
Fixes ``"auto"`` being considered the default value of :http:`request:device` instead of ``"desktop"``.
When using :doc:`scrapy-poet <scrapy-poet:index>` 0.26.0 or higher, the scrapy-zyte-api add-on no longer adds :class:`scrapy_poet.InjectionMiddleware` to :setting:`DOWNLOADER_MIDDLEWARES`. Use the scrapy-poet add-on instead to enable that and other Scrapy components required for scrapy-poet setup:

.. code-block:: python

    ADDONS = {
        "scrapy_poet.Addon": 300,
        "scrapy_zyte_api.Addon": 500,
    }

- :ref:`scrapy-poet integration <scrapy-poet>` now supports :class:`~zyte_common_items.Serp` injection from :ref:`Zyte API automatic extraction <zapi-extract>`.
- :class:`~.SessionConfig` now supports a :meth:`~.SessionConfig.process_request` method, which can be used to modify requests based on data from the initialization of the session they have been assigned.
- The new :func:`~.get_request_session_id` function allows getting the session ID that has been assigned to a given request.
- :ref:`referer` is now disabled by default for Zyte API requests. This can be configured with the new :setting:`ZYTE_API_REFERRER_POLICY` setting.
- CI improvements.
- Improved Scrapy 2.12 support (typing, deprecations).
- The :ref:`retry-policy` page now shows how to configure the :ref:`aggressive retry policy <aggressive-retry-policy>`.
- :setting:`DOWNLOAD_MAXSIZE` and :setting:`DOWNLOAD_WARNSIZE` are now also enforced on requests sent through Zyte API.
- Added official Python 3.13 support, removed official Python 3.8 support.
- Fixed a race condition that could allow more Zyte API requests than those configured in the :setting:`ZYTE_API_MAX_REQUESTS` setting.
- Added support for ``zyte_common_items.JobPostingNavigation`` to the scrapy-poet provider.
- Added support for :ref:`custom attribute extraction <custom-attrs>`.
- Added the :class:`~scrapy_zyte_api.LocationSessionConfig` class.
- Fixed an issue in the handling of excessive session initialization failures during session refreshing, which would manifest as asyncio messages about unretrieved ``TooManyBadSessionInits`` task exceptions instead of stopping the spider as intended.
- ``scrapy-zyte-api[provider]`` now requires :doc:`zyte-common-items <zyte-common-items:index>` 0.20.0+.
- Added the :setting:`ZYTE_API_AUTO_FIELD_STATS` setting.
- Added the :func:`~scrapy_zyte_api.is_session_init_request` function.
- Added the :data:`~scrapy_zyte_api.session_config_registry` variable.
Backward-incompatible change: The precedence of session param settings, request metadata keys and session config override methods has changed.
Before, priority from higher to lower was:
- :meth:`~scrapy_zyte_api.SessionConfig.params`
- :meth:`~scrapy_zyte_api.SessionConfig.location`
- :reqmeta:`zyte_api_session_location`
- :setting:`ZYTE_API_SESSION_LOCATION`
- :reqmeta:`zyte_api_session_params`
- :setting:`ZYTE_API_SESSION_PARAMS`
Now, it is:
When using the :reqmeta:`zyte_api_session_params` or :reqmeta:`zyte_api_session_location` request metadata keys, a different pool ID is now generated by default based on their value. See :meth:`~scrapy_zyte_api.SessionConfig.pool` for details.
The new :reqmeta:`zyte_api_session_pool` request metadata key allows overriding the pool ID of a request.
Fixed some documentation examples where the parameters of the ``check`` method of :setting:`ZYTE_API_SESSION_CHECKER` were in reverse order.
If the :setting:`AUTOTHROTTLE_ENABLED <scrapy:AUTOTHROTTLE_ENABLED>` setting is ``False``, the delay of download slots for Zyte API requests no longer resets to zero, and instead scrapy-zyte-api respects the :setting:`DOWNLOAD_DELAY <scrapy:DOWNLOAD_DELAY>` setting and ``zyte-api@``-prefixed entries in the :setting:`DOWNLOAD_SLOTS <scrapy:DOWNLOAD_SLOTS>` setting.
A new :setting:`ZYTE_API_PRESERVE_DELAY` setting allows overriding this behavior, i.e. enabling delay resetting even if :setting:`AUTOTHROTTLE_ENABLED <scrapy:AUTOTHROTTLE_ENABLED>` is ``False`` or disabling delay resetting even if :setting:`AUTOTHROTTLE_ENABLED <scrapy:AUTOTHROTTLE_ENABLED>` is ``True``.
The :reqmeta:`zyte_api_session_location` and :reqmeta:`zyte_api_session_params` request metadata keys, if present in a request that triggers a session initialization request, will be copied into the session initialization request, so that they are available when :setting:`ZYTE_API_SESSION_CHECKER` or :meth:`SessionConfig.check <scrapy_zyte_api.SessionConfig.check>` are called for a session initialization request.
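The interaction between :setting:`ZYTE_API_PRESERVE_DELAY` and ``AUTOTHROTTLE_ENABLED`` described above could be read as the following decision rule. This is a sketch under assumptions, not the plugin's code: ``preserves_delay`` is a hypothetical helper, settings are modeled as a plain dict, and it assumes that a value of ``True`` for ``ZYTE_API_PRESERVE_DELAY`` means "do not reset the delay".

```python
def preserves_delay(settings: dict) -> bool:
    """Hypothetical helper: are slot delays kept (True) or reset to zero (False)?

    Assumes an explicit ZYTE_API_PRESERVE_DELAY value wins; otherwise the
    delay is preserved exactly when AutoThrottle is disabled.
    """
    explicit = settings.get("ZYTE_API_PRESERVE_DELAY")
    if explicit is not None:
        return bool(explicit)
    return not settings.get("AUTOTHROTTLE_ENABLED", False)


# AutoThrottle is off by default, so DOWNLOAD_DELAY is respected.
print(preserves_delay({"DOWNLOAD_DELAY": 1.0}))
# Explicitly ask for delay resetting even though AutoThrottle is off.
print(preserves_delay({"ZYTE_API_PRESERVE_DELAY": False}))
```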
The new :meth:`SessionConfig.enabled <scrapy_zyte_api.SessionConfig.enabled>` method allows configuring whether session management should be enabled or disabled for any given request.
A new stat, ``scrapy-zyte-api/sessions/use/disabled``, indicates the number of requests for which session management was disabled.
- Implemented a :ref:`session management API <session>`.
- The recommended position for ``ScrapyZyteAPIDownloaderMiddleware`` changed from 1000 to 633, to accommodate the new ``ScrapyZyteAPISessionDownloaderMiddleware``, which needs to be after ``ScrapyZyteAPIDownloaderMiddleware`` and before the Scrapy cookie downloader middleware (700).
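A manual configuration that satisfies the ordering constraint above might look like the sketch below. The value ``667`` and the ``scrapy_zyte_api.`` import path for the session middleware are assumptions for illustration; the entry above only requires a position after 633 and before 700.

```python
# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
    # Assumed position: any value between 633 and 700 meets the constraint.
    "scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware": 667,
}
```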
- Now the :setting:`ZYTE_API_PROVIDER_PARAMS` setting and the :reqmeta:`zyte_api_provider` request metadata key can influence the resolution of an :class:`~web_poet.page_inputs.response.AnyResponse` dependency.
- The log messages from the download handler that indicate the source request URL of an exception have switched from ``ERROR`` log level to ``DEBUG``. The exceptions themselves that follow those messages will still be logged as errors unless you handle them.
- The ``Accept``, ``Accept-Encoding``, ``Accept-Language``, and ``User-Agent`` headers are now dropped automatically during :ref:`header mapping <header-mapping>` unless they have user-defined values. This fix can improve success rates on some websites when using :ref:`HTTP requests <zapi-http>`.
- ``extractFrom`` in :reqmeta:`zyte_api_provider` or :setting:`ZYTE_API_PROVIDER_PARAMS` overrides :class:`~scrapy_zyte_api.ExtractFrom` annotations.
- Updated requirement versions:
- A new :reqmeta:`zyte_api_provider` request metadata key offers the same functionality as the :setting:`ZYTE_API_PROVIDER_PARAMS` setting on a per-request basis.
- Fixed support for nested dicts, tuples and lists when defining :ref:`browser actions <browser-actions>`.
- :class:`scrapy_zyte_api.Addon` now adds :class:`scrapy_zyte_api.providers.ZyteApiProvider` to the ``SCRAPY_POET_PROVIDERS`` :ref:`scrapy-poet setting <scrapy-poet:settings>` if :doc:`scrapy-poet <scrapy-poet:index>` is installed.
- Added a :class:`scrapy_zyte_api.Actions` dependency.
- Added a :class:`scrapy_zyte_api.Screenshot` dependency.
- Added support for Python 3.12.
- Updated requirement versions:

  - :doc:`scrapy-poet <scrapy-poet:index>` >= 0.22.0
  - :doc:`web-poet <web-poet:index>` >= 0.17.0

- Added a Scrapy add-on, :class:`scrapy_zyte_api.Addon`, which simplifies configuring Scrapy projects to work with ``scrapy-zyte-api``.
- CI improvements.
- Fix ``"extractFrom": "httpResponseBody"`` causing both :http:`request:customHttpRequestHeaders` and :http:`request:requestHeaders`, which are incompatible with each other, to be set when using automatic request mapping.
- Removed support for Python 3.7.
- Updated requirement versions:

  - :doc:`scrapy-poet <scrapy-poet:index>` >= 0.21.0
  - :doc:`web-poet <web-poet:index>` >= 0.16.0

- Added support for :class:`web_poet.AnyResponse <web_poet.page_inputs.response.AnyResponse>` dependency.
- Added support to specify the country code via :class:`typing.Annotated` and :class:`scrapy_zyte_api.Geolocation` dependency (supported only on Python 3.9+).
- Improved tests.
Updated requirement versions:
- :doc:`scrapy-poet <scrapy-poet:index>` >= 0.20.1
Dependency injection :ref:`through scrapy-poet <scrapy-poet>` is now taken into account for request fingerprinting.
Now, when scrapy-poet is installed, the default value of the :setting:`ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS` setting is :class:`scrapy_poet.ScrapyPoetRequestFingerprinter`, and a warning will be issued if a custom value is not a subclass of :class:`~scrapy_poet.ScrapyPoetRequestFingerprinter`.
:ref:`Zyte Smart Proxy Manager special headers <spm-request-headers>` will now be dropped automatically when using :ref:`transparent mode <transparent>` or :ref:`automatic request parameters <automap>`. Where possible, they will be replaced with equivalent Zyte API parameters. In all cases, a warning will be issued.
Covered the configuration of :class:`scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware` in the :ref:`setup documentation <setup>`.
:class:`~scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware` was added in scrapy-zyte-api 0.13.0, and is required to automatically close spiders when all start requests fail because they are pointing to domains forbidden by Zyte API.
The assignment of a custom download slot to requests that use Zyte API now also happens in the spider middleware, not only in the downloader middleware.
This way requests get a download slot assigned before they reach the scheduler, making Zyte API requests work as expected with :class:`scrapy.pqueues.DownloaderAwarePriorityQueue`.
.. note::

    New requests created from downloader middlewares do not get their download slot assigned before they reach the scheduler. So, unless they reuse the metadata from a request that did get a download slot assigned (e.g. retries, redirects), they will continue not to work as expected with :class:`~scrapy.pqueues.DownloaderAwarePriorityQueue`.

- Updated requirement versions:

  - andi >= 0.6.0
  - scrapy-poet >= 0.19.0
  - zyte-common-items >= 0.8.0

- Added support for ``zyte_common_items.JobPosting`` to the scrapy-poet provider.
- Updated requirement versions:

  - andi >= 0.5.0
  - scrapy-poet >= 0.18.0
  - web-poet >= 0.15.1
  - zyte-api >= 0.4.8

- The spider is now closed and the finish reason is set to ``"zyte_api_bad_key"`` or ``"zyte_api_suspended_account"`` when receiving "Authentication Key Not Found" or "Account Suspended" responses from Zyte API.
- The spider is now closed and the finish reason is set to ``"failed_forbidden_domain"`` when all start requests fail because they are pointing to domains forbidden by Zyte API.
- The spider is now closed and the finish reason is set to ``"plugin_conflict"`` if both scrapy-zyte-smartproxy and the transparent mode of scrapy-zyte-api are enabled.
- The ``extractFrom`` extraction option can now be requested by annotating the dependency with a ``scrapy_zyte_api.ExtractFrom`` member (e.g. ``product: typing.Annotated[Product, ExtractFrom.httpResponseBody]``).
- The ``Set-Cookie`` header is now removed from the response if the cookies were returned by Zyte API (as ``"experimental.responseCookies"``).
- The request fingerprinting was improved by refining which parts of the request affect the fingerprint.
- Zyte API Request IDs are now included in the error logs.
- Split README.rst into multiple documentation files and publish them on ReadTheDocs.
- Improve the documentation for the ``ZYTE_API_MAX_REQUESTS`` setting.
- Test and CI improvements.
- Unused ``<data type>Options`` (e.g. ``productOptions``) are now dropped from ``ZYTE_API_PROVIDER_PARAMS`` when sending the Zyte API request.
- When logging Zyte API requests, truncation now uses "..." instead of Unicode ellipsis.
The new ``_ZYTE_API_USER_AGENT`` setting allows customizing the user agent string reported to Zyte API.
Note that this setting is only meant for libraries and frameworks built on top of scrapy-zyte-api, to report themselves to Zyte API, for client software tracking and monitoring purposes. The value of this setting is not the ``User-Agent`` header sent to upstream websites when using Zyte API.
A new ``ZYTE_API_PROVIDER_PARAMS`` setting allows setting Zyte API parameters, like ``geolocation``, to be included in all Zyte API requests by the scrapy-poet provider.
A new ``scrapy-zyte-api/request_args/<parameter>`` stat counts the number of requests containing a given Zyte API request parameter. For example, ``scrapy-zyte-api/request_args/url`` counts the number of Zyte API requests with the URL parameter set (which should be all of them).
Experimental is treated as a namespace, and its parameters are the ones counted, i.e. there is no ``scrapy-zyte-api/request_args/experimental`` stat, but there are stats like ``scrapy-zyte-api/request_args/experimental.responseCookies``.
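To make the stat naming rule above concrete, the following sketch derives the stat keys that a given set of Zyte API parameters would increment. The helper ``request_arg_stat_names`` is hypothetical and only mirrors the naming rule as described; it is not the plugin's implementation.

```python
def request_arg_stat_names(api_params: dict) -> list[str]:
    """Hypothetical helper listing the stat keys a request would increment."""
    prefix = "scrapy-zyte-api/request_args/"
    names = []
    for name, value in api_params.items():
        if name == "experimental" and isinstance(value, dict):
            # "experimental" is a namespace: count its parameters instead.
            names.extend(f"{prefix}experimental.{sub}" for sub in value)
        else:
            names.append(prefix + name)
    return names


params = {
    "url": "https://example.com",
    "browserHtml": True,
    "experimental": {"responseCookies": True},
}
for stat in request_arg_stat_names(params):
    print(stat)
```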
- scrapy-zyte-api 0.11.0 accidentally increased the minimum required version of scrapy-poet from 0.10.0 to 0.11.0. We have reverted that change and implemented measures to prevent similar accidents in the future.
- Automatic parameter mapping no longer warns about dropping the ``Accept-Encoding`` header when the header value matches the Scrapy default.
- The README now mentions additional changes that may be necessary when switching Twisted reactors on existing projects.
- The README now explains how status codes, from Zyte API or from wrapped responses, are reflected in Scrapy stats.
- Added a ``ZYTE_API_MAX_REQUESTS`` setting to limit the number of successful Zyte API requests that a spider can send. Reaching the limit stops the spider.
- Setting ``requestCookies`` to ``[]`` in the ``zyte_api_automap`` request metadata field now triggers a warning.
- Added more data types to the scrapy-poet provider:

  - ``zyte_common_items.ProductList``
  - ``zyte_common_items.ProductNavigation``
  - ``zyte_common_items.Article``
  - ``zyte_common_items.ArticleList``
  - ``zyte_common_items.ArticleNavigation``

- Moved the new dependencies added in 0.9.0 and needed only for the scrapy-poet provider (``scrapy-poet``, ``web-poet``, ``zyte-common-items``) into the new optional feature ``[provider]``.
- Improved result caching in the scrapy-poet provider.
- Added a new setting, ``ZYTE_API_USE_ENV_PROXY``, which can be set to ``True`` to access Zyte API using a proxy configured in the local environment.
- Fixed getting the Scrapy Cloud job ID.
- Improved the documentation.
- Improved the CI configuration.
- New and updated requirements:

  - packaging >= 20.0
  - scrapy-poet >= 0.9.0
  - web-poet >= 0.13.0
  - zyte-common-items

- Added a scrapy-poet provider for Zyte API. Currently supported data types:

  - ``web_poet.BrowserHtml``
  - ``web_poet.BrowserResponse``
  - ``zyte_common_items.Product``

- Added a ``zyte_api_default_params`` request meta key which allows users to ignore the ``ZYTE_API_DEFAULT_PARAMS`` setting for individual requests.
- CI fixes.
- Fixed an exception raised by the downloader middleware when cookies were enabled.
- Made Python 3.11 support official.
- Added support for the upcoming automatic extraction feature of Zyte API.
- Included a descriptive message in the exception that triggers when the download handler cannot be initialized.
- Clarified that ``LOG_LEVEL`` must be ``DEBUG`` for ``ZYTE_API_LOG_REQUESTS`` messages to be visible.
- Fixed the handling of response cookies without a domain.
- CI fixes.
- Fixed an ``AssertionError`` when cookies are disabled.
- Added links to the README to improve navigation from GitHub.
- Added a license file (BSD-3-Clause).
Added experimental cookie support:
- The ``experimental.responseCookies`` response parameter is now mapped to the response headers as ``Set-Cookie`` headers, as well as added to the cookiejar of the request.
- A new boolean setting, ``ZYTE_API_EXPERIMENTAL_COOKIES_ENABLED``, can be set to ``True`` to enable automatic mapping of cookies from a request cookiejar into the ``experimental.requestCookies`` Zyte API parameter.
- The ``ZyteAPITextResponse`` is now a subclass of ``HtmlResponse``, so that the ``open_in_browser`` function of Scrapy uses the ``.html`` extension for Zyte API responses.

  While not ideal, this is much better than the previous behavior, where the ``.html`` extension was never used for Zyte API responses.
- ``ScrapyZyteAPIDownloaderMiddleware`` now also supports non-string slot IDs.
- It is now possible to log the parameters of requests sent.
- Stats for HTTP and HTTPS traffic used to be kept separate, and only one of those sets of stats would be reported. This is fixed now.
- Fixed some code examples and references in the README.
When upgrading, you should set the following in your Scrapy settings:

.. code-block:: python

    DOWNLOADER_MIDDLEWARES = {
        "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
    }
    # only applicable for Scrapy 2.7+
    REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"

Fixes the issue where scrapy-zyte-api is slow when Scrapy Cloud has Autothrottle Addon enabled. The new ``ScrapyZyteAPIDownloaderMiddleware`` fixes this.
It now supports Scrapy 2.7's new ``REQUEST_FINGERPRINTER_CLASS`` which ensures that Zyte API requests are properly fingerprinted. This addresses the issue where Scrapy marks POST requests as duplicate if they point to the same URL despite having different request bodies. As a workaround, users were marking their requests with ``dont_filter=True`` to prevent such dupe filtering.
For users having ``scrapy >= 2.7``, you can simply update your Scrapy settings to have ``REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"``.
If your Scrapy project performs other requests aside from Zyte API, you can set ``ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = "custom.RequestFingerprinter"`` to allow custom fingerprinting. By default, the default Scrapy request fingerprinter is used for non-Zyte API requests.
For users having ``scrapy < 2.7``, check the following link to see different ways on handling the duplicate request issue: https://github.com/scrapy-plugins/scrapy-zyte-api#request-fingerprinting-before-scrapy-27.
More information about the request fingerprinting topic can be found in https://github.com/scrapy-plugins/scrapy-zyte-api#request-fingerprinting.
Various improvements to docs and tests.
- Add a ``ZYTE_API_TRANSPARENT_MODE`` setting, ``False`` by default, which can be set to ``True`` to make all requests use Zyte API by default, with request parameters being automatically mapped to Zyte API parameters.
- Add a Request meta key, ``zyte_api_automap``, that can be used to enable automatic request parameter mapping for specific requests, or to modify the outcome of automatic request parameter mapping for specific requests.
- Add a ``ZYTE_API_AUTOMAP_PARAMS`` setting, which is a counterpart for ``ZYTE_API_DEFAULT_PARAMS`` that applies to requests where automatic request parameter mapping is enabled.
- Add the ``ZYTE_API_SKIP_HEADERS`` and ``ZYTE_API_BROWSER_HEADERS`` settings to control the automatic mapping of request headers.
- Add a ``ZYTE_API_ENABLED`` setting, ``True`` by default, which can be used to disable this plugin.
- Document how Zyte API responses are mapped to Scrapy response subclasses.
- Raise the minimum dependency of Zyte API's Python API to ``zyte-api>=0.4.0``. This changes all the requests to Zyte API to have ``Accept-Encoding: br`` and automatically decompress brotli responses.
- Rename "Zyte Data API" to simply "Zyte API" in the README.
- Lower the minimum Scrapy version from ``2.6.0`` to ``2.0.1``.
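The transparent mode setting and the ``zyte_api_automap`` meta key introduced above might be combined as in the sketch below. The meta values are shown as plain dicts so that the snippet runs without Scrapy; in a spider they would be passed as ``meta=...`` to a request. The variable names are illustrative only, and ``browserHtml`` is used as an example Zyte API parameter.

```python
# settings.py: the default, written out explicitly. With transparent mode
# off, only requests that opt in are mapped to Zyte API parameters.
ZYTE_API_TRANSPARENT_MODE = False

# Request.meta values (hypothetical variable names):
# enable automatic request parameter mapping for one request...
enable_meta = {"zyte_api_automap": True}
# ...or enable it and adjust the outcome of the mapping.
browser_meta = {"zyte_api_automap": {"browserHtml": True}}
```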
- Zyte Data API error responses (after retries) are no longer ignored, and instead raise a ``zyte_api.aio.errors.RequestError`` exception, which allows user-side handling of errors and provides better feedback for debugging.
- Allowed retry policies to be specified as import path strings, which is required for the ``ZYTE_API_RETRY_POLICY`` setting, and allows requests with the ``zyte_api_retry_policy`` request.meta key to remain serializable.
- Fixed the naming of stats for some error types.
- Updated the output examples on the README.
- Cleaned up Scrapy stats names: fixed an issue with ``//``, renamed ``scrapy-zyte-api/api_error_types/..`` to ``scrapy-zyte-api/error_types/..``, added ``scrapy-zyte-api/error_types/<empty>`` for cases where the error type is unknown.
- Added error type to the error log messages.
- Testing improvements.
Fixed incorrect 0.4.0 release.
- Requires a more recent Python client library zyte-api ≥ 0.3.0.
- Stats from zyte-api are now copied into Scrapy stats. The ``scrapy-zyte-api/request_count`` stat has been renamed to ``scrapy-zyte-api/processed`` accordingly.
- ``CONCURRENT_REQUESTS`` Scrapy setting is properly supported; in previous releases max concurrency of Zyte API requests was limited to 15.
- The retry policy for Zyte API requests can be overridden, using either ``ZYTE_API_RETRY_POLICY`` setting or ``zyte_api_retry_policy`` request.meta key.
- Proper response.status is set when Zyte API returns ``statusCode`` field.
- URL of the Zyte API server can be set using ``ZYTE_API_URL`` Scrapy setting. This feature is currently used in tests.
- The minimum required Scrapy version (2.6.0) is now enforced in setup.py.
- Test and documentation improvements.
Remove the ``Content-Decoding`` header when returning the responses. This prevents Scrapy from decompressing already decompressed contents done by Zyte Data API. Otherwise, this leads to errors inside Scrapy's ``HttpCompressionMiddleware``.
Introduce ``ZyteAPIResponse`` and ``ZyteAPITextResponse`` which are subclasses of ``scrapy.http.Response`` and ``scrapy.http.TextResponse`` respectively. These new response classes hold the raw Zyte Data API response in the ``raw_api_response`` attribute.
Introduce a new setting named ``ZYTE_API_DEFAULT_PARAMS``.

- At the moment, this only applies to Zyte API enabled ``scrapy.Request`` (which is declared by having the ``zyte_api`` parameter in the Request meta having valid parameters, set to ``True``, or ``{}``).

Specify in the README to set ``dont_filter=True`` when using the same URL but with different ``zyte_api`` parameters in the Request meta. This is a current workaround since Scrapy will tag them as duplicate requests and will result in duplication filtering.
Various documentation improvements.
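One way to picture how ``ZYTE_API_DEFAULT_PARAMS`` applies only to Zyte API enabled requests is the sketch below. It is an illustration under assumptions, not the plugin's implementation: ``effective_params`` is a hypothetical helper, the default value shown is only an example, and the merge order (request values overriding defaults) is assumed.

```python
from typing import Optional

# Hypothetical project-wide defaults; the parameter choice is illustrative.
ZYTE_API_DEFAULT_PARAMS = {"geolocation": "US"}


def effective_params(meta: dict) -> Optional[dict]:
    """Sketch of how defaults might combine with the ``zyte_api`` meta key.

    Returns None when the request does not opt in to Zyte API.
    """
    value = meta.get("zyte_api")
    if value is None or value is False:
        return None  # not a Zyte API request, so defaults do not apply
    # True and {} both mean "use Zyte API with the defaults only".
    overrides = value if isinstance(value, dict) else {}
    return {**ZYTE_API_DEFAULT_PARAMS, **overrides}


print(effective_params({"zyte_api": {"browserHtml": True}}))
```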
- Initial release