29 commits
408678e
WIP
Gallaecio Aug 4, 2025
8d676b8
Merge remote-tracking branch 'origin/main' into session-delays
AdrianAtZyte Feb 5, 2026
02c7adf
Change session terminology and add a session troubleshooting section
AdrianAtZyte Feb 5, 2026
90cd72c
Further documentation updates
AdrianAtZyte Feb 5, 2026
af5eb24
Complete the implementation
AdrianAtZyte Feb 6, 2026
9c2afc9
Split test_sessions.py into more manageable files
AdrianAtZyte Feb 6, 2026
05a2635
Use ZYTE_API_SESSION_DELAY=0 by default for tests
AdrianAtZyte Feb 6, 2026
a25acb9
Update the documentation of the zyte_api_session_pool request metadat…
AdrianAtZyte Feb 6, 2026
a3eb5e1
Update the ZYTE_API_SESSION_DELAY docs
AdrianAtZyte Feb 6, 2026
16e92f9
Add test_delay_default
AdrianAtZyte Feb 6, 2026
e8e66ab
Extend delay tests
AdrianAtZyte Feb 6, 2026
c2db031
Refactor delay and size tests
AdrianAtZyte Feb 6, 2026
6aeeb50
Improve test coverage
AdrianAtZyte Feb 6, 2026
76b2127
Test new PoolError scenarios
AdrianAtZyte Feb 6, 2026
8395a62
Make ZYTE_API_SESSION_DELAY default to DOWNLOAD_DELAY
AdrianAtZyte Feb 6, 2026
c01f55b
Implement ZYTE_API_SESSION_RANDOMIZE_DELAY
AdrianAtZyte Feb 6, 2026
667d956
Consolidate similar tests
AdrianAtZyte Feb 6, 2026
da2a716
Simplify session stat checking in tests
AdrianAtZyte Feb 6, 2026
77f7d7d
Improve test coverage
AdrianAtZyte Feb 6, 2026
a46b055
Complete coverage
AdrianAtZyte Feb 6, 2026
aa41ef3
Solve typing issues
AdrianAtZyte Feb 6, 2026
a4237d1
Fix test_missing_session_id_on_response
AdrianAtZyte Feb 6, 2026
157a27e
x402<2.0.0
AdrianAtZyte Feb 6, 2026
60ab006
Restore old Scrapy support
AdrianAtZyte Feb 6, 2026
257f35d
test_max: see if a non-zero queue time lowers the likelihood of a rac…
AdrianAtZyte Feb 6, 2026
10d0fdd
test_empty_queue: see if a non-zero queue time lowers the likelihood …
AdrianAtZyte Feb 6, 2026
bebde01
Remove typing_extensions freezing from min-x402
AdrianAtZyte Feb 6, 2026
dfa73b4
Define PoolConfig.randomize_delay
AdrianAtZyte Feb 6, 2026
178360f
Address feedback
AdrianAtZyte Feb 6, 2026
45 changes: 41 additions & 4 deletions CHANGES.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,43 @@
Changes
=======

0.33.0 (unreleased)
-------------------

- Added a minimum delay between reuses of any given :ref:`plugin-managed
session <session>`.

It is :setting:`DOWNLOAD_DELAY` by default. Use
:setting:`ZYTE_API_SESSION_DELAY` to change that or
:setting:`ZYTE_API_SESSION_POOLS` to override it for specific
:ref:`session pools <session-pools>`.

:setting:`ZYTE_API_SESSION_RANDOMIZE_DELAY` controls whether that minimum
delay is randomized by multiplying it by a random factor between 0.5 and
1.5. It defaults to :setting:`RANDOMIZE_DOWNLOAD_DELAY`.

- The value of the :reqmeta:`zyte_api_session_pool` request metadata key and
the return value of the :meth:`SessionConfig.pool()
<scrapy_zyte_api.SessionConfig.pool>` method can now be a dictionary
instead of a string, allowing you to override :setting:`ZYTE_API_SESSION_DELAY`
and :setting:`ZYTE_API_SESSION_POOL_SIZE` for the corresponding pool.

However, they cannot override values defined in
:setting:`ZYTE_API_SESSION_POOLS`.

- Deprecated the ``ZYTE_API_SESSION_POOL_SIZES`` setting in favor of the new
:setting:`ZYTE_API_SESSION_POOLS` setting, where you can set ``"size"``.

- Changed the terminology around :ref:`session management <session>` to try
to make it clearer and more consistent:

| client-managed sessions → user-managed sessions
| server-managed sessions → Zyte-managed sessions
| scrapy-zyte-api session management → plugin-managed sessions

- Added a :ref:`session-troubleshooting` section to the :ref:`session` page.


0.32.0 (2026-01-20)
-------------------

Expand Down Expand Up @@ -92,10 +129,10 @@ Changes
:http:`request:httpResponseHeaders` will no longer be enabled by default,
and :ref:`request header mapping <request-header-mapping>` is disabled.

* Session pool IDs, of server-managed sessions (:http:`request:sessionContext`)
or :ref:`set through the session management API <session-pools>`, now affect
request fingerprinting: 2 requests identical except for their session pool ID
are *not* considered duplicate requests any longer.
* Session pool IDs, of Zyte-managed sessions (:http:`request:sessionContext`)
or :ref:`plugin-managed sessions <session-pools>`, now affect request
fingerprinting: 2 requests identical except for their session pool ID are
*not* considered duplicate requests any longer.

* When it is not clear whether a request will use browser rendering or not,
e.g. an :ref:`automatic extraction request <zapi-extract>` without an
Expand Down
10 changes: 5 additions & 5 deletions docs/reference/meta.rst
Original file line number Diff line number Diff line change
Expand Up @@ -95,8 +95,8 @@ zyte_api_session_enabled

Default: :setting:`ZYTE_API_SESSION_ENABLED`

Whether to use :ref:`scrapy-zyte-api session management <session>` for the
request (``True``) or not (``False``).
Whether to send the request with a :ref:`plugin-managed session <session>`
(``True``) or not (``False``).

.. seealso:: :meth:`scrapy_zyte_api.SessionConfig.enabled`

Expand Down Expand Up @@ -141,7 +141,7 @@ zyte_api_session_pool

Default: ``""``

Determines the ID of the session pool to assign to the request, overriding the
:ref:`default pool assignment logic <session-pools>`.
If not falsy, it determines the default pool ID and options for the request.

.. seealso:: :meth:`scrapy_zyte_api.SessionConfig.pool`
It supports the same values as the return value of
:meth:`scrapy_zyte_api.SessionConfig.pool`.
54 changes: 48 additions & 6 deletions docs/reference/settings.rst
Original file line number Diff line number Diff line change
Expand Up @@ -407,14 +407,35 @@ object, for example to read settings:
ZYTE_API_SESSION_CHECKER = MySessionChecker


.. setting:: ZYTE_API_SESSION_DELAY

ZYTE_API_SESSION_DELAY
======================

Default: :setting:`DOWNLOAD_DELAY`

Minimum number of seconds to wait before reusing a :ref:`plugin-managed
session <session>`.

To override this value for specific pools, set the ``"delay"`` key in the
:class:`dict` defined for the pool in the :setting:`ZYTE_API_SESSION_POOLS`
setting, in the :class:`dict` value of the :reqmeta:`zyte_api_session_pool`
request metadata key, or in the :class:`dict` returned by
:meth:`~scrapy_zyte_api.SessionConfig.pool`.

Increasing this number can reduce the number of ban-related session
expirations, hence increasing the lifetime of each session. See
:ref:`optimize-sessions`.
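
As a minimal sketch, a project could combine a global delay with a per-pool
override as shown below. The pool ID ``"example.com"`` and the numeric values
are illustrative assumptions, not recommendations.

```python
# settings.py (sketch; the pool ID and the values are illustrative assumptions)

# Wait at least 2 seconds before reusing any plugin-managed session.
ZYTE_API_SESSION_DELAY = 2.0

# Use a longer minimum delay for the sessions of one specific pool.
ZYTE_API_SESSION_POOLS = {
    "example.com": {"delay": 5.0},
}
```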

.. seealso:: :setting:`ZYTE_API_SESSION_RANDOMIZE_DELAY`

.. setting:: ZYTE_API_SESSION_ENABLED

ZYTE_API_SESSION_ENABLED
========================

Default: ``False``

Enables :ref:`scrapy-zyte-api session management <session>`.
Enables :ref:`plugin-managed sessions <session>`.


.. setting:: ZYTE_API_SESSION_LOCATION
Expand Down Expand Up @@ -535,22 +556,34 @@ The maximum number of active :ref:`scrapy-zyte-api sessions <session>` to keep
per :ref:`pool <session-pools>`.

To override this value for specific pools, use
:setting:`ZYTE_API_SESSION_POOL_SIZES`.
:setting:`ZYTE_API_SESSION_POOLS` or return a dictionary from
:meth:`~scrapy_zyte_api.SessionConfig.pool` containing a ``"size"`` key.

Increase this number to lower the frequency with which requests are sent
through each session, which on some websites may increase the lifetime of each
session. See :ref:`optimize-sessions`.


.. setting:: ZYTE_API_SESSION_POOL_SIZES
.. setting:: ZYTE_API_SESSION_POOLS

ZYTE_API_SESSION_POOL_SIZES
===========================
ZYTE_API_SESSION_POOLS
======================

Default: ``{}``

:class:`dict` where keys are :ref:`pool <session-pools>` IDs and values are
overrides of :setting:`ZYTE_API_SESSION_POOL_SIZE` for those pools.
dicts with any combination of the following keys that override the
corresponding setting for that pool:

- ``"delay"`` overrides :setting:`ZYTE_API_SESSION_DELAY`.

- ``"randomize_delay"`` overrides
:setting:`ZYTE_API_SESSION_RANDOMIZE_DELAY`.

- ``"size"`` overrides :setting:`ZYTE_API_SESSION_POOL_SIZE`.

These overrides take precedence over values set through the
:reqmeta:`zyte_api_session_pool` request metadata key or returned by
:meth:`SessionConfig.pool() <scrapy_zyte_api.SessionConfig.pool>`.
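
The precedence described above can be sketched as follows. The
``effective_option`` helper is hypothetical and is not part of
scrapy-zyte-api. It only illustrates the documented order, and the pool ID
and values are made up for the example.

```python
# Hypothetical helper illustrating the documented precedence; it is NOT part
# of scrapy-zyte-api.
def effective_option(key, pool_id, pools_setting, pool_options, default):
    """Return the effective value of *key* for the pool *pool_id*."""
    # 1. ZYTE_API_SESSION_POOLS wins when it defines the key for the pool.
    setting_options = pools_setting.get(pool_id, {})
    if key in setting_options:
        return setting_options[key]
    # 2. Otherwise, use the options from the dict returned by
    #    SessionConfig.pool() or set in the request metadata.
    if key in pool_options:
        return pool_options[key]
    # 3. Otherwise, fall back to the global setting.
    return default


ZYTE_API_SESSION_POOLS = {
    "example.com": {"delay": 5.0},
}
pool_options = {"delay": 1.0, "size": 4}  # illustrative per-request options

# "delay" is defined in the setting, so the per-request value is ignored.
delay = effective_option("delay", "example.com", ZYTE_API_SESSION_POOLS, pool_options, 0.0)
# "size" is not defined in the setting, so the per-request value applies.
size = effective_option("size", "example.com", ZYTE_API_SESSION_POOLS, pool_options, 8)
print(delay, size)
```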


.. setting:: ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS
Expand Down Expand Up @@ -586,6 +619,15 @@ queue.

See :setting:`ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS` for details.

.. setting:: ZYTE_API_SESSION_RANDOMIZE_DELAY

ZYTE_API_SESSION_RANDOMIZE_DELAY
================================

Default: :setting:`RANDOMIZE_DOWNLOAD_DELAY`

If enabled, :setting:`ZYTE_API_SESSION_DELAY` is randomized each time it is
used by multiplying it by a random factor between 0.5 and 1.5.
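
The effect of this setting can be sketched as follows. The
``randomized_delay`` function is a hypothetical illustration, not the plugin
code, and it assumes that the random factor is drawn uniformly.

```python
import random


def randomized_delay(delay, randomize=True):
    """Return the delay to apply before reusing a session (sketch)."""
    if not randomize:
        return delay
    # Assumption: the factor is drawn uniformly between 0.5 and 1.5.
    return delay * random.uniform(0.5, 1.5)


# With a 2-second delay, every randomized value falls between 1 and 3 seconds.
samples = [randomized_delay(2.0) for _ in range(5)]
print(samples)
```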

.. setting:: ZYTE_API_SKIP_HEADERS

Expand Down
109 changes: 66 additions & 43 deletions docs/usage/session.rst
Original file line number Diff line number Diff line change
@@ -1,27 +1,27 @@
.. _session:

==================
Session management
==================
=======================
Plugin-managed sessions
=======================

Zyte API provides powerful session APIs:

- :ref:`Client-managed sessions <zapi-session-id>` give you full control
over session management.
- :ref:`User-managed sessions <zapi-session-id>` give you full control over
session management.

- :ref:`Server-managed sessions <zapi-session-contexts>` let Zyte API
handle session management for you.
- :ref:`Zyte-managed sessions <zapi-session-contexts>` let Zyte API handle
session management for you.

When using scrapy-zyte-api, you can use these session APIs through the
corresponding Zyte API fields (:http:`request:session`,
:http:`request:sessionContext`).

However, scrapy-zyte-api also provides its own session management API, similar
to that of :ref:`server-managed sessions <zapi-session-contexts>`, but
built on top of :ref:`client-managed sessions <zapi-session-id>`.
However, scrapy-zyte-api also provides plugin-managed sessions, with an API
similar to that of Zyte-managed sessions, but built on top of user-managed
sessions.

scrapy-zyte-api session management offers some advantages over
:ref:`server-managed sessions <zapi-session-contexts>`:
Plugin-managed sessions offer some advantages over :ref:`Zyte-managed sessions
<zapi-session-contexts>`:

- You can perform :ref:`session validity checks <session-check>`, so that the
sessions of responses that do not pass those checks are refreshed, and the
Expand All @@ -34,24 +34,24 @@ scrapy-zyte-api session management offers some advantages over
- You have granular control over the session pool size, max errors, etc. See
:ref:`optimize-sessions` and :ref:`session-configs`.

However, scrapy-zyte-api session management is not a replacement for
:ref:`server-managed sessions <zapi-session-contexts>` or
:ref:`client-managed sessions <zapi-session-id>`:
However, plugin-managed sessions are not a replacement for :ref:`Zyte-managed
sessions <zapi-session-contexts>` or :ref:`user-managed sessions
<zapi-session-id>`:

- :ref:`Server-managed sessions <zapi-session-contexts>` offer a longer
life time than the :ref:`client-managed sessions <zapi-session-id>`
that scrapy-zyte-api session management uses, so as long as you do not need
one of the scrapy-zyte-api session management features, server-managed
sessions can be significantly more efficient (fewer total sessions needed
- :ref:`Zyte-managed sessions <zapi-session-contexts>` offer a longer
lifetime than the :ref:`user-managed sessions <zapi-session-id>` that
plugin-managed sessions use, so as long as you do not need one of the
features of plugin-managed sessions, Zyte-managed sessions can be
significantly more efficient (fewer session-initialization requests needed
per crawl).

Zyte API can also optimize server-managed sessions based on the target
website. With scrapy-zyte-api session management, you need to :ref:`handle
Zyte API can also optimize Zyte-managed sessions based on the target
website. With plugin-managed sessions, you need to :ref:`handle
optimization yourself <optimize-sessions>`.

- :ref:`Client-managed sessions <zapi-session-id>` offer full control
over session management, while scrapy-zyte-api session management removes
some of that control to provide an easier API for supported use cases.
- :ref:`User-managed sessions <zapi-session-id>` offer full control over
session management, while plugin-managed sessions remove some of that
control to provide an easier API for supported use cases.

.. _enable-sessions:

Expand Down Expand Up @@ -134,7 +134,7 @@ To change the :ref:`default session initialization parameters
:reqmeta:`zyte_api_session_params` request metadata key.

It works similarly to :http:`request:sessionContextParams` from
:ref:`server-managed sessions <zapi-session-contexts>`, but it supports
:ref:`Zyte-managed sessions <zapi-session-contexts>`, but it supports
arbitrary Zyte API parameters instead of a specific subset.

If it does not define a ``"url"``, the URL of the request :ref:`triggering
Expand Down Expand Up @@ -247,7 +247,7 @@ overrides <session-configs>`.

The :setting:`ZYTE_API_SESSION_POOL_SIZE` setting determines the desired number
of concurrent, active, working sessions per pool. The
:setting:`ZYTE_API_SESSION_POOL_SIZES` setting allows defining different values
:setting:`ZYTE_API_SESSION_POOLS` setting allows defining different values
for specific pools.

.. _pool-size:
Expand All @@ -274,7 +274,6 @@ The session pool assigned to a request affects the :ref:`fingerprint
considered different requests, i.e. not duplicate requests, even if they are
otherwise identical.


.. _optimize-sessions:

Optimizing sessions
Expand All @@ -290,17 +289,17 @@ Here are some things you can try:

- On some websites, sending too many requests too fast through a session can
cause the target website to ban that session.

On those websites, you can increase the number of sessions in the pool
(:setting:`ZYTE_API_SESSION_POOL_SIZE`). The more different sessions you
use, the more slowly you send requests through each session.

Mind, however, that :ref:`client-managed sessions <zapi-session-id>`
expire after `15 minutes since creation or 2 minutes since the last request
<https://docs.zyte.com/zyte-api/usage/reference.html#operation/extract/request/session>`_.
At a certain point, increasing :setting:`ZYTE_API_SESSION_POOL_SIZE`
without increasing :setting:`CONCURRENT_REQUESTS
<scrapy:CONCURRENT_REQUESTS>` and :setting:`CONCURRENT_REQUESTS_PER_DOMAIN
On those websites, you can increase :setting:`ZYTE_API_SESSION_DELAY`,
:setting:`ZYTE_API_SESSION_POOL_SIZE`, or both, to lower the rate of
session reuse.

Mind, however, that :ref:`user-managed sessions <zapi-session-id>` expire
15 minutes after creation or 2 minutes after their last request (see
:http:`request:session`). At a certain point, increasing
:setting:`ZYTE_API_SESSION_POOL_SIZE` without increasing
:setting:`CONCURRENT_REQUESTS <scrapy:CONCURRENT_REQUESTS>` and
:setting:`CONCURRENT_REQUESTS_PER_DOMAIN
<scrapy:CONCURRENT_REQUESTS_PER_DOMAIN>` accordingly can be
counterproductive.

Expand All @@ -317,10 +316,9 @@ Here are some things you can try:

If you do not need :ref:`session checking <session-check>` and your
:ref:`initialization parameters <session-init>` are only
:http:`request:browserHtml` and :http:`request:actions`, :ref:`server-managed
:http:`request:browserHtml` and :http:`request:actions`, :ref:`Zyte-managed
sessions <zapi-session-contexts>` might be a more cost-effective choice, as
they live much longer than :ref:`client-managed sessions
<zapi-session-id>`.
they live much longer than :ref:`user-managed sessions <zapi-session-id>`.
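
The trade-off between pool size and session expiration described above can be
estimated with some rough arithmetic. The sketch below assumes that requests
are spread evenly across the sessions of a pool and that throughput is
constant. Both are simplifications, and ``seconds_between_reuses`` is a
hypothetical helper.

```python
# User-managed sessions expire 2 minutes after their last request.
INACTIVITY_TIMEOUT = 120  # seconds


def seconds_between_reuses(pool_size, requests_per_second):
    """Estimate how often each session is reused, assuming even rotation."""
    return pool_size / requests_per_second


# At 4 requests per second, compare a few pool sizes against the timeout.
for pool_size in (8, 64, 512):
    interval = seconds_between_reuses(pool_size, requests_per_second=4)
    at_risk = interval >= INACTIVITY_TIMEOUT
    print(f"pool size {pool_size}: reused every {interval} s, at risk: {at_risk}")
```

When the estimated interval approaches the inactivity timeout, sessions can
expire before they are reused, so a larger pool only helps if the request
rate grows with it.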


.. _session-configs:
Expand Down Expand Up @@ -445,7 +443,7 @@ implementation may also close your spider with a custom reason by raising a
Session stats
=============

The following stats exist for scrapy-zyte-api session management:
The following stats exist for plugin-managed sessions:

``scrapy-zyte-api/sessions/pools/{pool}/init/check-error``
Number of times that a session for pool ``{pool}`` triggered an unexpected
Expand Down Expand Up @@ -501,3 +499,28 @@ The following stats exist for scrapy-zyte-api session management:

``scrapy-zyte-api/sessions/use/disabled``
Number of processed requests for which session management was disabled.

.. _session-troubleshooting:

Troubleshooting
===============

.. _session-troubleshooting-could-not-get-session-id:

RuntimeError: Could not get a session ID
----------------------------------------

This exception means that, after a given number of attempts with a given
minimum wait time between them, it was not possible to get a session ID from
the session rotation queue. If you see it, consider the following
possibilities:

- A bug in your session validation code may be causing it to return ``False``
for a valid response.

This is especially likely if you see this issue for very few, specific
requests, while most requests work fine.

- The values of the :setting:`ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS` and
:setting:`ZYTE_API_SESSION_QUEUE_WAIT_TIME` settings may be too low for
your scenario, in which case you can modify them accordingly.
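
As a minimal sketch, both settings could be raised as shown below. The values
are illustrative assumptions, and their product is only a rough estimate of
the minimum time a request waits for a session ID before the exception is
raised.

```python
# settings.py (sketch; the values are illustrative assumptions)
ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS = 120
ZYTE_API_SESSION_QUEUE_WAIT_TIME = 2.0

# Rough estimate, in seconds, of the minimum total wait before giving up.
min_total_wait = ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS * ZYTE_API_SESSION_QUEUE_WAIT_TIME
print(min_total_wait)
```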
6 changes: 4 additions & 2 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -22,15 +22,16 @@ classifiers = [
"Programming Language :: Python :: 3.13",
]
requires-python = ">=3.10"
# Sync with [pinned] @ tox.ini
# Sync with [min] @ tox.ini
dependencies = [
"packaging>=20.0",
"scrapy>=2.0.1",
"typing_extensions>=4.1.0",
"zyte-api>=0.6.0",
]

[project.optional-dependencies]
# Sync with [testenv:pinned-provider] @ tox.ini
# Sync with [testenv:min-provider] @ tox.ini
provider = [
"andi>=0.6.0",
"scrapy-poet>=0.22.3",
Expand All @@ -39,6 +40,7 @@ provider = [
]
x402 = [
"zyte-api[x402]>=0.8.0",
"x402<2.0.0",
]

[project.urls]
Expand Down