Impact
Since version 1.4.0, Scrapy respects the Referrer-Policy response header to decide whether and how to set a Referer header on follow-up requests.
If the header value looked like a valid Python import path, Scrapy would import the referenced object and call it, assuming it referred to a referrer policy class (for example, scrapy.spidermiddlewares.referer.DefaultReferrerPolicy) and attempting to instantiate it to handle the Referer header.
A malicious site could exploit this by setting Referrer-Policy to a path such as sys.exit, causing Scrapy to import and execute it and potentially terminate the process.
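The pattern described above can be illustrated without Scrapy. The following is a minimal sketch, not Scrapy's actual code: `load_dotted_path` and `unsafe_policy_from_header` are hypothetical names, and a harmless standard-library class stands in for the attacker-chosen path.

```python
from importlib import import_module


def load_dotted_path(path: str):
    """Resolve a dotted path such as 'package.module.name' to the object it names."""
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


def unsafe_policy_from_header(header_value: str):
    # Hypothetical stand-in for the vulnerable pattern: whatever the remote
    # server names is imported and called, with no check that it is a
    # referrer policy class.
    return load_dotted_path(header_value)()


# Harmless demonstration: the "policy" ends up being an unrelated object.
result = unsafe_policy_from_header("collections.OrderedDict")
print(type(result).__name__)  # OrderedDict
```

Passing `"sys.exit"` to the same function would import `sys.exit` and call it, raising `SystemExit`, which is the behavior the advisory warns about.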
Patches
Upgrade to Scrapy 2.14.2 (or later).
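To confirm that an environment already runs a patched release, a check along these lines may help. It is a rough sketch: the helper names are made up for illustration, and the comparison only looks at the leading numeric components of the version string, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

PATCHED = (2, 14, 2)  # first fixed release named in this advisory


def release_tuple(version_string: str) -> tuple:
    """Return the leading numeric components of a version string as integers."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_patched(version_string: str) -> bool:
    """True if the version is 2.14.2 or later (suffixes such as 'rc1' are ignored)."""
    return release_tuple(version_string) >= PATCHED


def installed_scrapy_is_patched():
    """Check the installed Scrapy; returns None if Scrapy is not installed."""
    try:
        return is_patched(version("scrapy"))
    except PackageNotFoundError:
        return None
```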
Workarounds
If you cannot upgrade to Scrapy 2.14.2, consider the following mitigations.
- Disable the middleware: If you don't need the Referer header on follow-up requests, set REFERER_ENABLED to False.
- Set headers manually: If you do need a Referer, disable the middleware and set the header explicitly on the requests that require it.
- Set referrer_policy in request metadata: If disabling the middleware is not viable, set the referrer_policy request meta key on all requests to prevent evaluating preceding responses' Referrer-Policy. For example:
    Request(
        url,
        meta={
            "referrer_policy": "scrapy.spidermiddlewares.referer.DefaultReferrerPolicy",
        },
    )
Instead of editing requests individually, you can:
- implement a custom spider middleware that runs before the built-in referrer policy middleware and sets the referrer_policy meta key; or
- set the meta key in start requests and use the scrapy-sticky-meta-params plugin to propagate it to follow-up requests.
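The custom spider middleware option could look roughly like the sketch below. The class name, the `myproject.middlewares` module path, and the order value 750 are assumptions. The order is chosen on the understanding that `process_spider_output` runs in decreasing order and that the built-in referrer middleware sits at 700 by default, so verify both against your Scrapy version. To keep the sketch runnable without Scrapy installed, requests are detected by their `meta` attribute; in a real project an `isinstance` check against `scrapy.Request` would be more precise.

```python
PINNED_POLICY = "scrapy.spidermiddlewares.referer.DefaultReferrerPolicy"


class PinReferrerPolicyMiddleware:
    """Hypothetical spider middleware that pins referrer_policy on outgoing requests."""

    def process_spider_output(self, response, result, spider):
        for obj in result:
            meta = getattr(obj, "meta", None)
            # Duck-typed check: requests carry a ``meta`` dict, items normally do not.
            if isinstance(meta, dict):
                # Keep any policy the spider set explicitly on this request.
                meta.setdefault("referrer_policy", PINNED_POLICY)
            yield obj


# settings.py (module path and order value are assumptions)
SPIDER_MIDDLEWARES = {
    "myproject.middlewares.PinReferrerPolicyMiddleware": 750,
}
```

This only covers requests yielded from callbacks. Start requests have to be handled separately, and spiders with asynchronous callbacks may also need an asynchronous counterpart of this method.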
If you want to continue respecting legitimate Referrer-Policy headers while protecting against malicious ones, disable the built-in referrer policy middleware by setting it to None in SPIDER_MIDDLEWARES and replace it with the fixed implementation from Scrapy 2.14.2.
If the Scrapy 2.14.2 implementation is incompatible with your project (for example, because your Scrapy version is older), copy the corresponding middleware from your Scrapy version, apply the same patch, and use that as a replacement.
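Swapping the middleware in project settings might look like the fragment below. `myproject.middlewares.PatchedRefererMiddleware` is a hypothetical location for the copied and patched class, and 700 is assumed to match the built-in middleware's default position, which is worth checking against `SPIDER_MIDDLEWARES_BASE` in your Scrapy version.

```python
# settings.py
SPIDER_MIDDLEWARES = {
    # Disable the built-in referrer policy middleware.
    "scrapy.spidermiddlewares.referer.RefererMiddleware": None,
    # Hypothetical path to the copied and patched replacement.
    "myproject.middlewares.PatchedRefererMiddleware": 700,
}
```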
References