Description
In many Scrapy projects, it’s common to define spider parameters that should fall back to a value from the Scrapy settings if not explicitly provided. Currently, doing this cleanly is a bit cumbersome and can lead to misusing settings or ignoring the intended settings priority hierarchy.
Current Workaround
A naive (but flawed) way to assign a setting as the default value looks like this:

```python
from project import settings

class MyParams(BaseModel):
    pages: int = settings.PAGES  # ❌ This bypasses Scrapy's priority system
```

This approach reads the settings object at import time, which ignores Scrapy's settings resolution and priority system, and it can even fail outright if the settings are not yet defined when the module is imported.
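To make the priority concern concrete, here is a minimal, hypothetical model of how a setting is resolved by priority. The priority names mirror Scrapy's `SETTINGS_PRIORITIES`, but this standalone class is only an illustration of the idea, not Scrapy's actual implementation:

```python
# Priority names and numeric levels modeled after Scrapy's SETTINGS_PRIORITIES.
SETTINGS_PRIORITIES = {"default": 0, "command": 10, "project": 20, "spider": 30, "cmdline": 40}

class PrioritySettings:
    """Toy settings store: a value only wins if its priority is high enough."""

    def __init__(self):
        self._values = {}  # name -> (value, numeric priority)

    def set(self, name, value, priority="project"):
        pri = SETTINGS_PRIORITIES[priority]
        current = self._values.get(name)
        # A new value replaces the old one only at equal or higher priority.
        if current is None or pri >= current[1]:
            self._values[name] = (value, pri)

    def getint(self, name, default=0):
        entry = self._values.get(name)
        return int(entry[0]) if entry is not None else default

settings = PrioritySettings()
settings.set("MAX_PAGES_SETTING", 5, priority="default")
settings.set("MAX_PAGES_SETTING", 10, priority="project")
print(settings.getint("MAX_PAGES_SETTING"))  # 10: project overrides default
```

Reading `settings.PAGES` at import time sidesteps this whole resolution step, which is why command-line and per-spider overrides get lost.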
A more correct (but verbose) alternative is to override `from_crawler` in the spider to manually inject the settings value. Note that `self.crawler` is not yet available inside `__init__`, so the lookup has to happen after the spider has been created:

```python
class MySpider(scrapy.Spider):
    custom_settings = {"MAX_PAGES_SETTING": 10}

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        if spider.pages is None:
            spider.pages = crawler.settings.getint("MAX_PAGES_SETTING", 5)
        return spider

    def __init__(self, pages=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.pages = pages
```

While functional, this approach requires writing repetitive boilerplate code in every spider.
Proposal: Add Helper for Default Values from Scrapy Settings
It would be great to introduce a helper or syntactic sugar that allows cleanly declaring parameters with defaults coming from Scrapy settings, for example:
```python
class MyParams(BaseModel):
    pages: int = FROM_SCRAPY_SETTING("MAX_PAGES_SETTING", default=5)
```

This helper could internally resolve the setting during the spider's initialization phase, preserving the proper settings priority and avoiding unnecessary boilerplate.
The result would be behaviorally equivalent to:

```python
class MySpider(scrapy.Spider):
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        if spider.pages is None:
            spider.pages = crawler.settings.getint("MAX_PAGES_SETTING", 5)
        return spider

    def __init__(self, pages=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.pages = pages
```