Feature Request: Using Scrapy Settings as Default Values in Spider Parameters #24

@VMRuiz

In many Scrapy projects, it’s common to define spider parameters that should fall back to a value from the Scrapy settings if not explicitly provided. Currently, doing this cleanly is a bit cumbersome and can lead to misusing settings or ignoring the intended settings priority hierarchy.

Current Workaround

A naive (but flawed) way to assign a setting as the default value looks like this:

from pydantic import BaseModel

from project import settings

class MyParams(BaseModel):
    pages: int = settings.PAGES  # ❌ Evaluated once at import time; bypasses Scrapy’s priority system

This approach directly accesses the settings object at import time, which ignores Scrapy’s settings resolution and priority system and can even fail if settings are not defined at import time.

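A more correct (but verbose) alternative is to override from_crawler in the spider and inject the settings value manually. Note that __init__ is too early for this: Scrapy attaches the crawler to the spider only after __init__ returns, so self.crawler.settings is not yet available there.

import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"
    custom_settings = {"MAX_PAGES_SETTING": 10}

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # The base __init__ has already stored an explicit pages argument as spider.pages.
        spider = super().from_crawler(crawler, *args, **kwargs)
        if kwargs.get("pages") is None:
            # Fall back to the setting, resolved with Scrapy's normal priorities.
            spider.pages = crawler.settings.getint("MAX_PAGES_SETTING", 5)
        return spider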

While functional, this approach requires writing repetitive boilerplate code in every spider.

Proposal: Add Helper for Default Values from Scrapy Settings

It would be great to introduce a helper or syntactic sugar that allows cleanly declaring parameters with defaults coming from Scrapy settings, for example:

class MyParams(BaseModel):
    pages: int = FROM_SCRAPY_SETTING("MAX_PAGES_SETTING", default=5)

This helper could internally resolve the setting during the spider’s initialization phase, preserving proper settings priority and avoiding unnecessary boilerplate.
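To make the idea of deferred resolution concrete, here is a minimal standard-library sketch. Everything in it is hypothetical: the FromScrapySetting marker and the resolve_defaults function are illustrative names, not an existing API, and a plain dict stands in for crawler.settings (both expose get(name, default)). It only shows the lookup order: an explicit argument first, then the setting, then the declared fallback.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class FromScrapySetting:
    """Hypothetical marker meaning 'take the default from this Scrapy setting'."""
    name: str
    default: Any = None


def resolve_defaults(declared: Mapping[str, Any],
                     supplied: Mapping[str, Any],
                     settings: Any) -> dict:
    """Fill in missing parameters, reading markers from `settings` lazily."""
    resolved = {}
    for key, default in declared.items():
        if supplied.get(key) is not None:
            # An explicitly passed spider argument always wins.
            resolved[key] = supplied[key]
        elif isinstance(default, FromScrapySetting):
            # Looked up at spider initialization, not at import time.
            resolved[key] = settings.get(default.name, default.default)
        else:
            resolved[key] = default
    return resolved


# A plain dict stands in for crawler.settings in this sketch.
settings = {"MAX_PAGES_SETTING": 10}
declared = {"pages": FromScrapySetting("MAX_PAGES_SETTING", default=5)}

print(resolve_defaults(declared, {}, settings))            # {'pages': 10}
print(resolve_defaults(declared, {"pages": 3}, settings))  # {'pages': 3}
print(resolve_defaults(declared, {}, {}))                  # {'pages': 5}
```

Because the marker stores only the setting name, nothing is read from the settings until resolve_defaults runs, which is what would let a real implementation see the final, priority-resolved values.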

The result would be behaviorally equivalent to:

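class MySpider(scrapy.Spider):
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Settings are reachable only once the crawler exists, i.e. not inside __init__.
        spider = super().from_crawler(crawler, *args, **kwargs)
        if kwargs.get("pages") is None:
            spider.pages = crawler.settings.getint("MAX_PAGES_SETTING", 5)
        return spider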
