KeyError with the initialization of an Item Field defined with None using ItemLoader #33

Description

Using Scrapy 1.5.0.
I looked through the FAQ section and found nothing relevant to this, and the same goes for existing issues matching the keyword KeyError on GitHub, Reddit, or Google Groups.

As you can see below, there seems to be an inconsistency between loading an Item through an ItemLoader and initializing it directly when the value is None or an empty string. First we add a value to our field (here title) through an ItemLoader, then the loader creates an item with the load_item() method. Once that is done, we can't access the field if the value was None or an empty string. The inconsistency is that the other approach (initializing an Item directly with the field set to None or an empty string) doesn't raise a KeyError, because the field is set.

The TakeFirst processor doesn't return any value when it is given None or an empty string, which prevents the load_item() method of the ItemLoader class from adding an entry for the field.
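
As a quick illustration (a minimal sketch of TakeFirst's behavior here; exact implementation details may vary slightly across Scrapy versions), calling the processor directly shows that None and empty strings are skipped, so nothing is returned at all:

from scrapy.loader.processors import TakeFirst

take_first = TakeFirst()

# A non-empty value is returned as expected
print(take_first(['fake title']))  # 'fake title'

# None and empty strings are skipped, so the call returns nothing (None)
print(take_first([None]))  # None
print(take_first(['']))    # None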

Here is a minimal example that demonstrates the inconsistency.

import scrapy
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose

class MyItem(scrapy.Item):
    title = scrapy.Field()

class MyItemLoader(ItemLoader):
    default_output_processor = TakeFirst()
    title_in = MapCompose()

class MySpider(scrapy.Spider):
    name = "My"
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        titles = ['fake title', '', None]

        for title in titles:
            # First case: with ItemLoader
            loader = MyItemLoader(item=MyItem())
            loader.add_value('title', title)
            loaded_item = loader.load_item()

            # Second case: without ItemLoader
            item = MyItem(title=title)

            if title in ('', None):
                # Inconsistency: the loader drops the field entirely,
                # while direct initialization keeps it
                assert 'title' not in loaded_item  # loaded_item['title'] raises KeyError
                assert 'title' in item             # item['title'] is set (to '' or None)
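
For completeness, here is what the reported KeyError looks like when the missing field is actually accessed (a short sketch reusing the MyItem and MyItemLoader classes defined above, outside of a spider):

loader = MyItemLoader(item=MyItem())
loader.add_value('title', None)
loaded_item = loader.load_item()

try:
    loaded_item['title']
except KeyError:
    # TakeFirst dropped the None value, so load_item() never set 'title'
    print("KeyError raised for 'title'")

# Direct initialization keeps the field, so no KeyError is raised
print(MyItem(title=None)['title'])  # None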

We're using Python 3.5 for our project and found the following workaround to prevent this error.
We introduce a new class (DefaultAwareItem) that fills in unset fields for which 'default' metadata has been set.

import scrapy

class DefaultAwareItem(scrapy.Item):
    """Item class aware of 'default' metadata of its fields.

    For instance to work, each field, which must have a default value, must
    have a new `default` parameter set in field constructor, e.g.::

        class MyItem(DefaultAwareItem):
            my_defaulted_field = scrapy.Field()
            # Identical to:
            #my_defaulted_field = scrapy.Field(default=None)
            my_other_defaulted_field = scrapy.Field(default='a value')

    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Fill in every field that wasn't set explicitly with its declared
        # 'default' metadata (or None when no default was declared)
        for field_name, field_metadata in self.fields.items():
            self.setdefault(field_name, field_metadata.get('default'))

class MyItem(DefaultAwareItem):
    title = scrapy.Field()
    title_explicitely_set = scrapy.Field(default="empty title")
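
With this workaround in place, every declared field is accessible as soon as the item is created. A short usage sketch (assuming the DefaultAwareItem and MyItem classes above):

item = MyItem()

# Both fields exist even though nothing was assigned explicitly
print(item['title'])                  # None (implicit default)
print(item['title_explicitely_set'])  # 'empty title'

# Defaults don't override values passed at construction time
item = MyItem(title='a real title')
print(item['title'])  # 'a real title'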

Labels: enhancement (New feature or request)