Description
Using Scrapy 1.5.0
I looked through the FAQ section and found nothing relevant.
The same goes for issues matching the keyword KeyError on GitHub, Reddit, and Google Groups.
As shown below, there seems to be an inconsistency between loading an Item through an ItemLoader and initializing it directly when a value is None or an empty string. First, we add a value to a field (here title) through an ItemLoader. Then the loader creates the item with the load_item() method. After that, the field cannot be accessed if the value was None or an empty string. The inconsistency is that the other approach (initializing an Item directly with a field set to None or an empty string) does not raise a KeyError, because the field is set.
The TakeFirst processor returns no value when its inputs are only None or empty strings, which prevents the load_item() method of the ItemLoader class from adding an entry for the field.
Here is a minimal example that reproduces the inconsistency.
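To make the mechanism concrete, here is a minimal sketch in plain Python of the behavior described above. It is not Scrapy's actual code: `take_first` and `load_item_sketch` are hypothetical stand-ins, and a plain dict plays the role of the item. It only assumes that the output processor skips None and empty strings, and that the loader leaves a field out when the processor yields nothing.

```python
def take_first(values):
    """Return the first value that is neither None nor '' (otherwise None)."""
    for value in values:
        if value is not None and value != '':
            return value
    return None


def load_item_sketch(collected):
    """Build a dict from {field: [values]}, skipping fields with no output."""
    item = {}
    for field_name, values in collected.items():
        output = take_first(values)
        # A field whose processor output is None is never written to the item.
        if output is not None:
            item[field_name] = output
    return item


print(load_item_sketch({'title': ['fake title']}))  # {'title': 'fake title'}
print(load_item_sketch({'title': ['']}))            # {}
print(load_item_sketch({'title': [None]}))          # {}
```

In the last two cases the key is missing entirely, so a later `item['title']` lookup raises a KeyError.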
import scrapy
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose


class MyItem(scrapy.Item):
    title = scrapy.Field()


class MyItemLoader(ItemLoader):
    default_output_processor = TakeFirst()
    title_in = MapCompose()


class MySpider(scrapy.Spider):
    name = "My"
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        titles = ['fake title', '', None]
        for title in titles:
            # First case: with ItemLoader
            loader = MyItemLoader(item=MyItem())
            loader.add_value('title', title)
            loaded_item = loader.load_item()

            # Second case: without ItemLoader
            item = MyItem(title=title)

            if title in ('', None):
                # inconsistency!
                assert 'title' not in loaded_item
                assert 'title' in item

We're using Python 3.5 for our project and found the following workaround to prevent this error.
We introduce a new class (DefaultAwareItem) which fills every unset field with the value of its `default` metadata, or with None when no default has been set.
import scrapy


class DefaultAwareItem(scrapy.Item):
    """Item class aware of the 'default' metadata of its fields.

    For this to work, each field that needs a default value must have a
    `default` parameter set in its field constructor, e.g.::

        class MyItem(DefaultAwareItem):
            my_defaulted_field = scrapy.Field()
            # Identical to:
            # my_defaulted_field = scrapy.Field(default=None)
            my_other_defaulted_field = scrapy.Field(default='a value')
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        for field_name, field_metadata in self.fields.items():
            # setdefault() only writes fields that are still unset.
            self.setdefault(field_name, field_metadata.get('default'))


class MyItem(DefaultAwareItem):
    title = scrapy.Field()
    title_explicitely_set = scrapy.Field(default="empty title")
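
To show what the workaround gives you, here is a small sketch that runs without Scrapy. `DefaultAwareDict` is a hypothetical plain-dict stand-in for `DefaultAwareItem`, and its `fields` mapping is a hand-written assumption that mimics the field metadata of `MyItem` above. Only the `setdefault` logic is the same as in the workaround.

```python
class DefaultAwareDict(dict):
    """Plain-dict stand-in for DefaultAwareItem (hypothetical, no Scrapy)."""

    # Maps field name -> metadata, playing the role of Item.fields.
    fields = {
        'title': {},
        'title_explicitely_set': {'default': 'empty title'},
    }

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        for field_name, field_metadata in self.fields.items():
            # setdefault() only writes when the key is missing, so values
            # passed explicitly (even '' or None) are left untouched.
            self.setdefault(field_name, field_metadata.get('default'))


unset = DefaultAwareDict()
print(unset['title'])                  # None
print(unset['title_explicitely_set'])  # empty title

explicit = DefaultAwareDict(title='', title_explicitely_set='real title')
print(repr(explicit['title']))            # ''
print(explicit['title_explicitely_set'])  # real title
```

Every declared field is now present after construction, so reading `item['title']` no longer raises a KeyError, whether or not a value was ever set.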