Description
If I define an Item as a dataclass and give a field a default value, that default is placed at the start of the list of values passed to the output processor. Shouldn't the default be overridden, or at least placed at the end of the list, when the input processor returns a value that is not None? Is this intended behavior?
If it is intended, anyone who wants a non-None default has to write a new processor, e.g. TakeSecond(), that works like TakeFirst() but returns the second value in the list when the input processor produced one, and falls back to the first value (the default) when the input processor received None.
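A minimal sketch of the workaround described above, assuming that an output processor is simply a callable that receives the list of collected values. The class name `TakeSecondOrFirst` is hypothetical and is not part of itemloaders; the filtering of None and empty strings mirrors what TakeFirst() is documented to do.

```python
class TakeSecondOrFirst:
    """Hypothetical output processor (not part of itemloaders).

    Assumes the dataclass default, if any, sits at index 0 of the
    collected values and that scraped values follow it.
    """

    def __call__(self, values):
        # Ignore None and empty strings, as TakeFirst() does.
        usable = [v for v in values if v is not None and v != ""]
        if len(usable) >= 2:
            # A scraped value exists after the default: prefer it.
            return usable[1]
        if usable:
            # Only one value (the default, under the assumption above).
            return usable[0]
        return None


take = TakeSecondOrFirst()
scraped = take([-999, 4.5])   # default followed by a scraped value
fallback = take([-999])       # only the default is present
```

Note the fragility of this design: it relies on the field having a default. For a field without one, the processor would return the second scraped value rather than the first, which is part of why the ordering in the issue is awkward to work around.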
Steps to Reproduce
import scrapy
from dataclasses import dataclass
from typing import Optional
from itemloaders.processors import TakeFirst, MapCompose, Join, Compose, Identity
from scrapy.loader import ItemLoader

# Item definition
@dataclass
class ArticleItem:
    user_rating: Optional[float] = -999

# Item Loader definition
class RappiLoader(ItemLoader):
    default_output_processor = TakeFirst()
    user_rating_in = MapCompose(lambda x: x.get('score') if isinstance(x, dict) else None)
Expected behavior:
The resulting item should be ArticleItem(user_rating=4.5).
Actual behavior:
Result is ArticleItem(user_rating=-999)
If I inspect the values passed to the output processor, they are [-999, 4.5]. With TakeFirst(), the default wins, so it looks as if the input processor never received a valid value from the spider, which is not the case.
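The observation above can be modeled without Scrapy. The sketch below is a simplified, standard-library-only model, not the itemloaders source: it assumes the loader seeds its collected values from the fields already set on the item (here, the dataclass default) and then appends whatever the input processor returns. The helper `take_first` is a hypothetical stand-in that imitates TakeFirst().

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArticleItem:
    user_rating: Optional[float] = -999


def take_first(values):
    """Stand-in for TakeFirst(): first value that is not None or ''."""
    for v in values:
        if v is not None and v != "":
            return v
    return None


# Assumption: the loader starts from the values already on the item,
# so the dataclass default is collected before any scraped value.
item = ArticleItem()
collected = {
    name: [value]
    for name, value in asdict(item).items()
    if value is not None
}

# The input processor extracts 'score' and the result is appended.
scraped = {"score": 4.5}
collected.setdefault("user_rating", []).append(scraped.get("score"))

result = take_first(collected["user_rating"])
```

Under this model the default always precedes the scraped value, so any processor that picks the first usable value returns the default.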
Reproduces how often:
Every time
Versions
Scrapy : 2.6.2
lxml : 4.8.0.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 22.4.0
Python : 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:52) - [Clang 13.0.1 ]
pyOpenSSL : 22.0.0 (OpenSSL 1.1.1s 1 Nov 2022)
cryptography : 37.0.4
Platform : macOS-13.1-x86_64-i386-64bit