Extend ItemLoader processors #31

@Matthijsy

Description

Currently there are three ways to add an ItemLoader processor:

  • The default_input_processor/default_output_processor attributes on the ItemLoader class
  • The <field_name>_in/<field_name>_out attributes on the ItemLoader class
  • The input_processor/output_processor keys on the scrapy.Field

Personally, I use input_processor/output_processor on the scrapy.Field together with default_input_processor/default_output_processor a lot. Often I just want to add one more processor after the default processors. Because input_processor/output_processor on the scrapy.Field overrides the defaults, this is quite hard to do.
So I propose adding another way to specify input/output processors: something like add_input/add_output on the scrapy.Field, which would append the specified processor to the default processor.
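To make the difference concrete, here is a minimal, dependency-free sketch of overriding versus extending. It does not use Scrapy itself: `compose`, `default_input`, and `to_upper` are hypothetical stand-ins for a Compose-style processor, a loader default, and a per-field extra step.

```python
from functools import reduce


def compose(*funcs):
    """Chain functions left to right, similar in spirit to a Compose processor."""
    def run(value):
        return reduce(lambda acc, func: func(acc), funcs, value)
    return run


def default_input(values):
    # Hypothetical loader default: strip whitespace from every value.
    return [v.strip() for v in values]


def to_upper(values):
    # Hypothetical extra step that only one field needs.
    return [v.upper() for v in values]


raw = ["  foo ", " bar"]

# Today: a processor set on the Field replaces the default, so stripping is lost.
overridden = to_upper(raw)  # ['  FOO ', ' BAR']

# Proposed: the extra processor runs after the default instead of replacing it.
extended = compose(default_input, to_upper)(raw)  # ['FOO', 'BAR']
```

With the current behaviour, getting the second result means repeating the default processor in every Field definition that needs an extra step.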

I implemented this in my own ItemLoader class, but I think it would be useful in the Scrapy core. My implementation is as follows (original source: https://github.com/scrapy/scrapy/blob/master/scrapy/loader/__init__.py#L69). Of course, the same change can be made to get_output_processor.

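from itemloaders.processors import Compose  # scrapy.loader.processors in older Scrapy releases

def get_input_processor(self, field_name):
    # A <field_name>_in attribute on the loader takes precedence over everything else.
    proc = getattr(self, '%s_in' % field_name, None)
    if not proc:
        override_proc = self._get_item_field_attr(field_name, 'input_processor')
        extend_proc = self._get_item_field_attr(field_name, 'add_input')
        # The two Field options are mutually exclusive.
        if override_proc and extend_proc:
            raise ValueError(f'Not allowed to define both input_processor and add_input on {field_name}')
        if override_proc:
            return override_proc
        elif extend_proc:
            # Run the default processor first, then the field-specific addition.
            return Compose(self.default_input_processor, extend_proc)
        return self.default_input_processor
    return proc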
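
The output side could look much the same. The sketch below is a standalone model and not Scrapy code: `SketchLoader` is a hypothetical stand-in for an ItemLoader that only mimics the processor lookup, and `add_output` is the proposed (not existing) Field key. It chains the two processors with a plain lambda so that it runs without Scrapy installed.

```python
class SketchLoader:
    """Hypothetical stand-in for an ItemLoader; it only models processor lookup."""

    # Hypothetical default: drop empty values.
    default_output_processor = staticmethod(lambda values: [v for v in values if v])

    def __init__(self, field_meta):
        # field_meta maps a field name to its Field-style metadata dict.
        self.field_meta = field_meta

    def _get_item_field_attr(self, field_name, key, default=None):
        return self.field_meta.get(field_name, {}).get(key, default)

    def get_output_processor(self, field_name):
        # A <field_name>_out attribute on the loader still wins.
        proc = getattr(self, f"{field_name}_out", None)
        if proc:
            return proc
        override_proc = self._get_item_field_attr(field_name, "output_processor")
        extend_proc = self._get_item_field_attr(field_name, "add_output")
        if override_proc and extend_proc:
            raise ValueError(
                f"Not allowed to define both output_processor and add_output on {field_name}"
            )
        if override_proc:
            return override_proc
        if extend_proc:
            default = self.default_output_processor
            # Default first, then the field-specific addition.
            return lambda values: extend_proc(default(values))
        return self.default_output_processor


loader = SketchLoader({"name": {"add_output": lambda values: " ".join(values)}})
joined = loader.get_output_processor("name")(["a", "", "b"])    # 'a b'
plain = loader.get_output_processor("other")(["a", "", "b"])    # ['a', 'b']
```

Raising on the conflicting combination mirrors the input-side implementation above; silently preferring one of the two keys would be another option.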

I am not sure add_input is a good name; extend_input_processor is probably clearer, but it is quite long. I would like to hear whether more people want this feature and what you think the naming should be.

Metadata

Labels: enhancement (New feature or request)
