This repository was archived by the owner on Mar 1, 2024. It is now read-only.

[Feature Request]: Pass args and kwargs when calling partition or partition_via_api on Unstructured loader #949

@IgnacioPascale

Feature Description

Pass args and kwargs on Unstructured base for load_data() and pass them when calling partition() or partition_via_api().

This would add flexibility to manipulate the (far too many) kwargs from the partition library.
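To make the request concrete, here is a minimal sketch of the forwarding I have in mind. It is not the loader's actual code: `SketchUnstructuredReader` and `fake_partition` are hypothetical names, and `fake_partition` is only a stand-in for unstructured's `partition()` so the snippet runs without the library installed.

```python
from pathlib import Path
from typing import Any, Callable, List


def fake_partition(filename: str, *args: Any, **kwargs: Any) -> List[str]:
    """Stand-in for partition(); it only echoes the keyword arguments it received."""
    return [f"{filename} partitioned with {sorted(kwargs.items())}"]


class SketchUnstructuredReader:
    """Hypothetical reader showing the proposed pass-through, not the real loader."""

    def __init__(self, partition_fn: Callable[..., List[Any]] = fake_partition) -> None:
        self._partition = partition_fn

    def load_data(self, file: Path, *args: Any, **kwargs: Any) -> List[str]:
        # Forward whatever the caller supplied straight to the partition function.
        elements = self._partition(str(file), *args, **kwargs)
        return [str(element) for element in elements]


reader = SketchUnstructuredReader()
docs = reader.load_data(Path("report.pdf"), strategy="hi_res", infer_table_structure=True)
print(docs[0])
```

The same pass-through would presumably apply to the `partition_via_api()` branch; the point is only that `load_data()` stops swallowing the caller's options.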

Reason

Over the last week, I tried to take advantage of the many features partition offers through this loader. To give a few examples:

  • For .docx, I intended to use include_page_breaks, which defaults to True in their docx.py but to False in their "auto" partition method, which is the one the loader calls.

  • For .pdf, I intended to use cool features such as infer_table_structure or strategy (to set hi_res). Similarly, I intended to use the former kwarg for .pptx as well.
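As a rough illustration of the options listed above, they could be grouped per file type as below. The mapping and the `kwargs_for` helper are hypothetical; only the kwarg names (`include_page_breaks`, `infer_table_structure`, `strategy`) come from the examples, and whether each one is honored for a given file type depends on the installed unstructured version.

```python
from pathlib import Path
from typing import Any, Dict

# Hypothetical per-extension options, taken from the examples above.
PARTITION_KWARGS: Dict[str, Dict[str, Any]] = {
    ".docx": {"include_page_breaks": True},
    ".pdf": {"strategy": "hi_res", "infer_table_structure": True},
    ".pptx": {"infer_table_structure": True},
}


def kwargs_for(file: Path) -> Dict[str, Any]:
    """Return a copy of the options for this file's extension, or {} if none are configured."""
    return dict(PARTITION_KWARGS.get(file.suffix.lower(), {}))


print(kwargs_for(Path("slides.pptx")))
```

With kwargs pass-through on the loader, the result of `kwargs_for(file)` could be unpacked directly into `load_data()`.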

The fact that I cannot manipulate the kwargs passed onto partition prevents me from manipulating data extraction the way I intend, and it's forcing me to subclass and override behavior for a very simple change.

Value of Feature

As explained above, users would be able to take advantage of the many great functionalities unstructured offers, namely infer_table_structure, strategy, include_page_breaks, etc., simply by passing args and kwargs through to partition() or partition_via_api().
