feat: ability to skip non-plain-text element types in chunk_by_title() #1695

Open
@cragwolfe

Description

Is your feature request related to a problem? Please describe.

chunk_by_title is a great way to combine related text elements. However, the caller may not want every element type, e.g. Table and Figure, combined with other element types when forming CompositeElements.

Describe the solution you'd like

Add skip_element_types=['Table', 'Figure', <... and any other "non-plain text" elements>] to chunk_by_title, and also make this parameter accessible from partition_ functions and unstructured-ingest.
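To make the request concrete, here is a rough sketch of the behavior I have in mind. It is not the library's implementation: `Element`, `simple_chunk_by_title`, and `chunk_skipping_types` are hypothetical stand-ins written only for this illustration, and `skip_element_types` is the proposed parameter, not an existing one. The sketch also assumes that a skipped element ends the current run of text, so text before and after a Table is not merged across it; that choice is open for discussion.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class Element:
    """Hypothetical stand-in for a document element (not the library's class)."""
    category: str  # e.g. "Title", "NarrativeText", "Table"
    text: str


def simple_chunk_by_title(elements: Sequence[Element]) -> List[Element]:
    """Toy chunker: start a new composite at each Title element."""
    groups: List[List[Element]] = []
    for el in elements:
        if el.category == "Title" or not groups:
            groups.append([])
        groups[-1].append(el)
    return [
        Element("CompositeElement", "\n\n".join(e.text for e in group))
        for group in groups
    ]


def chunk_skipping_types(
    elements: Iterable[Element],
    skip_element_types: Iterable[str],
    chunker: Callable[[Sequence[Element]], List[Element]] = simple_chunk_by_title,
) -> List[Element]:
    """Chunk runs of ordinary elements; pass skipped types through unchanged."""
    skip = set(skip_element_types)
    output: List[Element] = []
    run: List[Element] = []
    for el in elements:
        if el.category in skip:
            # Assumption: a skipped element closes the current run.
            if run:
                output.extend(chunker(run))
                run = []
            output.append(el)  # emitted as-is, in document order
        else:
            run.append(el)
    if run:
        output.extend(chunker(run))
    return output


elements = [
    Element("Title", "Intro"),
    Element("NarrativeText", "Some text."),
    Element("Table", "a | b"),
    Element("NarrativeText", "More text."),
]
result = chunk_skipping_types(elements, skip_element_types=["Table", "Figure"])
```

With this approach the Table comes out as its own element between two CompositeElements instead of being folded into either one.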

Labels: chunking (related to element chunking), enhancement (new feature or request)
