feat/Markdown Extraction #3525

Open
@pat-ben

Description

JSON Extraction
I want to feed the output of Unstructured's extraction directly to an LLM without losing the metadata produced by partitioning, but JSON is not a recommended input format for LLMs.

MARKDOWN Extraction
Make it possible to choose the extraction format, either JSON or MARKDOWN (with every element rendered so that the semantic structure of the document is kept), or provide a "convert_to_markdown" function.

JSON to MARKDOWN custom conversion
For now, I have to write this conversion myself.
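A rough sketch of what such a conversion could look like, working on the JSON output after it has been loaded into a list of dicts. The function name `convert_to_markdown` is only the name proposed in this issue, not an existing Unstructured function. The field names (`type`, `text`, `metadata.category_depth`, `metadata.text_as_html`, `metadata.page_number`) are my assumption about the JSON element layout, and keeping the page number as an HTML comment is just one possible way to carry metadata along.

```python
from typing import Any


def convert_to_markdown(elements: list[dict[str, Any]]) -> str:
    """Render element dicts as Markdown (hypothetical helper, not library API)."""
    blocks: list[str] = []
    for el in elements:
        text = (el.get("text") or "").strip()
        meta = el.get("metadata") or {}
        kind = el.get("type", "")
        table_html = meta.get("text_as_html") if kind == "Table" else None

        # Skip elements that carry nothing to render.
        if not text and not table_html:
            continue

        if kind == "Title":
            # Assumption: category_depth is 0-based; clamp to heading levels 1-6.
            depth = meta.get("category_depth") or 0
            level = min(max(int(depth), 0), 5) + 1
            block = "#" * level + " " + text
        elif kind == "ListItem":
            block = "- " + text
        elif table_html:
            # Raw HTML is allowed in Markdown, so the table structure survives.
            block = table_html
        else:
            # Any other element type falls back to a plain paragraph.
            block = text

        page = meta.get("page_number")
        if page is not None:
            # Keep the page number as an HTML comment so it does not render.
            block = f"<!-- page: {page} -->\n{block}"
        blocks.append(block)

    # Blank lines between blocks keep each element a separate Markdown block.
    return "\n\n".join(blocks)


sample = [
    {"type": "Title", "text": "Report", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "Intro paragraph.", "metadata": {"page_number": 1}},
    {"type": "ListItem", "text": "First point", "metadata": {}},
]
print(convert_to_markdown(sample))
```

Unknown element types fall back to plain paragraphs, so nothing is dropped silently. Other metadata fields could be emitted the same way as the page number.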

Additional context
See PyMuPDF as a benchmark.

Labels: enhancement (New feature or request)
