feat/Markdown Extraction #3525

**JSON Extraction** 
I want to feed the output of Unstructured's extraction directly to an LLM without losing the metadata produced by partitioning, but JSON is not a recommended input format for LLMs.

**MARKDOWN Extraction**
Add an option to choose the extraction format, either JSON or Markdown (with every element rendered so that the document's semantic structure is preserved), or provide a `convert_to_markdown` function.

**JSON to MARKDOWN custom conversion**
Without this feature, I have to write the JSON-to-Markdown conversion myself.
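For illustration, here is a minimal sketch of what such a custom conversion could look like. It is not Unstructured's implementation: `convert_to_markdown` is a hypothetical helper, and the code assumes the elements arrive as plain dicts with `type`, `text`, and `metadata` keys, where `metadata` may carry `category_depth` (for titles) and `text_as_html` (for tables). The mapping of element types to Markdown is my own choice.

```python
from typing import Any


def convert_to_markdown(elements: list[dict[str, Any]]) -> str:
    """Render element dicts as Markdown (hypothetical helper, not a library API)."""
    blocks: list[str] = []
    list_items: list[str] = []

    def flush_list() -> None:
        # Emit consecutive list items as a single Markdown list.
        if list_items:
            blocks.append("\n".join(list_items))
            list_items.clear()

    for el in elements:
        kind = el.get("type", "")
        text = (el.get("text") or "").strip()
        meta = el.get("metadata") or {}

        if kind == "ListItem":
            if text:
                list_items.append(f"- {text}")
            continue
        flush_list()

        if kind == "Title":
            # Assumption: category_depth 0 is the top-level heading.
            level = min((meta.get("category_depth") or 0) + 1, 6)
            if text:
                blocks.append(f"{'#' * level} {text}")
        elif kind == "Table":
            # Assumption: keep the HTML rendering when present, since
            # Markdown accepts inline HTML; otherwise fall back to text.
            table = meta.get("text_as_html") or text
            if table:
                blocks.append(table)
        elif kind in ("Header", "Footer", "PageBreak"):
            # Page furniture is dropped in this sketch.
            continue
        elif text:
            blocks.append(text)

    flush_list()
    return "\n\n".join(blocks)
```

This sketch keeps only heading depth and table HTML from the metadata. Other fields, such as page numbers, would need an explicit rendering rule.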

**Additional context**
See PyMuPDF as a benchmark.

