Description
JSON Extraction
I want to feed the output of Unstructured's extraction directly to an LLM without losing the metadata produced by partitioning, but JSON is not a recommended input format for LLMs.
MARKDOWN Extraction
Make it possible to choose the extraction format, either JSON or Markdown (with every element rendered so that the semantic structure of the document is kept), or provide a "convert_to_markdown" function.
JSON to MARKDOWN custom conversion
Without this feature, I have to write the JSON-to-Markdown conversion myself.
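To make the request concrete, here is a minimal sketch of what such a conversion could look like. It is not an existing Unstructured API: the function name `convert_to_markdown` is taken from the proposal above, and the mapping rules are my own assumptions. It assumes each JSON element is a dict with `type`, `text`, and `metadata` keys, that titles may carry a `category_depth` in their metadata, and that tables may carry `text_as_html`.

```python
from typing import Any


def convert_to_markdown(elements: list[dict[str, Any]]) -> str:
    """Hypothetical helper: render Unstructured-style element dicts as Markdown.

    The type-to-Markdown mapping below is an assumption for illustration,
    not the library's behavior.
    """
    blocks: list[str] = []
    previous_was_list_item = False

    for element in elements:
        text = (element.get("text") or "").strip()
        if not text:
            continue  # skip elements with no text (e.g. page breaks)

        kind = element.get("type", "")
        metadata = element.get("metadata") or {}

        if kind == "Title":
            # Assumed: category_depth 0 is the top-level heading.
            depth = metadata.get("category_depth") or 0
            line = "#" * (min(depth, 5) + 1) + " " + text
        elif kind == "ListItem":
            line = "- " + text
        elif kind == "Table" and metadata.get("text_as_html"):
            # Inline HTML is allowed in Markdown, so keep the table structure.
            line = metadata["text_as_html"]
        else:
            # NarrativeText and any unrecognized type become plain paragraphs.
            line = text

        if kind == "ListItem" and previous_was_list_item:
            # Keep consecutive list items in one tight list.
            blocks[-1] += "\n" + line
        else:
            blocks.append(line)
        previous_was_list_item = kind == "ListItem"

    return "\n\n".join(blocks)
```

The input would be the list obtained by loading the JSON output with `json.load`. Metadata that has no Markdown equivalent (coordinates, page numbers, and so on) is dropped in this sketch; how to carry it over is exactly what a built-in implementation would need to decide.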
Additional context
See PyMuPDF as a benchmark.