- RAG/LLM and PDF: Conversion to Markdown Text with PyMuPDF
- LLMs Love Structure: Using Markdown for Better PDF Analysis - May 2024
- Improved RAG Document Processing With Markdown - Nov 2024
- Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown
- From Markdown to Training Data - Aug 2023
- pyromark - a blazingly fast CommonMark-compliant Markdown parser for Python.
- mistune - A fast yet powerful Python Markdown parser with renderers and plugins, compatible with sane CommonMark rules.
- umarkdown - an ultra-fast Markdown parser written in pure C and compliant with the Markdown spec, with bindings for Python 3.8+. See cli
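- markitdown - a Python tool from Microsoft for converting PDF, Office documents, and other file formats to Markdown for use with LLMs.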
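Several of the articles above argue that Markdown helps RAG pipelines because its headings expose document structure. The snippet below is a minimal, standard-library-only sketch of that idea: it splits Markdown into chunks at ATX headings (`#` to `######`) and tags each chunk with its heading path. It assumes ATX headings only and treats only backtick or tilde fences as code. `chunk_by_headings` and `Chunk` are hypothetical names, not the API of any linked library or the method of any linked article. A production pipeline would more likely build on a CommonMark parser such as those listed.

```python
import re
from dataclasses import dataclass

# Hypothetical helper names; a sketch, not any library's API.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")  # ATX headings only
FENCE_MARKERS = ("`" * 3, "~" * 3)  # backtick and tilde code fences


@dataclass
class Chunk:
    path: list[str]  # heading breadcrumb, e.g. ["Guide", "Install"]
    text: str


def chunk_by_headings(markdown: str) -> list[Chunk]:
    """Split Markdown at ATX headings, tagging each chunk with its heading path."""
    chunks: list[Chunk] = []
    path: list[str] = []
    buf: list[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the buffered body (if non-empty) under the current heading path.
        body = "\n".join(buf).strip()
        if body:
            chunks.append(Chunk(path=list(path), text=body))
        buf.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            # Toggle fence state so '#' lines inside code are not read as headings.
            in_fence = not in_fence
            buf.append(line)
            continue
        m = None if in_fence else HEADING_RE.match(line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(m.group(2))
        else:
            buf.append(line)
    flush()
    return chunks


doc = (
    "# Guide\nIntro text.\n\n"
    "## Install\npip install x\n~~~\n# not a heading\n~~~\n\n"
    "## Usage\nRun it.\n"
)
chunks = chunk_by_headings(doc)
for c in chunks:
    print(" > ".join(c.path), "|", c.text.replace("\n", " / "))
```

Carrying the heading path with each chunk lets a retriever show or embed the section context alongside the text, which is one common argument for converting PDFs to Markdown before indexing.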