Convert PDF files to Markdown using Mistral's OCR API.
Using uv as the package manager is highly recommended (https://docs.astral.sh/uv/getting-started/installation).
- Clone or copy this project to your machine.
- Copy .env.example to .env and add your Mistral API key:
    cp .env.example .env
    # Edit .env and set MISTRAL_API_KEY
- Install dependencies:
    uv sync
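When the script is started with uv's `--env-file .env` option, the variables in `.env` are placed in the process environment, so the key can be read from there. The following is a minimal sketch of that lookup, assuming the script reads `MISTRAL_API_KEY` via `os.environ`. The helper name `load_api_key` is hypothetical and not taken from this project.

```python
import os


def load_api_key(env=None):
    """Return the Mistral API key, or raise a readable error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    NOTE: this helper is an illustration, not this project's actual code.
    """
    env = os.environ if env is None else env
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; copy .env.example to .env and add your key"
        )
    return key
```

Failing early with a clear message tends to be friendlier than letting the API call fail later with an authentication error.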
Run from the project directory:
    uv run --env-file .env main.py input.pdf
With custom output folder:
    uv run --env-file .env main.py input.pdf -o output_folder
Add the project folder to PATH (on Windows, see how to add a folder to PATH).
Then run from anywhere: pdf2md input.pdf
Add an alias to your shell profile (.bashrc or .zshrc):
alias pdf2md='uv run --project <path_to_this_folder> --env-file <path_to_this_folder>/.env <path_to_this_folder>/main.py'
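The commands above take a positional PDF path and an optional `-o` output folder. A command line of that shape could be parsed roughly as follows. This is a sketch under assumptions, not the project's actual `main.py`: the long option `--output`, the function names, and the default of a folder next to the PDF sharing its name are all hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse `pdf2md input.pdf [-o output_folder]`."""
    parser = argparse.ArgumentParser(
        prog="pdf2md", description="Convert a PDF file to Markdown."
    )
    parser.add_argument("pdf", type=Path, help="path to the input PDF")
    # "--output" is an assumed long form; the README only shows "-o".
    parser.add_argument(
        "-o", "--output", type=Path, default=None,
        help="output folder (default: a folder named after the PDF)",
    )
    return parser.parse_args(argv)


def resolve_output_dir(args):
    """Pick the output folder: the -o value if given, else the PDF path minus '.pdf'."""
    # Assumption: the default folder sits next to the PDF, not in the working directory.
    return args.output if args.output is not None else args.pdf.with_suffix("")
```

For example, `parse_args(["input.pdf", "-o", "output_folder"])` yields an `output` of `Path("output_folder")`, while omitting `-o` falls back to `Path("input")`.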
Creates a folder (same name as the PDF) containing:
- filename.md - The converted Markdown
- img-*.jpeg - Extracted images (if any)
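The output layout described above could be produced along these lines. This sketch does not call the OCR API. It assumes the result has already been reduced to a list of page dictionaries, each with a `markdown` string and an optional `images` list whose entries carry an `id` (used as the file name) and base64-encoded `image_base64` data. That shape and the function name `write_output` are assumptions for illustration, not confirmed details of this project.

```python
import base64
from pathlib import Path


def write_output(pages, output_dir, stem):
    """Write <stem>.md plus any extracted images into output_dir.

    `pages` is an assumed structure:
    [{"markdown": str, "images": [{"id": str, "image_base64": str}, ...]}, ...]
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    for page in pages:
        for image in page.get("images", []):
            data = image["image_base64"]
            # Drop a data-URI prefix such as "data:image/jpeg;base64," if present.
            if data.startswith("data:") and "," in data:
                data = data.split(",", 1)[1]
            (out / image["id"]).write_bytes(base64.b64decode(data))

    # Join the per-page Markdown with blank lines into a single document.
    md_path = out / f"{stem}.md"
    md_path.write_text(
        "\n\n".join(page["markdown"] for page in pages), encoding="utf-8"
    )
    return md_path
```

Keeping the images in the same folder as the Markdown file means relative image links in the Markdown resolve without further rewriting, provided they refer to the images by the same ids.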