davemaier/pdf2md

pdf2md

Convert PDF files to Markdown using Mistral's OCR API.

Setup

Using uv as the package manager is highly recommended (see https://docs.astral.sh/uv/getting-started/installation).

  1. Clone/copy this project to your machine

  2. Copy .env.example to .env and add your Mistral API key:

    cp .env.example .env
    # Edit .env and set MISTRAL_API_KEY

  3. Install dependencies:

    uv sync
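
Before the first run, it can help to confirm that the key actually landed in .env. The following is a minimal sketch, not part of this project: `read_env_value` is a hypothetical helper, and it assumes .env uses plain `KEY=VALUE` lines with optional quotes and `#` comments.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: str, key: str) -> Optional[str]:
    """Return the value for `key` from a simple KEY=VALUE file, or None.

    Hypothetical helper for illustration; it is not part of pdf2md.
    """
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Strip surrounding quotes; treat an empty value as missing.
            return value.strip().strip("\"'") or None
    return None


if __name__ == "__main__":
    if read_env_value(".env", "MISTRAL_API_KEY"):
        print("MISTRAL_API_KEY is set")
    else:
        print("MISTRAL_API_KEY is missing or empty in .env")
```

Run it from the project directory. It only checks that a non-empty value is present, not that the key is valid.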

Usage

Run from the project directory:

uv run --env-file .env main.py input.pdf

With custom output folder:

uv run --env-file .env main.py input.pdf -o output_folder
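
The command line shown above could be parsed roughly as follows. This is a sketch under stated assumptions, since main.py is not reproduced here: the `--output` long form, the `parse_args` name, and the default of a folder named after the PDF in the current directory are guesses based on the usage and output descriptions in this README.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse `input.pdf [-o output_folder]` (illustrative, not the real main.py)."""
    parser = argparse.ArgumentParser(
        prog="pdf2md", description="Convert a PDF file to Markdown."
    )
    parser.add_argument("pdf", type=Path, help="path to the input PDF")
    # Only -o appears in the README; the --output long form is an assumption.
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="output folder")
    args = parser.parse_args(argv)
    if args.output is None:
        # Assumed default: a folder with the same name as the PDF, minus ".pdf".
        args.output = Path(args.pdf.stem)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"input: {args.pdf}, output folder: {args.output}")
```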

Installing as a global command

Windows (not tested)

Add the project folder to your PATH (search for "windows add folder to path" if you need instructions).

Then run from anywhere: pdf2md input.pdf

Linux/macOS

Add an alias to your shell profile (.bashrc or .zshrc):

alias pdf2md='uv run --project <path_to_this_folder> --env-file <path_to_this_folder>/.env <path_to_this_folder>/main.py'

Output

The tool creates a folder (with the same name as the PDF) containing:

  • filename.md - The converted Markdown
  • img-*.jpeg - Extracted images (if any)
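
The layout above could be produced by a writer along these lines. This is a hedged sketch, not the project's actual code: `write_output` and the page dictionary shape (`markdown`, `images`, `name`, `base64`) are hypothetical stand-ins for whatever the OCR response provides, and joining pages with a blank line is an assumption.

```python
import base64
from pathlib import Path


def write_output(out_dir, stem, pages):
    """Write <stem>.md plus any images into out_dir and return the .md path.

    `pages` is a list of dicts in a hypothetical shape:
    {"markdown": str, "images": [{"name": "img-0.jpeg", "base64": str}]}
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    for page in pages:
        parts.append(page["markdown"])
        for image in page.get("images", []):
            data = image["base64"]
            # Drop a "data:image/jpeg;base64," style prefix if one is present.
            if data.startswith("data:") and "," in data:
                data = data.split(",", 1)[1]
            (out / image["name"]).write_bytes(base64.b64decode(data))
    md_path = out / f"{stem}.md"
    # Pages are joined with a blank line (an assumption for this sketch).
    md_path.write_text("\n\n".join(parts), encoding="utf-8")
    return md_path
```

Keeping the image file names identical to the names referenced in the Markdown means the image links resolve when the .md file is opened from inside the output folder.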
