Extract text, embedded images and convert pages to images from PDFs using Mojo.

arraywaves/pdf-extract

PDF Image Tool

Extract text and embedded images from PDFs, and convert PDF pages to images, using Mojo.

Requirements

  • Mojo
  • A Magic environment with these Python packages:
    • pdfplumber
    • pdf2image

Setup

# Install dependencies
magic add "pdfplumber"
magic add "pdf2image"
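
To confirm the setup worked, a small Python check along these lines can be run inside the environment. It is only a sketch: the function name `missing_requirements` is hypothetical, and the `pdftoppm` check rests on the fact that pdf2image calls the Poppler command-line utilities, which this README does not list as a requirement (your environment may already have pulled Poppler in).

```python
import importlib.util
import shutil


def missing_requirements(modules=("pdfplumber", "pdf2image"), tools=("pdftoppm",)):
    """Return the names of Python modules and command-line tools that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when an executable is not on PATH.
    missing += [name for name in tools if shutil.which(name) is None]
    return missing


# Prints the list of missing items, or a short confirmation when nothing is missing.
print(missing_requirements() or "all requirements found")
```

If anything is listed, re-run the matching `magic add` command, or install Poppler through your platform's package manager.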

Usage

# Run the tool on the default PDF (./extract/target.pdf)
mojo main.mojo

# Or specify a different PDF
mojo main.mojo --pdf=/path/to/your.pdf
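
The behaviour of the `--pdf=` flag can be pictured with a short Python sketch. It is not taken from main.mojo: the function name `resolve_pdf_path` is hypothetical, and only the flag name and the default path ./extract/target.pdf come from the usage above.

```python
import argparse
from pathlib import Path

# Default named in the usage notes; used when --pdf is not given.
DEFAULT_PDF = "./extract/target.pdf"


def resolve_pdf_path(argv):
    """Return the PDF path from --pdf=..., or the default when the flag is absent."""
    parser = argparse.ArgumentParser(
        description="Hypothetical stand-in for the tool's argument handling"
    )
    parser.add_argument("--pdf", default=DEFAULT_PDF, help="PDF file to process")
    args = parser.parse_args(argv)
    return Path(args.pdf)
```

Calling `resolve_pdf_path([])` falls back to the default, while `resolve_pdf_path(["--pdf=/path/to/your.pdf"])` returns the given path.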

Magic

If using Magic, run the tool from inside the environment's shell:

  • magic shell
  • mojo main.mojo
  • magic exit

Output

  • Extracted images are written to extracted_images/
  • Converted page images are written to converted_images/
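
For orientation, here is a rough Python sketch of how the two output folders could be filled using the libraries listed under Requirements. It is an assumption about the approach, not the code in main.mojo: the function names, the file naming scheme, and the resolution values are all hypothetical. Note also that it saves a rendering of each embedded image's region on the page rather than the raw image stream.

```python
from pathlib import Path


def clamp_bbox(bbox, page_bbox):
    """Clip an image box (x0, top, x1, bottom) to the page bounds."""
    x0, top, x1, bottom = bbox
    px0, ptop, px1, pbottom = page_bbox
    return (max(x0, px0), max(top, ptop), min(x1, px1), min(bottom, pbottom))


def extract_images(pdf_path, out_dir="extracted_images", resolution=150):
    """Render each embedded image region to a PNG and return the saved paths."""
    import pdfplumber  # third-party; imported here so clamp_bbox has no dependencies

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for img_no, img in enumerate(page.images, start=1):
                # Images may extend past the page, and crop() rejects such boxes.
                bbox = clamp_bbox(
                    (img["x0"], img["top"], img["x1"], img["bottom"]), page.bbox
                )
                if bbox[0] >= bbox[2] or bbox[1] >= bbox[3]:
                    continue  # nothing of the image lies on the page
                target = out / f"page{page_no}_img{img_no}.png"  # hypothetical naming
                page.crop(bbox).to_image(resolution=resolution).save(str(target))
                saved.append(target)
    return saved


def convert_pages(pdf_path, out_dir="converted_images", dpi=200):
    """Rasterise every page to a PNG and return the saved paths."""
    from pdf2image import convert_from_path  # third-party; needs Poppler installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for page_no, image in enumerate(convert_from_path(str(pdf_path), dpi=dpi), start=1):
        target = out / f"page{page_no}.png"  # hypothetical naming
        image.save(str(target), "PNG")
        saved.append(target)
    return saved
```

The bounding-box clamp is kept as a separate helper because it is the one step that is easy to get wrong and easy to check on its own.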
