
Extracts text, tables, and images from PDFs using a Docling-like structure in Python, organized by page number.


DebdeepGhosh2511/Docling_pdfextract


Docling PDF Extractor

A Streamlit app that extracts text, tables, and images from PDFs and organizes the output page by page.
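Organizing output page by page comes down to bucketing every extracted element by its page number. The sketch below is illustrative only: it assumes each element has already been reduced to a small record with a page number, a kind (`"text"`, `"table"`, or `"image"`), and a payload. The `Element` type and `group_by_page` helper are hypothetical names, not the app's actual data model. (Docling records page provenance on its document items, which is one place such a page number could come from.)

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass
class Element:
    """Hypothetical record for one extracted element (not the app's real type)."""
    page: int      # 1-based page number the element came from
    kind: str      # "text", "table", or "image"
    payload: Any   # str for text, list of rows for tables, bytes for images


def group_by_page(elements: list[Element]) -> dict[int, dict[str, list[Any]]]:
    """Bucket elements by page, then by kind, keeping input order within each bucket."""
    pages: dict[int, dict[str, list[Any]]] = defaultdict(
        lambda: {"text": [], "table": [], "image": []}
    )
    for el in elements:
        if el.kind not in ("text", "table", "image"):
            raise ValueError(f"unknown element kind: {el.kind!r}")
        pages[el.page][el.kind].append(el.payload)
    # Return a plain dict sorted by page number so iteration is page-ordered.
    return dict(sorted(pages.items()))


elements = [
    Element(2, "text", "Second page body"),
    Element(1, "text", "Title"),
    Element(1, "table", [["a", "b"], ["1", "2"]]),
    Element(1, "text", "Intro paragraph"),
]
grouped = group_by_page(elements)
print(list(grouped))        # [1, 2]
print(grouped[1]["text"])   # ['Title', 'Intro paragraph']
```

Sorting once at the end keeps the grouping a single pass over the elements, regardless of the order in which the extractor emits them.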

Features

  • Extracts text, images, and tables, saving them as .txt, .jpg, and .csv files respectively
  • Saves each page's output into its own folder under output/
  • Displays the extracted data in the Streamlit UI
  • Generates a metadata.json file and offers it for download
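A minimal, standard-library-only sketch of the per-page output step. It assumes the extracted content has already been grouped into a dict keyed by page number, where each page holds a list of text strings, a list of tables (each a list of rows), and a list of already-encoded JPEG bytes. The folder names (`page_1/`, `page_2/`, and so on), file names, and the fields written to `metadata.json` are illustrative assumptions, not the repository's actual layout.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any


def write_outputs(pages: dict[int, dict[str, list[Any]]], out_dir: Path) -> dict:
    """Write per-page .txt/.csv/.jpg files plus metadata.json; return the metadata.

    Hypothetical layout: out_dir/page_N/{text.txt, table_i.csv, image_i.jpg}.
    """
    metadata: dict[str, Any] = {"pages": []}
    for page_no, content in sorted(pages.items()):
        page_dir = out_dir / f"page_{page_no}"
        page_dir.mkdir(parents=True, exist_ok=True)
        entry: dict[str, Any] = {"page": page_no, "text": None, "tables": [], "images": []}

        # All text blocks on a page are joined into a single .txt file.
        if content.get("text"):
            txt_path = page_dir / "text.txt"
            txt_path.write_text("\n\n".join(content["text"]), encoding="utf-8")
            entry["text"] = txt_path.relative_to(out_dir).as_posix()

        # Each table (a list of rows) becomes its own .csv file.
        for i, rows in enumerate(content.get("table", []), start=1):
            csv_path = page_dir / f"table_{i}.csv"
            with csv_path.open("w", newline="", encoding="utf-8") as f:
                csv.writer(f).writerows(rows)
            entry["tables"].append(csv_path.relative_to(out_dir).as_posix())

        # Images are assumed to arrive as already-encoded JPEG bytes.
        for i, data in enumerate(content.get("image", []), start=1):
            img_path = page_dir / f"image_{i}.jpg"
            img_path.write_bytes(data)
            entry["images"].append(img_path.relative_to(out_dir).as_posix())

        metadata["pages"].append(entry)

    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    meta = write_outputs(
        {1: {"text": ["Title", "Intro"], "table": [[["a", "b"], ["1", "2"]]], "image": []}},
        Path(tmp) / "output",
    )
    print(meta["pages"][0]["tables"])   # ['page_1/table_1.csv']
```

Storing paths relative to `output/` keeps `metadata.json` valid if the folder is moved. In a Streamlit app, the returned dict could be serialized with `json.dumps` and passed as the `data` argument of `st.download_button`.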

Setup

pip install -r requirements.txt
streamlit run app.py
