DebdeepGhosh2511 / Docling_pdfextract

Extracts text, tables, and images from PDFs in Python using a Docling-like structure, organized by page number.

input
.gitignore
LICENSE
README.md
app.py
requirements.txt

Docling PDF Extractor

A Streamlit app to extract text, tables, and images from PDFs, organized page by page.
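The README does not say how extracted items are tracked internally. As a rough sketch of what "organized page by page" could mean, the snippet below buckets extracted items by their page number while keeping their order within each page. The `Element` type and the `group_by_page` helper are hypothetical illustrations, not code taken from app.py.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass
class Element:
    """One extracted item (hypothetical shape, not the app's real type)."""
    page: int      # 1-based page number the item came from
    kind: str      # "text", "table", or "image"
    content: Any   # str for text, list of rows for tables, bytes for images


def group_by_page(elements: list[Element]) -> dict[int, list[Element]]:
    """Bucket elements by page, keeping their original order within each page."""
    pages: dict[int, list[Element]] = defaultdict(list)
    for el in elements:
        pages[el.page].append(el)
    # Return a plain dict whose keys are in ascending page order.
    return {page: pages[page] for page in sorted(pages)}
```

For example, `group_by_page([Element(2, "text", "Intro"), Element(1, "table", [["a", "b"]])])` returns a dict whose first key is page 1, regardless of the order in which the extractor emitted the items.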

Features

Extracts text (.txt), images (.jpg), and tables (.csv)
Saves per-page output into folders under output/
Displays the extracted data in the Streamlit UI
Generates metadata.json and makes it available for download
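The README does not show how app.py lays out its output. A minimal sketch of the per-page folder idea, using only the standard library, might look like the following. The folder name `page_<n>`, the file names `text.txt`, `table_<i>.csv`, and `image_<i>.jpg`, the metadata.json schema, and the `save_page`/`save_document` helpers are all assumptions for illustration. The actual PDF extraction step and the Streamlit UI are left out.

```python
import csv
import json
from pathlib import Path

# Hypothetical input shape: page number -> (text, tables, images), where each
# table is a list of rows and each image is already-encoded JPEG bytes.
PageData = tuple[str, list[list[list[str]]], list[bytes]]


def save_page(out_dir: Path, page_no: int, data: PageData) -> dict:
    """Write one page's text, tables, and images; return its metadata entry."""
    text, tables, images = data
    page_dir = out_dir / f"page_{page_no}"
    page_dir.mkdir(parents=True, exist_ok=True)

    (page_dir / "text.txt").write_text(text, encoding="utf-8")

    table_files = []
    for i, rows in enumerate(tables, start=1):
        path = page_dir / f"table_{i}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        table_files.append(path.name)

    image_files = []
    for i, blob in enumerate(images, start=1):
        path = page_dir / f"image_{i}.jpg"
        path.write_bytes(blob)  # bytes are written as-is, with no re-encoding
        image_files.append(path.name)

    return {"page": page_no, "text": "text.txt",
            "tables": table_files, "images": image_files}


def save_document(pages: dict[int, PageData],
                  out_dir: Path = Path("output")) -> Path:
    """Write all pages in page order plus a metadata.json index."""
    out_dir.mkdir(parents=True, exist_ok=True)
    entries = [save_page(out_dir, n, pages[n]) for n in sorted(pages)]
    meta_path = out_dir / "metadata.json"
    meta_path.write_text(json.dumps({"pages": entries}, indent=2),
                         encoding="utf-8")
    return meta_path
```

Keeping one folder per page means each page's files can be inspected or re-generated independently, while metadata.json serves as a single index that the UI could offer as a download.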

Setup

pip install -r requirements.txt
streamlit run app.py

Languages

Python 100.0%