This project extracts invoice data using the Moondream v3 and Kosmos-2.5 models.
It provides two extraction methods:
- Moondream v3: Vision-language model for flexible, prompt-based extraction
- Kosmos-2.5: OCR + layout understanding with rule-based parsing (faster)
Streamlit Cloud Limitation: This app uses st.pdf() for PDF preview, which requires the optional streamlit[pdf] installation. Unfortunately, Streamlit Community Cloud has trouble installing this extra dependency, causing the PDF viewer to fail.
Recommended Solution: Please run the application in your local environment for the best experience and full PDF viewing functionality.
- Python 3.10 or higher
- uv package manager
- Modal account (for backend deployment)
- Hugging Face account with API token
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install project dependencies
cd md3
uv sync
Create a .env file in the md3/ directory:
HF_TOKEN=your_huggingface_token_here
MD3_BACKEND_URL=https://your-modal-url.modal.run
KOSMOS_BACKEND_URL=https://your-modal.modal.run
# Authenticate with Modal
uv run modal setup
# Create a Hugging Face secret in Modal
uv run modal secret create huggingface-secret HF_TOKEN=your_huggingface_token_here
uv run modal deploy app/modal_app.py
Copy the deployment URLs and update them in your .env file:
- MD3_BACKEND_URL → Moondream endpoint URL
- KOSMOS_BACKEND_URL → Kosmos endpoint URL
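Before launching the UI, it can help to confirm that the .env file defines all three variables. The following is a minimal sketch using only the standard library. The helper names `parse_env` and `missing_keys` are hypothetical and not part of this project, and the parser only understands simple KEY=VALUE lines:

```python
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("HF_TOKEN", "MD3_BACKEND_URL", "KOSMOS_BACKEND_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumes the script is run from the md3/ directory
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    problems = missing_keys(env)
    print("OK" if not problems else f"Missing: {', '.join(problems)}")
```

This only checks that the values are present; it does not contact the Modal endpoints or validate the Hugging Face token.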
Run the Streamlit UI locally:
uv run streamlit run app/streamlit_app.py
- Upload Invoice: Upload a PDF or image file (PNG, JPG, etc.)
- Choose Extraction Method:
  Option A: Moondream v3
  - Customize the extraction prompt
  - Define fields to extract (invoice number, date, total, etc.)
  - Define row fields for table data
  - Click "Extract with Moondream"
  Option B: Kosmos OCR
  - Click "Extract with Kosmos"
  - Get faster results with OCR + rule-based parsing
  - View OCR preview with bounding boxes
- View Results: Extracted data is displayed as JSON and tables
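The "OCR + rule-based parsing" path in Option B can be pictured as pattern matching over the recognized text. The sketch below is illustrative only: the regular expressions, the field names, and the `parse_invoice_text` helper are assumptions, not the rules this project actually applies.

```python
import re
from typing import Optional

# Hypothetical patterns; real invoices vary widely in layout and wording.
PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.I
    ),
    "date": re.compile(
        r"date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})", re.I
    ),
    # \b keeps "Subtotal" from being mistaken for "Total".
    "total": re.compile(r"\btotal\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}


def parse_invoice_text(ocr_text: str) -> dict[str, Optional[str]]:
    """Return the first match for each field, or None when nothing matches."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        result[field] = match.group(1) if match else None
    return result
```

Rules like these are fast because they need no model inference after OCR, but they break when an invoice uses wording the patterns do not anticipate. That is the trade-off against the prompt-based Moondream path.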
Edit app/config.json to customize:
- Default extraction fields
- Row fields for table data
- Model parameters (repo IDs, temperature, etc.)
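After editing app/config.json by hand, a quick check that the file still parses can save a failed deploy. This is a minimal sketch, assuming the file holds a JSON object at the top level. The `load_config` and `summarize` helpers are hypothetical, and no specific key names are assumed:

```python
import json
from pathlib import Path


def load_config(path: str = "app/config.json") -> dict:
    """Load the JSON config and fail early with a readable message."""
    config_path = Path(path)
    if not config_path.exists():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(config, dict):
        raise ValueError("Expected a JSON object at the top level")
    return config


def summarize(config: dict) -> list[str]:
    """List each top-level key with the type of its value."""
    return [f"{key}: {type(value).__name__}" for key, value in config.items()]


if __name__ == "__main__":
    try:
        for line in summarize(load_config()):
            print(line)
    except (FileNotFoundError, ValueError) as exc:
        print(f"Config problem: {exc}")
```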
# Deploy backend to Modal
uv run modal deploy app/modal_app.py
# Start frontend locally
uv run streamlit run app/streamlit_app.py