Invoice Extraction App

This project uses the Moondream v3 and Kosmos-2.5 models to extract structured data from invoices.

Overview

This project provides two methods for invoice data extraction:

  • Moondream v3: Vision-language model for flexible, prompt-based extraction
  • Kosmos-2.5: OCR + layout understanding with rule-based parsing (faster)
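Here is one way "rule-based parsing" could turn OCR output into fields: one regular expression per field. This is a minimal sketch that assumes the Kosmos OCR step has already produced plain text. `FIELD_PATTERNS`, `parse_invoice_text`, and the patterns themselves are hypothetical illustrations, not the project's actual parsing rules.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: one regex per field, with the value captured in group 1.
FIELD_PATTERNS: Dict[str, re.Pattern] = {
    "invoice_number": re.compile(
        r"\binvoice\s*(?:no\.?|number|#)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(
        r"\bdate\s*:?\s*(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})", re.IGNORECASE
    ),
    # The leading \b keeps "Subtotal" from matching the "total" rule.
    "total": re.compile(r"\btotal\s*(?:due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def parse_invoice_text(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when a rule finds nothing."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[field] = match.group(1) if match else None
    return result
```

Rules like these run quickly and behave predictably. They are also brittle: a new invoice layout may need new patterns. That is the trade-off against the prompt-based Moondream path.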

⚠️ Important Note

Streamlit Cloud Limitation: This app uses st.pdf() for PDF preview, which requires the optional streamlit[pdf] extra. Streamlit Community Cloud has trouble installing this extra, so the PDF viewer fails when the app is deployed there.

Recommended Solution: Run the application locally to get full PDF preview support.

Prerequisites

  • Python 3.10 or higher
  • uv package manager
  • Modal account (for backend deployment)
  • Hugging Face account with API token

Setup

1. Install Dependencies

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install project dependencies
cd md3
uv sync

2. Configure Environment Variables

Create a .env file in the md3/ directory:

HF_TOKEN=your_huggingface_token_here
MD3_BACKEND_URL=https://your-moondream-url.modal.run
KOSMOS_BACKEND_URL=https://your-kosmos-url.modal.run
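The app has to read these values at startup. Below is a minimal, standard-library-only sketch of such a loader. It parses the `KEY=VALUE` lines and fails early if a required variable is missing. The project may instead rely on a library such as python-dotenv, and `parse_env_text`, `load_env_file`, and `REQUIRED_KEYS` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Dict, Iterable

# The three variables listed above for md3/.env (hypothetical constant name).
REQUIRED_KEYS = ("HF_TOKEN", "MD3_BACKEND_URL", "KOSMOS_BACKEND_URL")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # partition splits on the first "=", so values may themselves contain "=".
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: Path, required: Iterable[str] = REQUIRED_KEYS) -> Dict[str, str]:
    """Copy .env values into os.environ (existing values win) and check required keys."""
    required = tuple(required)
    if path.exists():
        for key, value in parse_env_text(path.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)
    missing = [key for key in required if not os.environ.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: os.environ[key] for key in required}
```

Failing at startup with a list of missing keys is easier to debug than a failed request to an empty backend URL.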

3. Set Up Modal

# Authenticate with Modal
uv run modal setup

# Create a Hugging Face secret in Modal
uv run modal secret create huggingface-secret HF_TOKEN=your_huggingface_token_here

Running the Application

Deploy Backend to Modal

uv run modal deploy app/modal_app.py

Copy the endpoint URLs printed by the deploy command into your .env file:

  • MD3_BACKEND_URL → Moondream endpoint URL
  • KOSMOS_BACKEND_URL → Kosmos endpoint URL
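The frontend calls these URLs to run extraction. The real request format is defined in app/streamlit_app.py and app/modal_app.py. The sketch below only illustrates the general pattern of posting a file to a deployed web endpoint with the standard library. The JSON payload shape (a base64 `file` plus a `filename`) and the helper names are assumptions, not the project's documented API.

```python
import base64
import json
import urllib.request


def build_extraction_request(backend_url: str, file_bytes: bytes, filename: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the invoice as base64 JSON.

    The payload keys are hypothetical; check app/modal_app.py for the real contract.
    """
    payload = {
        "filename": filename,
        "file": base64.b64encode(file_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        backend_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON response.

    The generous timeout is an assumption: inference on a cold Modal
    container can take a while.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a deployed backend):
#   import os
#   with open("invoice.pdf", "rb") as f:
#       req = build_extraction_request(os.environ["MD3_BACKEND_URL"], f.read(), "invoice.pdf")
#   print(send_request(req))
```

Keeping request construction separate from sending makes it possible to inspect or test the payload without a live endpoint.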

Start Frontend Locally

Run the Streamlit UI locally:

uv run streamlit run app/streamlit_app.py

Using the Application

  1. Upload Invoice: Upload a PDF or image file (PNG, JPG, etc.)

  2. Choose Extraction Method:

    Option A: Moondream v3

    • Customize the extraction prompt
    • Define fields to extract (invoice number, date, total, etc.)
    • Define row fields for table data
    • Click "Extract with Moondream"

    Option B: Kosmos OCR

    • Click "Extract with Kosmos"
    • Get faster results with OCR + rule-based parsing
    • View OCR preview with bounding boxes
  3. View Results: Extracted data is displayed as JSON and tables
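The Moondream option is driven by the prompt and the field lists you define in the UI. The sketch below shows one way those inputs could be combined into a single prompt. It also splits a JSON answer into header fields and table rows for display. `build_prompt`, `split_result`, and the `line_items` key are hypothetical, and the project's actual prompt template may differ.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple


def build_prompt(base_prompt: str, fields: Sequence[str], row_fields: Sequence[str]) -> str:
    """Combine the user's prompt with the field lists into one instruction."""
    schema: Dict[str, Any] = {field: "string or null" for field in fields}
    # Table data goes under a hypothetical "line_items" key, one dict per row.
    schema["line_items"] = [{field: "string or null" for field in row_fields}]
    return (
        f"{base_prompt.strip()}\n"
        "Return only JSON matching this schema:\n"
        f"{json.dumps(schema, indent=2)}"
    )


def split_result(raw_answer: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Parse the model's JSON answer into header fields and table rows."""
    # Models sometimes wrap JSON in extra text; keep the outermost {...} span.
    start, end = raw_answer.find("{"), raw_answer.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    data = json.loads(raw_answer[start : end + 1])
    rows = data.pop("line_items", []) or []
    return data, rows
```

In Streamlit, the header dict could be shown with `st.json` and the rows with `st.dataframe`, which would produce the JSON-plus-table view described above.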

Configuration

Edit app/config.json to customize:

  • Default extraction fields
  • Row fields for table data
  • Model parameters (repo IDs, temperature, etc.)
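The README does not show the schema of app/config.json. The keys below (`default_fields`, `row_fields`, `models`) and their values are therefore hypothetical. The sketch shows one way to load such a file while falling back to defaults for any missing setting.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical defaults; the real keys and values in app/config.json may differ.
DEFAULT_CONFIG: Dict[str, Any] = {
    "default_fields": ["invoice_number", "invoice_date", "total"],
    "row_fields": ["description", "quantity", "unit_price", "amount"],
    "models": {"moondream": {"temperature": 0.0}},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user values on defaults (nested dicts are merged)."""
    # Deep copy so callers can never mutate DEFAULT_CONFIG through the result.
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path = Path("app/config.json")) -> Dict[str, Any]:
    """Read the config file if present and merge it over the defaults."""
    if not path.exists():
        return copy.deepcopy(DEFAULT_CONFIG)
    with path.open(encoding="utf-8") as f:
        return merge_config(DEFAULT_CONFIG, json.load(f))
```

Merging nested dicts means a config file can change one model parameter without restating every other setting.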

Development Commands

# Deploy backend to Modal
uv run modal deploy app/modal_app.py

# Start frontend locally
uv run streamlit run app/streamlit_app.py