Sparrow

Structured data extraction, instruction calling and agentic workflows with ML, LLM and Vision LLM


Overview

Sparrow is an API-first platform for enterprise document intelligence. It provides RESTful APIs for structured data extraction, instruction processing, and multi-agent workflow orchestration. With a local inference backend, everything runs on your own infrastructure with no external API calls or cloud dependencies.

The platform is designed around three core capabilities:

  • Structured Extraction: Submit documents via REST and receive validated JSON output. Integrates directly into any backend or data pipeline.
  • Instruction Processing: Goes beyond document extraction to cover text processing, validation, and decision-making via the instruction inference API.
  • Agent Framework: Orchestrate multi-step workflows with custom agents, visual monitoring via Prefect, and robust error handling.

Key Features

  • Universal Document Processing — Invoices, receipts, forms, bank statements, and tables
  • Pluggable Architecture — Mix and match Sparrow Parse, Instructor, and Agent pipelines
  • Multiple Inference Backends — MLX (Apple Silicon), Ollama, vLLM, Docker, Hugging Face Cloud GPU
  • Multi-format Input — Images (PNG, JPG) and multi-page PDFs
  • Schema Validation — JSON schema-based extraction with automatic validation
  • API-First Design — RESTful endpoints for integration into any stack
  • Instruction Calling — Text processing, validation, and decision-making with GPT-OSS, Mistral, Qwen, and others
  • Visual Monitoring — Built-in dashboard and agent workflow tracking
  • Enterprise Ready — Rate limiting, usage analytics, commercial licensing available
  • Local Vision LLMs — Mistral, Qwen, DeepSeek OCR, dots.ocr, Gemma 4, and more

Architecture

Core Components

| Component | Purpose | Typical Use |
| --- | --- | --- |
| Sparrow ML LLM | Main API engine | Document processing pipelines |
| Sparrow Parse | Vision LLM library | Structured JSON extraction |
| Sparrow Agents | Workflow orchestration | Complex multi-step processing |
| Sparrow OCR | Text recognition | OCR preprocessing |
| Sparrow UI | Web interface | Interactive document processing |

Quickstart

Prerequisites

  • Python 3.12.10+
  • macOS (for MLX backend) or Linux/Windows (for other backends)
  • Sufficient GPU memory for the selected Vision LLM

Setup

# Install Python 3.12.10 via pyenv
pyenv install 3.12.10
pyenv global 3.12.10

# Create a virtual environment
python -m venv .env_sparrow_parse
source .env_sparrow_parse/bin/activate    # Linux/macOS
# .env_sparrow_parse\Scripts\activate     # Windows

# Clone the repository
git clone https://github.com/jhonathan-humnel/ai-doc-forge.git
cd ai-doc-forge/sparrow-ml/llm

# Install dependencies
pip install -r requirements_sparrow_parse.txt

# macOS: install poppler for PDF processing
brew install poppler

# Start the API server
python api.py

Before running pip install, verify your platform target in requirements_sparrow_parse.txt. Use sparrow-parse[mlx] for Apple Silicon and sparrow-parse for Linux/Windows.

First Extraction

./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --file-path "data/bonds_table.png"

JSON output:

{
  "data": [
    {"instrument_name": "UNITS BLACKROCK FIX INC...", "valuation": 19049},
    {"instrument_name": "UNITS ISHARES III PLC...", "valuation": 83488}
  ],
  "valid": "true"
}

Installation

git clone https://github.com/jhonathan-humnel/ai-doc-forge.git
cd ai-doc-forge

For complete installation instructions, see environment_setup.md.

Virtual Environments

Create separate environments for different pipelines:

| Environment | Pipeline |
| --- | --- |
| .env_sparrow_parse | Sparrow Parse (Vision LLM) |
| .env_instructor | Instructor (Text LLM) |
| .env_ocr | OCR service (optional) |

Platform Dependencies

macOS:

brew install poppler

Ubuntu/Debian:

sudo apt-get install poppler-utils libpoppler-cpp-dev

Apple Silicon — MLX backend delivers optimal performance.
NVIDIA/AMD GPU — Use the vLLM or Ollama backend.
CPU Only — Use smaller models or the Hugging Face cloud backend.

Verify Installation

python api.py --port 8002
# API docs: http://localhost:8002/api/v1/sparrow-llm/docs

Examples

Bank Statement Processing

./sparrow.sh "*" \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --file-path "data/bank_statement.pdf"

JSON output:

{
  "bank": "First Platypus Bank",
  "address": "1234 Kings St., New York, NY 12123",
  "account_holder": "Mary G. Orta",
  "account_number": "1234567890123",
  "statement_date": "3/1/2022",
  "period_covered": "2/1/2022 - 3/1/2022",
  "account_summary": {
    "balance_on_march_1": "$25,032.23",
    "total_money_in": "$10,234.23",
    "total_money_out": "$10,532.51"
  },
  "transactions": [
    {"date": "02/01", "description": "PGD EasyPay Debit", "withdrawal": "203.24", "deposit": "", "balance": "22,098.23"},
    {"date": "02/02", "description": "AB&B Online Payment*****", "withdrawal": "71.23", "deposit": "", "balance": "22,027.00"},
    {"date": "02/04", "description": "Check No. 2345", "withdrawal": "", "deposit": "450.00", "balance": "22,477.00"},
    {"date": "02/05", "description": "Payroll Direct Dep 23422342 Giants", "withdrawal": "", "deposit": "2,534.65", "balance": "25,011.65"}
  ],
  "valid": "true"
}
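
Amounts in this output are strings with currency symbols and thousands separators, so they need parsing before any arithmetic. A minimal sketch, assuming the field names shown in the sample output above (the helpers `to_decimal` and `balance_mismatches` are hypothetical and not part of Sparrow), checks each transaction's balance against the previous row:

```python
from decimal import Decimal


def to_decimal(amount: str) -> Decimal:
    """Parse strings such as "$25,032.23" or "203.24"; empty strings count as zero."""
    cleaned = amount.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else Decimal("0")


def balance_mismatches(transactions: list) -> list:
    """Return indexes of rows whose balance does not follow from the previous row."""
    bad = []
    for i in range(1, len(transactions)):
        prev, row = transactions[i - 1], transactions[i]
        expected = (
            to_decimal(prev["balance"])
            - to_decimal(row["withdrawal"])
            + to_decimal(row["deposit"])
        )
        if expected != to_decimal(row["balance"]):
            bad.append(i)
    return bad


# Rows copied from the sample output above (descriptions omitted for brevity).
transactions = [
    {"date": "02/01", "withdrawal": "203.24", "deposit": "", "balance": "22,098.23"},
    {"date": "02/02", "withdrawal": "71.23", "deposit": "", "balance": "22,027.00"},
    {"date": "02/04", "withdrawal": "", "deposit": "450.00", "balance": "22,477.00"},
    {"date": "02/05", "withdrawal": "", "deposit": "2,534.65", "balance": "25,011.65"},
]
print(balance_mismatches(transactions))  # prints [] because every row is consistent
```

`Decimal` is used instead of `float` so that cent-level comparisons are exact.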

Financial Table Extraction

./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --file-path "data/bonds_table.png"

JSON output:

{
  "data": [
    {"instrument_name": "UNITS BLACKROCK FIX INC DUB FDS PLC ISHS EUR INV GRD CP BD IDX/INST/E", "valuation": 19049},
    {"instrument_name": "UNITS ISHARES III PLC CORE EUR GOVT BOND UCITS ETF/EUR", "valuation": 83488},
    {"instrument_name": "UNITS ISHARES III PLC EUR CORP BOND 1-5YR UCITS ETF/EUR", "valuation": 213030},
    {"instrument_name": "UNIT ISHARES VI PLC/JP MORGAN USD E BOND EUR HED UCITS ETF DIST/HDGD/", "valuation": 32774},
    {"instrument_name": "UNITS XTRACKERS II SICAV/EUR HY CORP BOND UCITS ETF/-1D-/DISTR.", "valuation": 23643}
  ],
  "valid": "true"
}
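
Downstream code will usually parse this response and check the `valid` field before using the rows. The sketch below assumes the response shape shown above; `total_valuation` is a hypothetical helper, and the instrument names are shortened here to keep the sample compact:

```python
import json

# Response text modelled on the sample output above (names truncated with "...").
raw = """
{
  "data": [
    {"instrument_name": "UNITS BLACKROCK FIX INC...", "valuation": 19049},
    {"instrument_name": "UNITS ISHARES III PLC CORE...", "valuation": 83488},
    {"instrument_name": "UNITS ISHARES III PLC EUR CORP...", "valuation": 213030},
    {"instrument_name": "UNIT ISHARES VI PLC...", "valuation": 32774},
    {"instrument_name": "UNITS XTRACKERS II SICAV...", "valuation": 23643}
  ],
  "valid": "true"
}
"""


def total_valuation(response_text: str) -> int:
    """Sum the valuation column, refusing output that failed validation."""
    result = json.loads(response_text)
    # In the samples above "valid" is the string "true", not a JSON boolean.
    if result.get("valid") != "true":
        raise ValueError("extraction did not pass schema validation")
    return sum(row["valuation"] for row in result["data"])


print(total_valuation(raw))  # prints 371984
```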

Invoice Processing

./sparrow.sh "*" \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --crop-size 60 \
  --file-path "data/invoice.pdf"

JSON output:

{
  "invoice_number": "61356291",
  "date_of_issue": "09/06/2012",
  "seller": {
    "name": "Chapman, Kim and Green",
    "address": "64731 James Branch, Smithmouth, NC 26872",
    "tax_id": "949-84-9105",
    "iban": "GB50ACIE59715038217063"
  },
  "client": {
    "name": "Rodriguez-Stevens",
    "address": "2280 Angela Plain, Hortonshire, MS 93248",
    "tax_id": "939-98-8477"
  },
  "items": [
    {"description": "Wine Glasses Goblets Pair Clear", "quantity": 5, "net_price": 12.0, "net_worth": 60.0, "vat_percentage": 10, "gross_worth": 66.0},
    {"description": "With Hooks Stemware Storage Iron Wine Rack Hanging", "quantity": 4, "net_price": 28.08, "net_worth": 112.32, "vat_percentage": 10, "gross_worth": 123.55}
  ],
  "summary": {"total_net_worth": 192.81, "total_vat": 19.28, "total_gross_worth": 212.09}
}
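
Extracted line items can be sanity-checked arithmetically before they enter a ledger. This is a sketch only: `check_item` is a hypothetical helper, and rounding half-up to two decimals is an assumption about how the invoice amounts were produced:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def check_item(item: dict) -> bool:
    """True when net_worth and gross_worth follow from quantity, price and VAT."""
    quantity = Decimal(str(item["quantity"]))
    net_price = Decimal(str(item["net_price"]))
    vat_rate = Decimal(str(item["vat_percentage"])) / 100

    net = (quantity * net_price).quantize(CENT, rounding=ROUND_HALF_UP)
    gross = (net * (1 + vat_rate)).quantize(CENT, rounding=ROUND_HALF_UP)

    return (
        net == Decimal(str(item["net_worth"])).quantize(CENT)
        and gross == Decimal(str(item["gross_worth"])).quantize(CENT)
    )


# The two line items from the sample output above.
items = [
    {"quantity": 5, "net_price": 12.0, "net_worth": 60.0,
     "vat_percentage": 10, "gross_worth": 66.0},
    {"quantity": 4, "net_price": 28.08, "net_worth": 112.32,
     "vat_percentage": 10, "gross_worth": 123.55},
]
print([check_item(item) for item in items])  # prints [True, True]
```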

Multi-page PDF Processing

./sparrow.sh '{"table": [{"description": "str", "latest_amount": 0, "previous_amount": 0}]}' \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
  --file-path "data/financial_report.pdf" \
  --debug-dir "debug/"

Text Instruction Processing

# Arithmetic via instruction pipeline
./sparrow.sh "instruction: do arithmetic operation, payload: 2+2=" \
  --pipeline "sparrow-instructor" \
  --options mlx \
  --options lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-8bit

# Document-grounded instruction
./sparrow.sh "check if business entity Chapman, Kim and Green is invoice issuing party" \
  --pipeline "sparrow-parse" \
  --instruction \
  --options mlx --options lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-8bit \
  --file-path "invoice_1.jpg"

Stock Data Function Calling

./sparrow.sh assistant --pipeline "stocks" --query "Oracle"

JSON output:

{
  "company": "Oracle Corporation",
  "ticker": "ORCL"
}

Table/Form Extraction with Sparrow Templates

./sparrow.sh "*" --pipeline "sparrow-parse" \
  --debug --table --table-template "sparrow_generic_table" \
  --options mlx --options mlx-community/Qwen3.6-35B-A3B-8bit \
  --options mlx --options mlx-community/dots.ocr-bf16 \
  --file-path "data/well_report.jpg"

Extraction Hints

./sparrow.sh '[{"instrument_name":"str", "valuation":"int"}]' \
  --pipeline "sparrow-parse" --debug \
  --options mlx --options mlx-community/gemma-4-31b-it-8bit \
  --file-path "data/bonds_table.png" \
  --hints-file-path "data/llm_hints_eu.json"

CLI Usage

Syntax

./sparrow.sh "<JSON_SCHEMA>" --pipeline "<PIPELINE>" [OPTIONS] --file-path "<FILE>"

Arguments

| Argument | Type | Description | Example |
| --- | --- | --- | --- |
| query | JSON / String | Schema or instruction | '[{"field":"str"}]' |
| --pipeline | String | Pipeline name | sparrow-parse |
| --file-path | Path | Input document | data/invoice.pdf |
| --hints-file-path | Path | Extraction hints | data/hints.json |
| --options | String | Backend + model | mlx,model-name |
| --instruction | Flag | Treat query as instruction | |
| --validation | Flag | Treat query as validation spec | |
| --markdown | Flag | Enable markdown pre-processing | |
| --table | Flag | Enable table extraction mode | |
| --table-template | String | Table template name | sparrow_generic_table |
| --crop-size | Integer | Border crop pixels | 60 |
| --page-type | String | Page classification hint | financial_table |
| --debug | Flag | Enable debug output | |
| --debug-dir | Path | Directory for debug files | ./debug/ |
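
When the CLI is driven from a script, building the argument list in code avoids shell-quoting mistakes around the JSON schema. The sketch below uses only the arguments listed above; `build_sparrow_command` is a hypothetical helper, not something shipped with Sparrow:

```python
import shlex


def build_sparrow_command(query, pipeline, backend, model, file_path=None, flags=()):
    """Assemble a ./sparrow.sh invocation as an argv list (no shell quoting needed)."""
    cmd = [
        "./sparrow.sh", query,
        "--pipeline", pipeline,
        "--options", backend,
        "--options", model,
    ]
    cmd.extend(flags)  # e.g. ("--debug",) or ("--crop-size", "60")
    if file_path is not None:
        cmd += ["--file-path", file_path]
    return cmd


cmd = build_sparrow_command(
    '[{"instrument_name":"str", "valuation":0}]',
    "sparrow-parse",
    "mlx",
    "mlx-community/Qwen2.5-VL-72B-Instruct-4bit",
    file_path="data/bonds_table.png",
)
print(shlex.join(cmd))  # shell-safe rendering, useful for logging

# To execute, run from the directory that contains sparrow.sh:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` bypasses the shell entirely, so the schema string reaches the script unchanged.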

Pipeline Options

Sparrow Parse (Vision LLM)

# MLX backend (Apple Silicon)
./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
  --pipeline "sparrow-parse" \
  --options mlx \
  --options mlx-community/Qwen3.6-35B-A3B-8bit \
  --file-path "data/bonds_table.png"

# Hugging Face Cloud GPU
--options huggingface --options your-space/model-name

# Additional flags
--options tables_only        # Extract tables only
--options validation_off     # Disable schema validation
--options apply_annotation   # Include bounding box annotations
--page-type financial_table  # Classify page type

Sparrow Instructor (Text LLM)

./sparrow.sh "instruction: do arithmetic operation, payload: 2+2=" \
  --pipeline "sparrow-instructor" \
  --options mlx \
  --options lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-8bit

API Reference

Starting the Server

# Default port 8002
python api.py

# Custom port
python api.py --port 8001

Endpoints

Document Extraction: POST /inference

curl -X POST 'http://localhost:8002/api/v1/sparrow-llm/inference' \
  -H 'Content-Type: multipart/form-data' \
  -F 'query=[{"field_name":"str", "amount":0}]' \
  -F 'pipeline=sparrow-parse' \
  -F 'options=mlx,mlx-community/Qwen2.5-VL-72B-Instruct-4bit' \
  -F 'file=@document.pdf'
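
The same multipart request can be built from Python with the standard library alone. This is a sketch under the assumption that the endpoint accepts exactly the form fields used in the curl call above; `build_inference_request` is a hypothetical helper and the file bytes are a placeholder:

```python
import urllib.request
import uuid


def build_inference_request(base_url, query, pipeline, options, filename, file_bytes):
    """Build (but do not send) a multipart/form-data POST for /inference."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (("query", query), ("pipeline", pipeline), ("options", options)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{base_url}/api/v1/sparrow-llm/inference",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


req = build_inference_request(
    "http://localhost:8002",
    '[{"field_name":"str", "amount":0}]',
    "sparrow-parse",
    "mlx,mlx-community/Qwen2.5-VL-72B-Instruct-4bit",
    "document.pdf",
    b"%PDF-1.4 placeholder bytes",  # replace with open("document.pdf", "rb").read()
)
print(req.get_method(), req.full_url)

# Sending requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```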

Text Instructions: POST /instruction-inference

curl -X POST 'http://localhost:8002/api/v1/sparrow-llm/instruction-inference' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'query=instruction: analyze data, payload: {...}' \
  -d 'pipeline=sparrow-instructor' \
  -d 'options=mlx,mlx-community/Qwen3.6-35B-A3B-8bit'
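
In a form-encoded body a literal `+` decodes to a space and `&` separates fields, so payloads such as `2+2=` need percent-encoding before they are sent. A minimal sketch with the standard library, assuming the endpoint and field names from the curl call above (`build_instruction_request` is a hypothetical helper):

```python
import urllib.parse
import urllib.request


def build_instruction_request(base_url, instruction, payload, options):
    """Build (but do not send) a form-encoded POST for /instruction-inference."""
    query = f"instruction: {instruction}, payload: {payload}"
    body = urllib.parse.urlencode({
        "query": query,
        "pipeline": "sparrow-instructor",
        "options": options,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/sparrow-llm/instruction-inference",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_instruction_request(
    "http://localhost:8002",
    "do arithmetic operation",
    "2+2=",
    "mlx,mlx-community/Qwen3.6-35B-A3B-8bit",
)
print(req.data.decode())  # the "+" and "=" in the payload appear percent-encoded

# Sending requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```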

Interactive Swagger docs: http://localhost:8002/api/v1/sparrow-llm/docs

Sparrow Agent

The Agent component orchestrates complex document processing workflows with visual monitoring powered by Prefect.

Capabilities:

  • Chain classification, extraction, and validation steps
  • Real-time pipeline monitoring
  • Robust failure recovery
  • Extensible agent definitions for domain-specific use cases

Usage

# Start the agent server
cd sparrow-ml/agents
python api.py --port 8001

# Submit a document processing job
curl -X POST 'http://localhost:8001/api/v1/sparrow-agents/execute/file' \
  -F 'agent_name=medical_prescriptions' \
  -F 'extraction_params={"sparrow_key":"123456"}' \
  -F 'file=@prescription.pdf'

Dashboard

The built-in analytics dashboard is part of Sparrow UI and requires a local Oracle Database 23ai Free instance.

Dashboard capabilities:

  • API call volume and success rates over time
  • Geographic distribution of usage
  • Per-model performance comparison
  • Real-time processing statistics

Pipeline Comparison

| Feature | Sparrow Parse | Sparrow Instructor | Sparrow Agents |
| --- | --- | --- | --- |
| Input | Documents + JSON schema | Text instructions | Complex workflows |
| Output | Structured JSON | Free-form text | Multi-step results |
| Primary Use Cases | Data extraction, forms | Summarization, analysis | Enterprise workflows |
| Validation | Schema-based | Manual | Custom rules |
| Relative Complexity | Low | Medium | High |
| Best For | Invoices, tables, forms | Text processing | Multi-document flows |

Decision guide:

  • Use Sparrow Parse for structured data extraction from documents.
  • Use Sparrow Instructor for text analysis, summarization, and Q&A.
  • Use Sparrow Agents for complex multi-step document processing workflows.

Performance Guide

Backend Selection

| Hardware | Recommended Backend | Notes |
| --- | --- | --- |
| Apple Silicon (M-series) | MLX | Optimal unified-memory performance |
| NVIDIA GPU (≥96 GB VRAM) | vLLM | Full-precision production inference |
| NVIDIA GPU (<96 GB VRAM) | vLLM | Use quantized or MoE models |
| CPU only | Ollama / Hugging Face | Use models ≤7B parameters |
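
The table above can be encoded as a small lookup for deployment scripts. This is only a sketch: `recommend_backend` and the hardware keys `"apple_silicon"`, `"nvidia"`, and `"cpu"` are hypothetical names chosen for illustration:

```python
from typing import Optional


def recommend_backend(hardware: str, vram_gb: Optional[float] = None) -> tuple:
    """Return (backend, note) following the backend selection table."""
    if hardware == "apple_silicon":
        return ("MLX", "Optimal unified-memory performance")
    if hardware == "nvidia":
        if vram_gb is not None and vram_gb >= 96:
            return ("vLLM", "Full-precision production inference")
        # Unknown VRAM is treated conservatively, like the <96 GB row.
        return ("vLLM", "Use quantized or MoE models")
    if hardware == "cpu":
        return ("Ollama / Hugging Face", "Use models ≤7B parameters")
    raise ValueError(f"unknown hardware class: {hardware!r}")


print(recommend_backend("nvidia", vram_gb=48))
```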

Model Selection

| Use Case | Recommended Model | Backend |
| --- | --- | --- |
| Invoices / Forms (EU) | Mistral Small 3.2 24B | vLLM / MLX |
| Invoices / Forms (US) | Gemma 4 31B Dense | MLX |
| Large tables | dots.ocr + Sparrow Templates | vLLM |
| General accuracy | Qwen3.6 27B Dense | MLX |
| Low memory | Qwen3.6 35B MoE / Gemma 4 26B MoE | MLX |

Table Extraction

For large or complex tables, prefer the dots.ocr + Sparrow Templates pipeline over direct Vision LLM extraction:

./sparrow.sh "*" --pipeline "sparrow-parse" \
  --debug --table --table-template "sparrow_generic_table" \
  --options mlx --options mlx-community/Qwen3.6-35B-A3B-8bit \
  --options mlx --options mlx-community/dots.ocr-bf16 \
  --file-path "data/well_report.jpg"

dots.ocr produces an HTML intermediate representation which Sparrow Templates maps to the target JSON schema. This is the recommended approach for financial statements, multi-column invoices, and structured reports.
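
To illustrate the general idea of mapping an HTML table to JSON records, here is a standard-library sketch. It is not the Sparrow Templates implementation; the `TableRows` class, the `html_table_to_records` helper, and the sample HTML are all made up for illustration, and real dots.ocr output may be structured differently:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html: str) -> list:
    """Treat the first row as the header and zip it with each following row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up HTML for illustration only.
sample = """
<table>
  <tr><th>description</th><th>amount</th></tr>
  <tr><td>Sample row A</td><td>100</td></tr>
  <tr><td>Sample row B</td><td>250</td></tr>
</table>
"""
print(html_table_to_records(sample))
```

Cell values come back as strings; converting them to the types named in the target schema would be a separate step.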

Extraction Hints

Hints steer model attention and improve accuracy on complex documents — useful for footers, fine print, structurally similar fields (e.g., supplier vs. recipient VAT), and date/number format normalization.


Troubleshooting

Installation problems

Python version:

python --version  # Must be 3.12.10+
pyenv install 3.12.10 && pyenv global 3.12.10

MLX on Apple Silicon:

pip install --upgrade pip
pip install mlx-vlm --no-cache-dir

Poppler missing:

# macOS
brew install poppler

# Ubuntu/Debian
sudo apt-get install poppler-utils

# Verify
pdftoppm -h
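
The Python version and Poppler checks above can also be combined into one preflight script. A minimal sketch using only the standard library (`python_ok` and `poppler_ok` are hypothetical helper names):

```python
import shutil
import sys

MIN_PYTHON = (3, 12, 10)  # minimum version stated in the prerequisites


def python_ok(version=None) -> bool:
    """Compare a (major, minor, micro) tuple against the minimum version."""
    version = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return version >= MIN_PYTHON


def poppler_ok() -> bool:
    """Poppler is considered installed when pdftoppm is on PATH."""
    return shutil.which("pdftoppm") is not None


if __name__ == "__main__":
    print("Python >= 3.12.10:", python_ok())
    print("pdftoppm on PATH:", poppler_ok())
```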

Runtime issues

Memory errors:

  • Switch to a smaller or MoE model
  • Enable cropping: --crop-size 100
  • Process single pages rather than full PDFs

Model loading fails:

# Clears the entire Hugging Face and MLX caches; models are downloaded again on next load
rm -rf ~/.cache/huggingface/ ~/.mlx/
python -c "from mlx_vlm import load; load('model-name')"

API connection issues:

curl http://localhost:8002/health
python api.py --debug

Document processing issues

Poor extraction quality:

  • Add hints with --hints-file-path
  • Apply border cropping: --crop-size 60
  • Use --table --table-template with dots.ocr for dense tables
  • Verify image resolution (300+ DPI recommended)
  • Do not disable schema validation unless necessary

PDF processing fails:

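# Confirm Poppler can rasterize the file
pdftoppm -png input.pdf output

# Confirm the PDF opens and count its pages (requires pypdf)
python -c "
import pypdf
with open('input.pdf', 'rb') as f:
    print(f'Pages: {len(pypdf.PdfReader(f).pages)}')
"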

JSON schema errors:

  • Validate with jsonlint.com
  • Use typed placeholders: "str", 0, 0.0, "str or null"
  • Start with a minimal schema and expand incrementally
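
A schema query can also be checked locally before it is sent. The sketch below parses the query with the standard `json` module and lists each leaf placeholder; `schema_fields` is a hypothetical helper, and it only checks that the query is well-formed JSON, not that Sparrow accepts every placeholder:

```python
import json


def schema_fields(query: str) -> list:
    """Return (path, placeholder) pairs for every leaf of a schema query."""
    if query.strip() == "*":
        return []  # "*" is used in the examples above in place of a schema
    try:
        schema = json.loads(query)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err

    fields = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for item in node:
                walk(item, f"{path}[]")
        else:
            fields.append((path, node))

    walk(schema, "")
    return fields


query = '{"table": [{"description": "str", "latest_amount": 0, "previous_amount": 0}]}'
for path, placeholder in schema_fields(query):
    print(path, "->", repr(placeholder))
```

A missing quote or trailing comma raises `ValueError` with the line and column of the problem, which is usually enough to locate it.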

Getting Help

  1. Review this README and the component-level documentation
  2. Search GitHub Issues for known issues
  3. Open a new issue with logs, system information, and a minimal reproduction case
  4. For commercial support or licensing: contact@sparrow-project.org

License

Open Source — Licensed under GPL 3.0. Free for open source projects and organizations with gross revenue under $5M USD annually.

Commercial — Dual licensing is available for proprietary use, enterprise features, and dedicated support. Contact contact@sparrow-project.org for commercial licensing and consulting.


Authors

Sparrow Project Contributors


Star this repository on GitHub if Sparrow is useful for your projects.
github.com/jhonathan-humnel/ai-doc-forge
