Structured data extraction, instruction calling and agentic workflows with ML, LLM and Vision LLM
Sparrow is an API-first platform for enterprise document intelligence. It provides RESTful APIs for structured data extraction, instruction processing, and multi-agent workflow orchestration — all running on your own infrastructure with no external API calls or cloud dependencies.
The platform is designed around three core capabilities:
- Structured Extraction: Submit documents via REST and receive validated JSON output. Integrates directly into any backend or data pipeline.
- Instruction Processing: Beyond document extraction — text processing, validation, and decision-making via the instruction inference API.
- Agent Framework: Orchestrate multi-step workflows with custom agents, visual monitoring via Prefect, and robust error handling.
Contents
- Key Features
- Architecture
- Quickstart
- Installation
- Examples
- CLI Usage
- API Reference
- Sparrow Agent
- Dashboard
- Pipeline Comparison
- Performance Guide
- Troubleshooting
- License
Key Features
- Universal Document Processing — Invoices, receipts, forms, bank statements, and tables
- Pluggable Architecture — Mix and match Sparrow Parse, Instructor, and Agent pipelines
- Multiple Inference Backends — MLX (Apple Silicon), Ollama, vLLM, Docker, Hugging Face Cloud GPU
- Multi-format Input — Images (PNG, JPG) and multi-page PDFs
- Schema Validation — JSON schema-based extraction with automatic validation
- API-First Design — RESTful endpoints for integration into any stack
- Instruction Calling — Text processing, validation, and decision-making with GPT-OSS, Mistral, Qwen, and others
- Visual Monitoring — Built-in dashboard and agent workflow tracking
- Enterprise Ready — Rate limiting, usage analytics, commercial licensing available
- Local Vision LLMs — Mistral, Qwen, DeepSeek OCR, dots.ocr, Gemma 4, and more
Architecture
| Component | Purpose | Typical Use |
|---|---|---|
| Sparrow ML LLM | Main API engine | Document processing pipelines |
| Sparrow Parse | Vision LLM library | Structured JSON extraction |
| Sparrow Agents | Workflow orchestration | Complex multi-step processing |
| Sparrow OCR | Text recognition | OCR preprocessing |
| Sparrow UI | Web interface | Interactive document processing |
Quickstart
- Python 3.12.10+
- macOS (for MLX backend) or Linux/Windows (for other backends)
- Sufficient GPU memory for the selected Vision LLM
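A quick preflight check can catch a wrong Python version or a missing poppler install before any dependencies are pulled in. The sketch below is illustrative only: `check_prerequisites` is a hypothetical helper, not part of Sparrow, and it assumes that finding `pdftoppm` on the PATH is a good enough signal that poppler is installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 12, 10)  # minimum version stated in the prerequisites


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of Sparrow.
    """
    version = tuple(version or sys.version_info[:3])
    problems = []
    if version < MIN_PYTHON:
        found = ".".join(map(str, version))
        problems.append(f"Python {found} found, 3.12.10+ required")
    # pdftoppm ships with poppler, which is used for PDF processing.
    if which("pdftoppm") is None:
        problems.append("pdftoppm not found, install poppler")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the file directly prints one warning per problem and nothing when the environment looks fine.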
# Install Python 3.12.10 via pyenv
pyenv install 3.12.10
pyenv global 3.12.10
# Create a virtual environment
python -m venv .env_sparrow_parse
source .env_sparrow_parse/bin/activate # Linux/macOS
# .env_sparrow_parse\Scripts\activate # Windows
# Clone the repository
git clone https://github.com/jhonathan-humnel/ai-doc-forge.git
cd ai-doc-forge/sparrow-ml/llm
# Install dependencies
pip install -r requirements_sparrow_parse.txt
# macOS: install poppler for PDF processing
brew install poppler
# Start the API server
python api.py
Before running pip install, verify your platform target in requirements_sparrow_parse.txt. Use sparrow-parse[mlx] for Apple Silicon and sparrow-parse for Linux/Windows.
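If the choice between the two requirement strings needs to be scripted, the platform can be detected from Python. This is a minimal sketch, assuming that Apple Silicon reports the system name `Darwin` and the machine type `arm64`; `sparrow_parse_requirement` is a hypothetical helper name.

```python
import platform


def sparrow_parse_requirement(system=None, machine=None):
    """Pick the requirement string for the current platform.

    Assumption: Apple Silicon reports system "Darwin" and machine "arm64".
    Hypothetical helper, not part of Sparrow.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "sparrow-parse[mlx]"
    return "sparrow-parse"


if __name__ == "__main__":
    print(sparrow_parse_requirement())
```

The printed value is the line to look for in requirements_sparrow_parse.txt before installing.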
./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
--pipeline "sparrow-parse" \
--options mlx \
--options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
--file-path "data/bonds_table.png"
{
"data": [
{"instrument_name": "UNITS BLACKROCK FIX INC...", "valuation": 19049},
{"instrument_name": "UNITS ISHARES III PLC...", "valuation": 83488}
],
"valid": "true"
}
Installation
git clone https://github.com/jhonathan-humnel/ai-doc-forge.git
cd ai-doc-forge
For complete installation instructions, see environment_setup.md.
Create separate environments for different pipelines:
| Environment | Pipeline |
|---|---|
| .env_sparrow_parse | Sparrow Parse (Vision LLM) |
| .env_instructor | Instructor (Text LLM) |
| .env_ocr | OCR service (optional) |
macOS:
brew install poppler
Ubuntu/Debian:
sudo apt-get install poppler-utils libpoppler-cpp-dev
Apple Silicon: the MLX backend delivers optimal performance.
NVIDIA/AMD GPU: use the vLLM or Ollama backend.
CPU only: use smaller models or the Hugging Face cloud backend.
python api.py --port 8002
# API docs: http://localhost:8002/api/v1/sparrow-llm/docs
Examples
./sparrow.sh "*" \
--pipeline "sparrow-parse" \
--options mlx \
--options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
--file-path "data/bank_statement.pdf"
JSON output:
{
"bank": "First Platypus Bank",
"address": "1234 Kings St., New York, NY 12123",
"account_holder": "Mary G. Orta",
"account_number": "1234567890123",
"statement_date": "3/1/2022",
"period_covered": "2/1/2022 - 3/1/2022",
"account_summary": {
"balance_on_march_1": "$25,032.23",
"total_money_in": "$10,234.23",
"total_money_out": "$10,532.51"
},
"transactions": [
{"date": "02/01", "description": "PGD EasyPay Debit", "withdrawal": "203.24", "deposit": "", "balance": "22,098.23"},
{"date": "02/02", "description": "AB&B Online Payment*****", "withdrawal": "71.23", "deposit": "", "balance": "22,027.00"},
{"date": "02/04", "description": "Check No. 2345", "withdrawal": "", "deposit": "450.00", "balance": "22,477.00"},
{"date": "02/05", "description": "Payroll Direct Dep 23422342 Giants", "withdrawal": "", "deposit": "2,534.65", "balance": "25,011.65"}
],
"valid": "true"
}
./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
--pipeline "sparrow-parse" \
--options mlx \
--options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
--file-path "data/bonds_table.png"
JSON output:
{
"data": [
{"instrument_name": "UNITS BLACKROCK FIX INC DUB FDS PLC ISHS EUR INV GRD CP BD IDX/INST/E", "valuation": 19049},
{"instrument_name": "UNITS ISHARES III PLC CORE EUR GOVT BOND UCITS ETF/EUR", "valuation": 83488},
{"instrument_name": "UNITS ISHARES III PLC EUR CORP BOND 1-5YR UCITS ETF/EUR", "valuation": 213030},
{"instrument_name": "UNIT ISHARES VI PLC/JP MORGAN USD E BOND EUR HED UCITS ETF DIST/HDGD/", "valuation": 32774},
{"instrument_name": "UNITS XTRACKERS II SICAV/EUR HY CORP BOND UCITS ETF/-1D-/DISTR.", "valuation": 23643}
],
"valid": "true"
}
./sparrow.sh "*" \
--pipeline "sparrow-parse" \
--options mlx \
--options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
--crop-size 60 \
--file-path "data/invoice.pdf"
JSON output:
{
"invoice_number": "61356291",
"date_of_issue": "09/06/2012",
"seller": {
"name": "Chapman, Kim and Green",
"address": "64731 James Branch, Smithmouth, NC 26872",
"tax_id": "949-84-9105",
"iban": "GB50ACIE59715038217063"
},
"client": {
"name": "Rodriguez-Stevens",
"address": "2280 Angela Plain, Hortonshire, MS 93248",
"tax_id": "939-98-8477"
},
"items": [
{"description": "Wine Glasses Goblets Pair Clear", "quantity": 5, "net_price": 12.0, "net_worth": 60.0, "vat_percentage": 10, "gross_worth": 66.0},
{"description": "With Hooks Stemware Storage Iron Wine Rack Hanging", "quantity": 4, "net_price": 28.08, "net_worth": 112.32, "vat_percentage": 10, "gross_worth": 123.55}
],
"summary": {"total_net_worth": 192.81, "total_vat": 19.28, "total_gross_worth": 212.09}
}
./sparrow.sh '{"table": [{"description": "str", "latest_amount": 0, "previous_amount": 0}]}' \
--pipeline "sparrow-parse" \
--options mlx \
--options mlx-community/Qwen2.5-VL-72B-Instruct-4bit \
--file-path "data/financial_report.pdf" \
--debug-dir "debug/"
# Arithmetic via instruction pipeline
./sparrow.sh "instruction: do arithmetic operation, payload: 2+2=" \
--pipeline "sparrow-instructor" \
--options mlx \
--options lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-8bit
# Document-grounded instruction
./sparrow.sh "check if business entity Chapman, Kim and Green is invoice issuing party" \
--pipeline "sparrow-parse" \
--instruction \
--options mlx --options lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-8bit \
--file-path "invoice_1.jpg"
./sparrow.sh assistant --pipeline "stocks" --query "Oracle"
{
"company": "Oracle Corporation",
"ticker": "ORCL"
}
./sparrow.sh "*" --pipeline "sparrow-parse" \
--debug --table --table-template "sparrow_generic_table" \
--options mlx --options mlx-community/Qwen3.6-35B-A3B-8bit \
--options mlx --options mlx-community/dots.ocr-bf16 \
--file-path "data/well_report.jpg"
./sparrow.sh '[{"instrument_name":"str", "valuation":"int"}]' \
--pipeline "sparrow-parse" --debug \
--options mlx --options mlx-community/gemma-4-31b-it-8bit \
--file-path "data/bonds_table.png" \
--hints-file-path "data/llm_hints_eu.json"
CLI Usage
./sparrow.sh "<JSON_SCHEMA>" --pipeline "<PIPELINE>" [OPTIONS] --file-path "<FILE>"
| Argument | Type | Description | Example |
|---|---|---|---|
| query | JSON / String | Schema or instruction | '[{"field":"str"}]' |
| --pipeline | String | Pipeline name | sparrow-parse |
| --file-path | Path | Input document | data/invoice.pdf |
| --hints-file-path | Path | Extraction hints | data/hints.json |
| --options | String | Backend + model | mlx,model-name |
| --instruction | Flag | Treat query as instruction | |
| --validation | Flag | Treat query as validation spec | |
| --markdown | Flag | Enable markdown pre-processing | |
| --table | Flag | Enable table extraction mode | |
| --table-template | String | Table template name | sparrow_generic_table |
| --crop-size | Integer | Border crop pixels | 60 |
| --page-type | String | Page classification hint | financial_table |
| --debug | Flag | Enable debug output | |
| --debug-dir | Path | Directory for debug files | ./debug/ |
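When the CLI is driven from a script, building the argument list in code avoids shell quoting problems with JSON schemas. The sketch below is a hypothetical wrapper, not part of Sparrow. It follows the CLI examples in this README, which pass one `--options` flag per value (backend first, then model), rather than the comma-separated form shown in the table.

```python
import json
import shlex


def build_sparrow_command(query, pipeline, options=(), file_path=None, flags=()):
    """Build an argument list for sparrow.sh (illustrative helper only)."""
    # A dict or list query is serialized as the JSON schema; strings pass through.
    if not isinstance(query, str):
        query = json.dumps(query)
    cmd = ["./sparrow.sh", query, "--pipeline", pipeline]
    for option in options:
        cmd += ["--options", option]  # one --options per value, as in the examples
    cmd += list(flags)                # e.g. "--debug" or "--table"
    if file_path is not None:
        cmd += ["--file-path", file_path]
    return cmd


cmd = build_sparrow_command(
    [{"instrument_name": "str", "valuation": 0}],
    pipeline="sparrow-parse",
    options=["mlx", "mlx-community/Qwen2.5-VL-72B-Instruct-4bit"],
    file_path="data/bonds_table.png",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The list can be passed to `subprocess.run(cmd, check=True)` as is, so the schema never goes through shell parsing.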
Sparrow Parse (Vision LLM)
# MLX backend (Apple Silicon)
./sparrow.sh '[{"instrument_name":"str", "valuation":0}]' \
--pipeline "sparrow-parse" \
--options mlx \
--options mlx-community/Qwen3.6-35B-A3B-8bit \
--file-path "data/bonds_table.png"
# Hugging Face Cloud GPU
--options huggingface --options your-space/model-name
# Additional flags
--options tables_only # Extract tables only
--options validation_off # Disable schema validation
--options apply_annotation # Include bounding box annotations
--page-type financial_table # Classify page type
Sparrow Instructor (Text LLM)
./sparrow.sh "instruction: do arithmetic operation, payload: 2+2=" \
--pipeline "sparrow-instructor" \
--options mlx \
--options lmstudio-community/Mistral-Small-3.2-24B-Instruct-2506-8bit
API Reference
# Default port 8002
python api.py
# Custom port
python api.py --port 8001
Document Extraction: POST /inference
curl -X POST 'http://localhost:8002/api/v1/sparrow-llm/inference' \
-H 'Content-Type: multipart/form-data' \
-F 'query=[{"field_name":"str", "amount":0}]' \
-F 'pipeline=sparrow-parse' \
-F 'options=mlx,mlx-community/Qwen2.5-VL-72B-Instruct-4bit' \
-F 'file=@document.pdf'
Text Instructions: POST /instruction-inference
curl -X POST 'http://localhost:8002/api/v1/sparrow-llm/instruction-inference' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'query=instruction: analyze data, payload: {...}' \
-d 'pipeline=sparrow-instructor' \
-d 'options=mlx,mlx-community/Qwen3.6-35B-A3B-8bit'
Interactive Swagger docs: http://localhost:8002/api/v1/sparrow-llm/docs
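The instruction endpoint can also be called from Python with the standard library alone. The sketch below mirrors the curl command for POST /instruction-inference. The helper names are hypothetical, and the assumption that the server replies with a JSON body is inferred from the CLI output shown in this README, not from the API documentation.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8002/api/v1/sparrow-llm"


def build_instruction_request(query, pipeline, options, base_url=BASE_URL):
    """Build (but do not send) a form-encoded POST to /instruction-inference."""
    body = urllib.parse.urlencode(
        {"query": query, "pipeline": pipeline, "options": options}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/instruction-inference",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and decode the body (assumes a JSON response)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_instruction_request(
    "instruction: do arithmetic operation, payload: 2+2=",
    pipeline="sparrow-instructor",
    options="mlx,mlx-community/Qwen3.6-35B-A3B-8bit",
)
print(request.get_method(), request.full_url)
```

With the API server running, `send(request)` performs the call. Building and sending are kept separate so the request can be inspected or logged first.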
Sparrow Agent
The Agent component orchestrates complex document processing workflows with visual monitoring powered by Prefect.
Capabilities:
- Chain classification, extraction, and validation steps
- Real-time pipeline monitoring
- Robust failure recovery
- Extensible agent definitions for domain-specific use cases
# Start the agent server
cd sparrow-ml/agents
python api.py --port 8001
# Submit a document processing job
curl -X POST 'http://localhost:8001/api/v1/sparrow-agents/execute/file' \
-F 'agent_name=medical_prescriptions' \
-F 'extraction_params={"sparrow_key":"123456"}' \
-F 'file=@prescription.pdf'
Dashboard
The built-in analytics dashboard is part of Sparrow UI and requires a local Oracle Database 23ai Free instance.
Dashboard capabilities:
- API call volume and success rates over time
- Geographic distribution of usage
- Per-model performance comparison
- Real-time processing statistics
Pipeline Comparison
| Feature | Sparrow Parse | Sparrow Instructor | Sparrow Agents |
|---|---|---|---|
| Input | Documents + JSON schema | Text instructions | Complex workflows |
| Output | Structured JSON | Free-form text | Multi-step results |
| Primary Use Cases | Data extraction, forms | Summarization, analysis | Enterprise workflows |
| Validation | Schema-based | Manual | Custom rules |
| Relative Complexity | Low | Medium | High |
| Best For | Invoices, tables, forms | Text processing | Multi-document flows |
Decision guide:
- Use Sparrow Parse for structured data extraction from documents.
- Use Sparrow Instructor for text analysis, summarization, and Q&A.
- Use Sparrow Agents for complex multi-step document processing workflows.
Performance Guide
| Hardware | Recommended Backend | Notes |
|---|---|---|
| Apple Silicon (M-series) | MLX | Optimal unified-memory performance |
| NVIDIA GPU (≥96 GB VRAM) | vLLM | Full-precision production inference |
| NVIDIA GPU (<96 GB VRAM) | vLLM | Use quantized or MoE models |
| CPU only | Ollama / Hugging Face | Use models ≤7B parameters |
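The hardware table can be expressed as a small lookup for use in setup scripts. This is only a sketch of the table above: the function name and the hardware labels (`apple_silicon`, `nvidia`, `cpu`) are made up for illustration, and the returned strings are the table's display names, not values accepted by `--options`.

```python
def recommend_backend(hardware, vram_gb=None):
    """Map a hardware class to the backend and note from the table above.

    The hardware labels are hypothetical names chosen for this sketch.
    """
    if hardware == "apple_silicon":
        return "MLX", "Optimal unified-memory performance"
    if hardware == "nvidia":
        if vram_gb is None:
            raise ValueError("vram_gb is required for NVIDIA GPUs")
        if vram_gb >= 96:
            return "vLLM", "Full-precision production inference"
        return "vLLM", "Use quantized or MoE models"
    if hardware == "cpu":
        return "Ollama / Hugging Face", "Use models ≤7B parameters"
    raise ValueError(f"unknown hardware: {hardware!r}")


backend, note = recommend_backend("nvidia", vram_gb=48)
print(f"{backend}: {note}")
```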
| Use Case | Recommended Model | Backend |
|---|---|---|
| Invoices / Forms (EU) | Mistral Small 3.2 24B | vLLM / MLX |
| Invoices / Forms (US) | Gemma 4 31B Dense | MLX |
| Large tables | dots.ocr + Sparrow Templates | vLLM |
| General accuracy | Qwen3.6 27B Dense | MLX |
| Low memory | Qwen3.6 35B MoE / Gemma 4 26B MoE | MLX |
For large or complex tables, prefer the dots.ocr + Sparrow Templates pipeline over direct Vision LLM extraction:
./sparrow.sh "*" --pipeline "sparrow-parse" \
--debug --table --table-template "sparrow_generic_table" \
--options mlx --options mlx-community/Qwen3.6-35B-A3B-8bit \
--options mlx --options mlx-community/dots.ocr-bf16 \
--file-path "data/well_report.jpg"
dots.ocr produces an HTML intermediate representation, which Sparrow Templates maps to the target JSON schema. This is the recommended approach for financial statements, multi-column invoices, and structured reports.
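To make the HTML-to-JSON idea concrete, here is a rough standard-library sketch that turns an HTML table into records keyed by schema field names. It is not how Sparrow Templates is implemented. The `TableRows` class, the `field_map` argument, and the HTML fragment (including its header names) are invented for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text chunks of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html, field_map):
    """Map header names to schema fields; field_map is {header: (field, cast)}."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows  # first row is treated as the header
    records = []
    for row in body:
        record = {}
        for name, value in zip(header, row):
            if name in field_map:
                field, cast = field_map[name]
                record[field] = cast(value)
        records.append(record)
    return records


# Invented fragment; the row values are borrowed from the bonds example.
html = """
<table>
  <tr><th>Instrument</th><th>Valuation</th></tr>
  <tr><td>UNITS ISHARES III PLC CORE EUR GOVT BOND UCITS ETF/EUR</td><td>83488</td></tr>
</table>
"""
records = html_table_to_records(
    html, {"Instrument": ("instrument_name", str), "Valuation": ("valuation", int)}
)
print(records)
```

Columns whose header is not in `field_map` are dropped, which is one simple way to keep only the fields the target schema asks for.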
Hints steer model attention and improve accuracy on complex documents — useful for footers, fine print, structurally similar fields (e.g., supplier vs. recipient VAT), and date/number format normalization.
Troubleshooting
Installation problems
Python version:
python --version # Must be 3.12.10+
pyenv install 3.12.10 && pyenv global 3.12.10
MLX on Apple Silicon:
pip install --upgrade pip
pip install mlx-vlm --no-cache-dir
Poppler missing:
# macOS
brew install poppler
# Ubuntu/Debian
sudo apt-get install poppler-utils
# Verify
pdftoppm -h
Runtime issues
Memory errors:
- Switch to a smaller or MoE model
- Enable cropping: --crop-size 100
- Process single pages rather than full PDFs
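One way to process single pages is to render each PDF page to its own image with `pdftoppm` (part of poppler) and submit the images one at a time. The sketch below only builds the commands. `single_page_commands` is a hypothetical helper, the 300 DPI default follows the resolution recommended in this README, and the page count has to be supplied by the caller.

```python
def single_page_commands(pdf_path, page_count, out_prefix="page", dpi=300):
    """Build one pdftoppm command per page (pages are 1-indexed)."""
    commands = []
    for page in range(1, page_count + 1):
        commands.append([
            "pdftoppm",
            "-f", str(page),  # first page to convert
            "-l", str(page),  # last page to convert
            "-r", str(dpi),   # resolution in DPI
            "-png",
            pdf_path,
            f"{out_prefix}_{page}",  # output file name root for this page
        ])
    return commands


for command in single_page_commands("input.pdf", page_count=2):
    print(" ".join(command))
```

Each command can be run with `subprocess.run(command, check=True)`, and the resulting PNG passed to sparrow.sh via `--file-path`.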
Model loading fails:
rm -rf ~/.cache/huggingface/ ~/.mlx/
python -c "from mlx_vlm import load; load('model-name')"
API connection issues:
curl http://localhost:8002/health
python api.py --debug
Document processing issues
Poor extraction quality:
- Add hints with --hints-file-path
- Apply border cropping: --crop-size 60
- Use --table --table-template with dots.ocr for dense tables
- Verify image resolution (300+ DPI recommended)
- Do not disable schema validation unless necessary
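As a rough picture of what schema-based validation protects against, the sketch below checks extracted rows against the typed placeholders used in this README. It is not Sparrow's validator. The placeholder meanings are assumed from the examples, and the second row's valuation was deliberately changed to a string to show a failure.

```python
def matches_placeholder(value, placeholder):
    """Check one extracted value against a typed placeholder.

    Placeholder meanings are assumptions based on the README examples.
    """
    if placeholder == "str":
        return isinstance(value, str)
    if placeholder == "str or null":
        return value is None or isinstance(value, str)
    if isinstance(placeholder, float):  # 0.0 stands for a number
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if placeholder == "int" or (
        isinstance(placeholder, int) and not isinstance(placeholder, bool)
    ):  # 0 or "int" stands for an integer
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unknown placeholder: {placeholder!r}")


def validate_rows(rows, row_schema):
    """Return (row_index, field) pairs that are missing or have the wrong type."""
    errors = []
    for index, row in enumerate(rows):
        for field, placeholder in row_schema.items():
            if field not in row or not matches_placeholder(row[field], placeholder):
                errors.append((index, field))
    return errors


row_schema = {"instrument_name": "str", "valuation": 0}
rows = [
    {"instrument_name": "UNITS ISHARES III PLC CORE EUR GOVT BOND UCITS ETF/EUR", "valuation": 83488},
    # Valuation altered to a string on purpose to trigger an error.
    {"instrument_name": "UNITS XTRACKERS II SICAV/EUR HY CORP BOND UCITS ETF/-1D-/DISTR.", "valuation": "23643"},
]
print(validate_rows(rows, row_schema))
```

A check like this catches numbers returned as strings and silently dropped fields, the kind of problem that goes unnoticed once validation is switched off.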
PDF processing fails:
pdftoppm -png input.pdf output
python -c "
import pypdf
with open('file.pdf', 'rb') as f:
    print(f'Pages: {len(pypdf.PdfReader(f).pages)}')
"
JSON schema errors:
- Validate with jsonlint.com
- Use typed placeholders: "str", 0, 0.0, "str or null"
- Start with a minimal schema and expand incrementally
Getting help
- Review this README and the component-level documentation
- Search GitHub Issues for known issues
- Open a new issue with logs, system information, and a minimal reproduction case
- For commercial support or licensing: contact@sparrow-project.org
License
Open Source — Licensed under GPL 3.0. Free for open source projects and organizations with gross revenue under $5M USD annually.
Commercial — Dual licensing is available for proprietary use, enterprise features, and dedicated support. Contact contact@sparrow-project.org for commercial licensing and consulting.
Sparrow Project Contributors
Star this repository on GitHub if Sparrow is useful for your projects.
github.com/jhonathan-humnel/ai-doc-forge






