LiteParse OCR API Specification

This document defines the standard HTTP API that OCR servers must implement to work with LiteParse.

Overview

LiteParse expects a simple HTTP endpoint that accepts an image and returns text with bounding boxes. Your OCR server can use any OCR engine internally (EasyOCR, PaddleOCR, Tesseract, cloud APIs, etc.) as long as it conforms to this API.

Endpoint

POST /ocr

Request Format

Content-Type: multipart/form-data

Fields:

Field	Type	Required	Description
`file`	binary	Yes	Image file (PNG, JPG, etc.)
`language`	string	No	Language code (default: `en`)
`page_number`	integer	No	Page number metadata for OCR servers that preserve page context
`strict_bboxes`	boolean string	No	Server-specific hint asking the server to drop OCR regions that have no usable bounding box
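
Because multipart form values arrive as strings, a server that honors `strict_bboxes` has to decide which strings mean "true". The sketch below shows one possible reading. The accepted spellings and the helper names (`parse_bool_field`, `has_usable_bbox`, `apply_strict_bboxes`) are assumptions made for illustration, not part of this specification.

```python
from typing import Any, Dict, List, Optional

# Assumption: the spec only says "boolean string", so these spellings are a guess.
_TRUE_STRINGS = {"1", "true", "yes", "on"}


def parse_bool_field(value: Optional[str], default: bool = False) -> bool:
    """Interpret a form value such as "true" or "0" as a boolean."""
    if value is None:
        return default
    return value.strip().lower() in _TRUE_STRINGS


def has_usable_bbox(region: Dict[str, Any]) -> bool:
    """True when the region carries four numbers with x2 > x1 and y2 > y1."""
    bbox = region.get("bbox")
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        return False
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
        return False
    x1, y1, x2, y2 = bbox
    return x2 > x1 and y2 > y1


def apply_strict_bboxes(regions: List[Dict[str, Any]], strict: bool) -> List[Dict[str, Any]]:
    """Drop regions without a usable box only when the hint is set."""
    if not strict:
        return list(regions)
    return [region for region in regions if has_usable_bbox(region)]
```

A server that ignores the hint remains conformant, since the field is optional and server-specific.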

Language Codes

Use ISO 639-1 two-letter codes:

en - English
zh - Chinese
ja - Japanese
ko - Korean
fr - French
de - German
es - Spanish
ar - Arabic
etc.

Your server should map these to whatever format your underlying OCR engine expects.
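
As an illustration of that mapping, the sketch below translates the ISO 639-1 codes listed above into Tesseract language names. The function name `to_engine_language` is hypothetical, and mapping `zh` to Simplified Chinese (`chi_sim`) is an assumption; a server might choose `chi_tra` or accept both.

```python
from typing import Optional

# ISO 639-1 code -> Tesseract traineddata name.
# Assumption: "zh" is treated as Simplified Chinese.
TESSERACT_LANGUAGES = {
    "en": "eng",
    "zh": "chi_sim",
    "ja": "jpn",
    "ko": "kor",
    "fr": "fra",
    "de": "deu",
    "es": "spa",
    "ar": "ara",
}


def to_engine_language(code: Optional[str] = None, default: str = "en") -> str:
    """Map a request's language field to the engine's name, defaulting to English."""
    normalized = (code or default).strip().lower()
    try:
        return TESSERACT_LANGUAGES[normalized]
    except KeyError:
        # The caller can turn this into a 400 Bad Request.
        raise ValueError(f"unsupported language code: {code!r}") from None
```

Raising on unknown codes, rather than silently falling back to English, keeps the "invalid language" case visible to the client.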

Response Format

Content-Type: application/json

Structure:

{
  "results": [
    {
      "text": "recognized text",
      "bbox": [x1, y1, x2, y2],
      "confidence": 0.95
    }
  ]
}

Fields:

Field	Type	Description
`results`	array	Array of text detection results
`results[].text`	string	Recognized text content
`results[].bbox`	[number, number, number, number]	Bounding box `[x1, y1, x2, y2]` where (x1,y1) is top-left and (x2,y2) is bottom-right
`results[].confidence`	number	Confidence score between 0.0 and 1.0

Servers may include extra top-level metadata such as engine, model, or warnings; LiteParse clients must continue to rely on the baseline results[] contract.
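
To check a server against the baseline contract, a small validator along these lines may help. It is a sketch, not LiteParse's own validation logic: the name `validate_ocr_response` and the message wording are assumptions. It also applies the `x2 > x1 and y2 > y1` rule from the Bounding Box Format notes and ignores extra top-level metadata.

```python
import math
from typing import Any, List


def _is_number(value: Any) -> bool:
    """True for finite ints/floats; bool is excluded because it subclasses int."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def validate_ocr_response(payload: Any) -> List[str]:
    """Return a list of contract violations; an empty list means the payload conforms."""
    if not isinstance(payload, dict) or not isinstance(payload.get("results"), list):
        return ["top-level object must contain a 'results' array"]

    problems: List[str] = []
    for index, item in enumerate(payload["results"]):
        where = f"results[{index}]"
        if not isinstance(item, dict):
            problems.append(f"{where} must be an object")
            continue
        if not isinstance(item.get("text"), str):
            problems.append(f"{where}.text must be a string")
        bbox = item.get("bbox")
        if not (isinstance(bbox, list) and len(bbox) == 4 and all(map(_is_number, bbox))):
            problems.append(f"{where}.bbox must be a list of four numbers")
        elif not (bbox[2] > bbox[0] and bbox[3] > bbox[1]):
            problems.append(f"{where}.bbox must satisfy x2 > x1 and y2 > y1")
        confidence = item.get("confidence")
        if not (_is_number(confidence) and 0.0 <= confidence <= 1.0):
            problems.append(f"{where}.confidence must be between 0.0 and 1.0")
    return problems
```

Calling it on the parsed JSON body of a `/ocr` response gives a quick pass/fail signal during development.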

Example

Request

curl -X POST http://localhost:8080/ocr \
  -F "file=@document.png" \
  -F "language=en"

Response

{
  "results": [
    {
      "text": "Hello",
      "bbox": [10, 20, 60, 40],
      "confidence": 0.98
    },
    {
      "text": "World",
      "bbox": [70, 20, 130, 40],
      "confidence": 0.97
    }
  ]
}

Error Handling

Return appropriate HTTP status codes:

200 OK - Success
400 Bad Request - Invalid request (missing file, invalid language, etc.)
500 Internal Server Error - OCR processing failed

Error response format:

{
  "error": "Description of the error"
}
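
One framework-agnostic way to produce these status codes and bodies is to wrap the OCR call as sketched below. The `BadRequest` exception and the `run_safely` helper are hypothetical names. Treating every other exception as a 500 is an assumption about how a server might separate client errors from engine failures.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


class BadRequest(Exception):
    """Raised by request parsing for problems the client can fix (hypothetical)."""


def run_safely(handler: Callable[[], List[Dict[str, Any]]]) -> Tuple[int, str]:
    """Run an OCR handler and return (HTTP status, JSON body) per this spec."""
    try:
        results = handler()
    except BadRequest as exc:
        # Missing file, invalid language, and similar client-side problems.
        return 400, json.dumps({"error": str(exc)})
    except Exception as exc:
        # Anything else is reported as an OCR processing failure.
        return 500, json.dumps({"error": f"OCR processing failed: {type(exc).__name__}"})
    return 200, json.dumps({"results": results})
```

The returned pair can be handed to whichever web framework the server uses, together with `Content-Type: application/json`.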

Implementation Notes

Coordinate System

Origin (0,0) is at the top-left of the image
X increases to the right
Y increases downward
All coordinates are in pixels
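
Some engines report coordinates in other conventions. The sketch below covers two common cases: boxes normalized to the 0 to 1 range, and boxes measured from a bottom-left origin. Both helper names are hypothetical, and which conversion applies, if any, depends on the engine.

```python
from typing import List, Sequence


def to_pixel_bbox(normalized_bbox: Sequence[float], image_width: int, image_height: int) -> List[float]:
    """Scale a [x1, y1, x2, y2] box given in 0..1 units to pixel units."""
    nx1, ny1, nx2, ny2 = normalized_bbox
    return [nx1 * image_width, ny1 * image_height, nx2 * image_width, ny2 * image_height]


def flip_bottom_left_origin(bbox: Sequence[float], image_height: float) -> List[float]:
    """Convert a pixel box whose Y axis points up into the top-left-origin system.

    The input is [x1, y_bottom, x2, y_top] with y_bottom < y_top.
    """
    x1, y_bottom, x2, y_top = bbox
    # The old top edge becomes the new (smaller) y1.
    return [x1, image_height - y_top, x2, image_height - y_bottom]
```

For example, on a 100-pixel-tall image a bottom-left-origin box `[10, 20, 60, 40]` becomes `[10, 60, 60, 80]` in the coordinate system this specification requires.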

Bounding Box Format

Always return axis-aligned bounding boxes as [x1, y1, x2, y2]:

x1, y1 = top-left corner
x2, y2 = bottom-right corner
x2 > x1 and y2 > y1

If your OCR engine returns rotated boxes or polygon coordinates, convert them to axis-aligned boxes by taking min/max coordinates.
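
For engines that describe a rotated box as a center, a size, and an angle, the same min/max idea applies once the four corners are computed. The sketch below assumes that parameterization, with the angle in degrees; the function name is hypothetical. The resulting envelope is the same for clockwise and counter-clockwise angle conventions, because the rectangle is symmetric about its center.

```python
import math
from typing import List


def rotated_rect_to_bbox(cx: float, cy: float, width: float, height: float,
                         angle_degrees: float) -> List[float]:
    """Return the axis-aligned [x1, y1, x2, y2] envelope of a rotated rectangle."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0

    xs, ys = [], []
    # Rotate each corner offset around the center, then translate.
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)):
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    return [min(xs), min(ys), max(xs), max(ys)]
```

With an angle of 0 the input rectangle is returned unchanged, and with 90 degrees the width and height of the envelope swap.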

Confidence Scores

Normalize to range 0.0 to 1.0
1.0 = 100% confident
0.0 = 0% confident
If your OCR engine doesn't provide confidence, use 1.0
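
These rules can be folded into one helper, sketched below. The `scale` parameter is an assumption for engines that report percentages (for example 0 to 100). Out-of-range values are clamped rather than rejected, which is a design choice this specification does not mandate.

```python
from typing import Optional


def normalize_confidence(raw: Optional[float], scale: float = 1.0) -> float:
    """Map an engine confidence to the 0.0..1.0 range required by the response format.

    raw   -- the engine's score, or None when the engine provides none
    scale -- the engine's maximum score (use 100 for percentage-style scores)
    """
    if raw is None:
        return 1.0  # The spec's fallback when no confidence is available.
    value = float(raw) / scale
    return max(0.0, min(1.0, value))
```

For instance, `normalize_confidence(87, scale=100)` yields `0.87`, and a missing score yields `1.0`.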

Text Ordering

Return results in reading order (top-to-bottom, then left-to-right for most languages).
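
If the engine returns detections in arbitrary order, a simple heuristic is to group boxes into lines by vertical position and then sort each line by X. The sketch below does that for left-to-right scripts. The function name and the `line_tolerance` value are assumptions, and multi-column layouts or right-to-left scripts would need a different strategy.

```python
from typing import Any, Dict, List


def sort_reading_order(results: List[Dict[str, Any]], line_tolerance: float = 0.5) -> List[Dict[str, Any]]:
    """Order results top-to-bottom by line, then left-to-right within each line."""
    by_center = sorted(results, key=lambda r: (r["bbox"][1] + r["bbox"][3]) / 2.0)

    lines: List[Dict[str, Any]] = []
    for item in by_center:
        y1, y2 = item["bbox"][1], item["bbox"][3]
        center = (y1 + y2) / 2.0
        # Join the current line when the vertical center is close to the line's first box.
        if lines and abs(center - lines[-1]["center"]) <= line_tolerance * lines[-1]["height"]:
            lines[-1]["items"].append(item)
        else:
            lines.append({"center": center, "height": y2 - y1, "items": [item]})

    ordered: List[Dict[str, Any]] = []
    for line in lines:
        ordered.extend(sorted(line["items"], key=lambda r: r["bbox"][0]))
    return ordered
```

The tolerance is expressed as a fraction of the first box's height, so slightly uneven baselines on the same line still group together.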

Example Implementations

See the /ocr directory for reference implementations:

ocr/easyocr/ - Wrapper for EasyOCR
ocr/paddleocr/ - Wrapper for PaddleOCR

The custom V2 Node package also includes a Codex SDK OCR server:

cd packages/node
node dist/cli.js codex-ocr-server \
  --host 127.0.0.1 \
  --port 8833 \
  --codex-home "$HOME/.codex-test"

It exposes:

GET /health with readiness, package version, backend sdk, model, reasoning effort, resolved codex_home, and boolean auth/config readability.
POST /ocr with the baseline LiteParse results[] response plus warning metadata.
POST /ocr/analyze with the full Codex OCR artifact.

For this fork, ~/.codex-test/auth.json and ~/.codex-test/config.toml are the live-test auth/config files. Do not copy their contents into tracked files, package artifacts, or logs.

Codex bounding boxes are model-inferred visual localization evidence. They are not deterministic layout-detector boxes, and successful responses include codex_bboxes_are_model_inferred in warning context.

Testing Your Server

Quick test:

# 1. Start your server
python server.py

# 2. Test with curl
curl -X POST http://localhost:8080/ocr \
  -F "file=@test.png" \
  -F "language=en" \
  | jq .

# 3. Expected output:
# {
#   "results": [
#     {
#       "text": "...",
#       "bbox": [x1, y1, x2, y2],
#       "confidence": 0.xx
#     }
#   ]
# }

Use with LiteParse:

lit parse document.pdf --ocr-server-url http://localhost:8080/ocr

FAQ

Q: What if my OCR returns rotated bounding boxes?

Convert to axis-aligned boxes:

def polygon_to_bbox(polygon):
    """Convert polygon [[x1,y1], [x2,y2], ...] to [x1, y1, x2, y2]"""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]
    return [min(xs), min(ys), max(xs), max(ys)]

Q: What if my OCR doesn't return confidence scores?

Just return 1.0 for all results.

Q: Can I return empty results?

Yes, return {"results": []} if no text is detected.

Q: Should I filter low-confidence results?

You can, but LiteParse will also handle filtering based on its own thresholds.
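
If you do filter on the server, a minimal sketch might look like the following. The default threshold of 0.5 and the function name are assumptions, and LiteParse's own thresholds are not specified here.

```python
from typing import Any, Dict, List


def filter_by_confidence(results: List[Dict[str, Any]], min_confidence: float = 0.5) -> List[Dict[str, Any]]:
    """Keep results whose confidence is at least min_confidence.

    Results without a confidence value are kept, matching the spec's
    convention of treating missing confidence as 1.0.
    """
    return [r for r in results if r.get("confidence", 1.0) >= min_confidence]
```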

Q: What image formats should I accept?

At minimum: PNG, JPG. Optionally: TIFF, WebP, BMP, GIF.

Q: Should I handle rotation correction?

Optional. If your OCR engine supports it, you can auto-correct rotation before processing.

Q: What about multi-page documents?

LiteParse handles page splitting. Your server only needs to process single images.

Q: Performance considerations?

Keep server response time under 10 seconds per image
Support concurrent requests
Consider GPU acceleration for better performance
Cache OCR models in memory (don't reload per request)

Compliance Checklist

Support

Questions? Open an issue on GitHub or refer to the example implementations in /ocr.
