EasyOcr-ggml

A GGML/GGUF port of the EasyOCR inference pipeline. The goal is a self-contained native binary (no Python, no PyTorch, no ONNX Runtime) that loads .gguf weights and produces the same OCR results as the upstream Python library.

This repo is the GGML-backed sibling of @qvac/ocr-onnx — same pipeline shape, same pre/post-processing, different inference engine.

Current milestone scope is gen-2 recognizers only (English/Latin path).

Status

  • PyTorch checkpoint → GGUF converter (scripts/pth_to_gguf.py)
  • CRAFT detector weights converted (models/craft_mlt_25k.gguf, 80 MB, F32)
  • CRNN gen-2 English recognizer converted (models/english_g2.gguf, 15 MB, F32)
  • Build scaffolding: ggml submodule, CMake, OpenCV link, GGUF loader, smoke binary (Phase 1 of docs/PLAN.md)
  • CRAFT detector compute graph + PyTorch oracle (Phase 2 of docs/PLAN.md); end-to-end output bit-exact on a synthetic ramp and on a real examples/english.png (max abs error 5.36e-07)
  • C++ CRAFT pre/post-processing (Phase 3 of docs/PLAN.md), lifted from @qvac/ocr-onnx. ./build/detect examples/english.png returns the same 12 aligned text boxes as EasyOCR Python's Reader.detect, 11/12 within 3 px (the one 21 px outlier is an inherited ocr-onnx merge quirk; see docs/known-divergences.md)
  • CRNN gen-2 compute graph + manual BiLSTM op (Phase 4 of docs/PLAN.md); end-to-end logits on english_g2.gguf bit-exact against PyTorch within FP32 noise (max_abs ≈ 7.6e-6)
  • CRNN gen-2 pre/post + CTC decode (Phase 5 of docs/PLAN.md); lifted from @qvac/ocr-onnx. ./build/ocr-cli reads examples/english.png and produces the 12 lines of text; 9/12 exact vs EasyOCR Python (known divergences documented in docs/known-divergences.md)
  • ocr-cli end-to-end binary with EasyOCR-compatible flags (--detail, --output-format standard|json, --lang, --paragraph, --mag-ratio, --debug-png) for gen-2 recognizers (Phase 7 of docs/PLAN.md).
  • Phase 8 evaluation metrics:
      - detection: IoU@threshold, precision/recall/F1, tight/loose pixel bars
      - recognition: CER/WER, exact-match fraction, normalized edit distance
      - latency: repeatable warmup+runs with p50/p95/mean reporting
  • Phase 9 quantization path:
      - pth_to_gguf.py --quantize {Q8_0|Q4_K}
      - Q8_0 variants for craft_mlt_25k and english_g2 generated
      - Q8_0 end-to-end text parity confirmed on examples/english.png

See docs/PLAN.md for the detailed roadmap. Architecture deep-dive: docs/architecture.md.
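Phase 5's CTC decode is the standard greedy collapse: argmax per timestep, merge consecutive repeats, drop blanks. A minimal Python sketch of the idea (not the repo's C++ implementation; class 0 is assumed to be the CTC blank, matching the num_classes = vocab + 1 layout):

```python
def ctc_greedy_decode(logits, vocab):
    """Greedy CTC decode.

    logits: per-timestep score lists, shape [T][num_classes].
    vocab:  characters for classes 1..num_classes-1 (class 0 = blank, assumed).
    """
    best = [max(range(len(t)), key=t.__getitem__) for t in logits]  # argmax per step
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != 0:      # collapse repeats, skip the blank
            out.append(vocab[idx - 1])    # shift past the blank class
        prev = idx
    return "".join(out)
```

For example, the timestep sequence `a a <blank> b` decodes to "ab"; the blank between repeats is what lets CTC emit genuine double letters.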

PoC benchmark snapshot

End-to-end CPU latency on examples/english.png (same host, warmup=1, runs=5):

| Benchmark | Mean (ms) | p50 (ms) | p95 (ms) | Notes |
|---|---|---|---|---|
| easyocr-ggml (test_ocr_pipeline) | 23162.42 | 22999.27 | 23772.36 | Full pipeline: CRAFT + box + CRNN |
| EasyOCR Python (Reader.readtext, CPU) | 5510.45 | 5468.99 | 5919.81 | Same image, mag_ratio=1.5, add_margin=0.0, paragraph=False |

Stage split estimate for easyocr-ggml (same run profile):

| Segment | Mean (ms) | Share |
|---|---|---|
| Detection side (detect ~= CRAFT + box post-proc) | 19591.69 | 80.1% |
| Recognition side (residual from full - detect) | 4870.27 | 19.9% |

Notes:

  • This is a PoC benchmark snapshot, not a release SLA.
  • Stage split is derived from separate binary runs; it is directionally useful for bottleneck targeting.

Repository layout

EasyOcr-ggml/
├── README.md            this file
├── CMakeLists.txt       top-level build (ggml submodule + OpenCV + targets)
├── docs/
│   ├── PLAN.md          detailed port plan and design notes
│   └── architecture.md  layered architecture, decisions, tech debt
├── scripts/
│   └── pth_to_gguf.py   PyTorch .pth → GGUF weight converter
├── models/
│   ├── craft_mlt_25k.gguf    detector weights + metadata
│   └── english_g2.gguf       English recognizer weights + vocab metadata
├── include/easyocr-ggml/
│   ├── gguf_loader.hpp       public GGUF loader API
│   ├── craft_weights.hpp     CRAFT weight loader + BN-fold
│   ├── craft.hpp             build_craft() and tap names
│   ├── crnn_weights.hpp      CRNN gen-2 weight loader (Phase 4)
│   ├── crnn.hpp              build_crnn_gen2() and tap names (Phase 4)
│   ├── ops.hpp               reusable conv / bilinear ops
│   └── pipeline/             Phase 3: lifted from @qvac/ocr-onnx
│       ├── steps.hpp           shared types (PipelineContext, …)
│       ├── step_detection_inference.hpp  pre-proc + GGML inference
│       └── step_bounding_box.hpp         post-proc (heatmap → polygons)
├── src/
│   ├── ggml/
│   │   ├── gguf_loader.cpp   RAII wrapper over gguf_init_from_file
│   │   ├── craft_weights.cpp 154 tensors + BN-fold
│   │   ├── ops.cpp           conv_2d_bias / _relu, bilinear_to
│   │   ├── craft.cpp         CRAFT compute graph
│   │   ├── crnn_weights.cpp  CRNN gen-2 weights (BN-fold + verbatim copy)
│   │   └── crnn.cpp          CRNN gen-2 graph + manual BiLSTM cell
│   ├── pipeline/             Phase 3 implementations
│   │   ├── steps.cpp           fourPointTransform, InferredText::toString
│   │   ├── step_detection_inference.cpp
│   │   ├── step_bounding_box.cpp           lifted verbatim (535 LOC)
│   │   └── qlog.hpp                        QLOG/ALOG_DEBUG no-op shim
│   └── cli/
│       ├── smoke.cpp         GGUF metadata smoke (Phase 1)
│       ├── craft_smoke.cpp   CRAFT graph smoke (Phase 2)
│       ├── detect.cpp        end-to-end detection (Phase 3)
│       └── crnn_smoke.cpp    CRNN gen-2 graph smoke (Phase 4)
├── examples/
│   └── english.png          canonical real-world OCR test image
├── tests/
│   ├── test_build_craft.cpp  oracle vs PyTorch references
│   └── reference/
│       ├── dump_craft_reference.py        PyTorch oracle dumper
│       ├── craft_input.npy                synthetic ramp input
│       ├── craft_output_nhwc.npy          synthetic expected output
│       ├── craft_real_english_input.npy   pre-processed english.png
│       └── craft_real_english_output_nhwc.npy   expected heatmap
├── third_party/
│   └── ggml/                 git submodule, pinned commit
└── (future: more under src/ggml/, build_crnn_gen{1,2})

Quick start (today — weight conversion only)

The conversion script depends on torch, gguf, and easyocr. The easiest setup is to reuse the venv from the upstream EasyOCR clone:

# from this directory
../EasyOCR/.venv/bin/python scripts/pth_to_gguf.py \
    ~/.EasyOCR/model/<model>.pth \
    models/<model>.gguf

The script auto-detects the architecture from the filename:

| Input filename pattern | general.architecture | Extra metadata |
|---|---|---|
| craft_mlt_25k.pth | craft | (none) |
| pretrained_ic15_res*.pt | dbnet | (none) |
| english_g2.pth, *_g2.pth | crnn | crnn.generation=2 + vocab |

For custom checkpoints not in easyocr.config, pass --arch explicitly.
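The detection rules in the table amount to a small filename pattern match. A hedged re-creation (illustrative rule list and helper name, not the script's actual code):

```python
import fnmatch

# Hypothetical restatement of the filename -> architecture table above.
RULES = [
    ("craft_mlt_25k.pth", "craft"),
    ("pretrained_ic15_res*.pt", "dbnet"),
    ("*_g2.pth", "crnn"),
]

def detect_arch(filename):
    """Return the GGUF general.architecture for a checkpoint filename."""
    for pattern, arch in RULES:
        if fnmatch.fnmatch(filename, pattern):
            return arch
    return None  # unknown: the caller must pass --arch explicitly
```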

Quantized conversion (Phase 9)

scripts/pth_to_gguf.py now supports:

--quantize Q8_0
--quantize Q4_K

Example:

../EasyOCR/.venv/bin/python scripts/pth_to_gguf.py \
  ~/.EasyOCR/model/english_g2.pth \
  models/english_g2_q8_0.gguf \
  --quantize Q8_0

Batch conversion used for PoC benchmarking:

../EasyOCR/.venv/bin/python scripts/pth_to_gguf.py \
  ~/.EasyOCR/model/craft_mlt_25k.pth models/craft_mlt_25k_q8_0.gguf --quantize Q8_0
../EasyOCR/.venv/bin/python scripts/pth_to_gguf.py \
  ~/.EasyOCR/model/english_g2.pth models/english_g2_q8_0.gguf --quantize Q8_0
../EasyOCR/.venv/bin/python scripts/pth_to_gguf.py \
  ~/.EasyOCR/model/craft_mlt_25k.pth models/craft_mlt_25k_q4_k.gguf --quantize Q4_K
../EasyOCR/.venv/bin/python scripts/pth_to_gguf.py \
  ~/.EasyOCR/model/english_g2.pth models/english_g2_q4_k.gguf --quantize Q4_K

Build (native, Linux x64)

The native build links ggml (vendored as a submodule) and OpenCV (system). It produces these binaries in build/:

  • smoke — opens a .gguf file and prints its architecture / tensor count. Validates the loader, the ggml + OpenCV link, and the converted weights.
  • craft_smoke — runs the CRAFT compute graph end-to-end on a synthetic input and prints the output shape + simple stats.
  • test_build_craft — compares the GGML graph output against PyTorch reference dumps (synthetic ramp by default; --image english for a real image). Pass or fail at atol=1e-4.
  • detect (Phase 3) — full pipeline: imread → resize + ImageNet normalize → build_craft → connected-components / box merge → prints aligned + unaligned text-box polygons. Optional --debug-png debug.png overlays the boxes on the source image.
  • test_detect_polygons (Phase 3) — runs detect on examples/english.png and compares against EasyOCR Python's polygons (committed at tests/reference/craft_real_english_polygons.json). Hooked into ctest.
  • crnn_smoke (Phase 4) — runs the CRNN gen-2 compute graph on a synthetic input and prints the logits' shape + min/max/mean.
  • test_build_crnn_gen2 (Phase 4) — feeds the same np.linspace-ramp through both build_crnn_gen2 and PyTorch, and compares the final [1, T, num_classes] logits at atol=1e-4. Hooked into ctest.
  • ocr-cli (Phase 5 + 7) — full end-to-end OCR: imread → detect → box → crop → recognize → print recognized text (gen-2 recognizers). Flags: --detail 0|1, --output-format standard|json, --lang en[,fr,...], --paragraph, --mag-ratio 1.5, --debug-png debug.png.
  • test_ocr_pipeline (Phase 5) — runs the pipeline on examples/english.png and compares recognized text against EasyOCR Python's readtext; now also reports CER/WER and optional latency stats. Hooked into ctest.

One-time setup

# 1. Fetch the ggml submodule (pinned commit recorded in .gitmodules)
git submodule update --init --recursive

# 2. Install OpenCV headers + libs (Ubuntu / Debian)
sudo apt update
sudo apt install -y libopencv-dev cmake build-essential

# 3. Configure & build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

Run the smoke binaries

# GGUF metadata smoke
./build/smoke models/craft_mlt_25k.gguf models/english_g2.gguf
# [ok] models/craft_mlt_25k.gguf  arch=craft  n_tensors=154  n_kv=12
# [ok] models/english_g2.gguf  arch=crnn  n_tensors=44  n_kv=18  vocab_bytes=98  num_classes=97

# CRAFT graph smoke at side=256
./build/craft_smoke models/craft_mlt_25k.gguf 256
# [ok] output ne = [2, 128, 128, 1]  n_elements=32768  ...

Run the CRAFT oracle test

# (a) Synthetic ramp at 64×64 (this is what ctest runs in CI)
./build/test_build_craft
# [ ok ] output_nhwc          n=2048    max_abs=4.68e-08  ...
# 1 passed, 0 failed, 12 skipped  (atol=1e-04)

# (b) Real image: examples/english.png
./build/test_build_craft --image english
# [input] real image english  NCHW=[1,3,736,1376]
# [ ok ] output_nhwc          n=506368  max_abs=5.36e-07  ...
# 1 passed, 0 failed, 12 skipped  (atol=1e-04)

Run Phase 8 evaluations

# One command: build + quality/latency evaluation + JSON reports.
./scripts/eval_phase8.sh
# Reports:
#   out/phase8/detect_metrics.json
#   out/phase8/ocr_metrics.json

# Or run tests directly:
./build/test_detect_polygons --report-json /tmp/detect_metrics.json
./build/test_ocr_pipeline \
  --warmup-runs 1 --bench-runs 5 \
  --report-json /tmp/ocr_metrics.json
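The CER/WER numbers in the Phase 8 reports are standard normalized edit distances. A minimal reference implementation of the metric (not the repo's code):

```python
def edit_distance(a, b):
    """Classic Levenshtein DP over two sequences (chars or words)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution (free if chars match)
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cer(hyp, ref):
    """Character error rate, normalized by reference length."""
    return edit_distance(hyp, ref) / max(len(ref), 1)

def wer(hyp, ref):
    """Word error rate over whitespace-split tokens."""
    return edit_distance(hyp.split(), ref.split()) / max(len(ref.split()), 1)
```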

Run full OCR end-to-end (Phase 5 + 7)

# Default: detect + recognize on examples/english.png using English weights
./build/ocr-cli
# Reduce your risk of coronavirus infection:
# Clean hands with soap and water
# ...

# Detail mode: index + confidence + bounding box per line
./build/ocr-cli --detail 1

# JSON output (matches EasyOCR Python's readtext shape)
./build/ocr-cli --output-format json | jq .

# With annotated debug image
./build/ocr-cli --image examples/english.png \
                --debug-png /tmp/english_ocr.png

# Different gen-2 recognizer:
./build/ocr-cli --recognizer models/latin_g2.gguf --image my_french.jpg

# Q8_0 quantized recognizer + detector:
./build/ocr-cli \
  --detector models/craft_mlt_25k_q8_0.gguf \
  --recognizer models/english_g2_q8_0.gguf \
  --image examples/english.png

# Text-vs-EasyOCR test (9/12 exact, all within edit distance 3):
./build/test_ocr_pipeline
# 9/12 (75%) exact, worst_edit=3, CER/WER reported  PASS

# Or via ctest:
cmake --build build --target test
# 4/4 tests passed
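Downstream tooling can consume the JSON output directly. A hedged sketch that assumes --output-format json emits readtext-style [box, text, confidence] triples, which is what "matches EasyOCR Python's readtext shape" suggests (verify against your build before relying on it):

```python
import json

def load_ocr_json(raw):
    """Parse assumed [box, text, confidence] triples; keep text + confidence."""
    return [(text, conf) for _box, text, conf in json.loads(raw)]

# Illustrative payload in the assumed shape:
sample = '[[[[0,0],[10,0],[10,5],[0,5]], "Clean hands", 0.98]]'
```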

Run the end-to-end detection pipeline (Phase 3)

# Pretty-print the boxes detected on examples/english.png
./build/detect
# [load]   image examples/english.png  905x480x3
# [infer]  textMap=688x368  linkMap=688x368  imgResizeRatio=1.3333
# [boxes]  aligned=12  unaligned=0

# Save an annotated PNG with green boxes over the source image:
./build/detect --image examples/english.png \
               --debug-png /tmp/english_boxes.png

# Polygon-vs-EasyOCR test:
./build/test_detect_polygons
# 12/12 box count match, 11/12 within 3 px (PASS)

# Or via ctest:
cmake --build build --target test

Run the CRNN gen-2 recognizer graph (Phase 4)

# Smoke: synthetic input through the recognizer graph
./build/crnn_smoke models/english_g2.gguf 256
# [run]   computing graph (input 256x64, 4910 nodes)...
# [ok]    logits ne = [97, 63, 1, 1]   (== PyTorch [1, T=63, num_classes=97])

# Logits-vs-PyTorch oracle test (regenerate references first if needed):
../EasyOCR/.venv/bin/python tests/reference/dump_crnn_reference.py
./build/test_build_crnn_gen2
# [ ok ] logits  n=6111  max_abs=7.6e-06  ...
# PASS  (atol=1e-04)

The real-image test (b) needs no Python or extra setup: the pre-processed input (tests/reference/craft_real_english_input.npy, 12 MB) and expected heatmap (craft_real_english_output_nhwc.npy, 2 MB) are committed alongside the source PNG (examples/english.png). The 12 "skip" lines are intentional: only the input and the final output are committed, and the per-layer dumps are regenerated locally — see Diagnosing which layer first diverges for the bisect workflow.

Note on vocab_bytes / num_classes in the GGUF smoke output: vocab_bytes is the UTF-8 byte length of crnn.vocab; the character count is num_classes − 1 (the −1 accounts for the CTC blank token), which is 96 for the bundled English gen-2 vocab. The two disagree by 2 because one of the vocab characters takes 3 UTF-8 bytes instead of 1.
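The arithmetic is easy to verify. An illustrative stand-in vocab (the real one lives in the GGUF metadata) with 95 one-byte ASCII characters plus one three-byte character reproduces the smoke output's numbers:

```python
# Illustrative only: the euro sign is a stand-in for whichever vocab
# character is multi-byte; it encodes to 3 UTF-8 bytes.
vocab = "€" + "".join(chr(c) for c in range(32, 32 + 95))  # 96 characters

num_classes = len(vocab) + 1              # +1 for the CTC blank -> 97
vocab_bytes = len(vocab.encode("utf-8"))  # 95 * 1 byte + 3 bytes -> 98
```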

CRAFT detector graph (Phase 2)

build_craft lives in src/ggml/craft.cpp and mirrors easyocr/craft.py exactly. All BatchNorm parameters are pre-folded into the preceding Conv2d at load time inside CraftWeights, so the runtime graph contains no BN op.
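BN folding replaces each Conv+BN pair with a single Conv whose weights absorb the normalization. The algebra, sketched in NumPy with assumed tensor shapes (the repo does this in C++ inside CraftWeights):

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into Conv2d(w, b).

    w: [out_ch, in_ch, kh, kw] conv weight (assumed layout), b: [out_ch] bias.
    Returns (w', b') such that BN(conv(x, w, b)) == conv(x, w', b').
    """
    scale = gamma / np.sqrt(var + eps)             # per-output-channel scale
    w_folded = w * scale[:, None, None, None]      # scale each output filter
    b_folded = (b - mean) * scale + beta           # shift the bias through BN
    return w_folded, b_folded
```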

The end-to-end correctness test (./build/test_build_craft) is documented under Build above. This section covers the two extra workflows: regenerating the references and bisecting a regression.

Reference dumps

| Mode | Input | Committed dumps | Regen command |
|---|---|---|---|
| Synthetic | np.linspace(-1, 1) ramp at 64×64 | craft_input.npy, craft_output_nhwc.npy (~60 KB) | dump_craft_reference.py |
| Real image | examples/english.png via EasyOCR's imgproc (NCHW [1, 3, 736, 1376]) | craft_real_english_input.npy, craft_real_english_output_nhwc.npy (~14 MB) | dump_craft_reference.py --image examples/english.png |

Both pass at atol=1e-4; observed errors are at the FP32-noise floor (synthetic max 4.7e-08, real-image max 5.36e-07). Synthetic regenerates in <1 s; real-image runs the upstream PyTorch model once (~5 s on CPU) and is rarely needed unless you change the dumper.

# Regenerate the committed (minimal) references:
../EasyOCR/.venv/bin/python tests/reference/dump_craft_reference.py
../EasyOCR/.venv/bin/python tests/reference/dump_craft_reference.py \
    --image examples/english.png

# Run synthetic test via ctest:
cmake --build build --target test
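The comparator's pass/fail criterion is just a max-absolute-error gate over the flattened arrays. An equivalent check in Python (hypothetical helper, not the test binary's code):

```python
import numpy as np

def max_abs_error(got, expected, atol=1e-4):
    """Return (max_abs, ok), mirroring the test's atol gate on two dumps."""
    max_abs = float(np.max(np.abs(got - expected)))
    return max_abs, max_abs <= atol
```

Usage against the committed dumps would look like `max_abs_error(my_output, np.load("tests/reference/craft_output_nhwc.npy"))`.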

Diagnosing which layer first diverges

If test_build_craft reports a regression, regenerate every U-net stage locally and re-run — the comparator reports max_abs per tap and pinpoints where things drifted:

../EasyOCR/.venv/bin/python tests/reference/dump_craft_reference.py --per-layer
../EasyOCR/.venv/bin/python tests/reference/dump_craft_reference.py \
    --image examples/english.png --per-layer

./build/test_build_craft               # 13 passed when the graph is healthy
./build/test_build_craft --image english

The per-layer dumps are sizable (~1.8 MB synthetic, ~360 MB real) and are git-ignored — they live only on the developer's machine until the next regeneration.

Try a different real image

The committed examples/english.png is the canonical test, but the workflow generalises:

# Drop a new image (PNG / JPG, RGB or grayscale)
cp /path/to/your/image.png examples/myimage.png

# Generate the pre-processed input + expected heatmap
../EasyOCR/.venv/bin/python tests/reference/dump_craft_reference.py \
    --image examples/myimage.png

# Run the test against it (note: --image takes the *stem*, not the path)
./build/test_build_craft --image myimage

tests/reference/craft_real_<stem>_*.npy are git-ignored by default unless you choose to commit them. EasyOCR's own pre-processing (imgproc.resize_aspect_ratio with mag_ratio=1.5, canvas_size=2560) is applied — pass --mag-ratio / --canvas-size to the dumper to vary those.

Phase 3 lifted @qvac/ocr-onnx's resizeAspectRatio + normalizeAndBuildCHW C++ pre-processing into the runtime; these real-image references serve as the bit-exact ground truth.

Inspecting a converted GGUF

../EasyOCR/.venv/bin/python -m gguf.scripts.gguf_dump models/english_g2.gguf | head -30

You should see metadata KVs including general.architecture = crnn, crnn.generation = 2, crnn.num_classes = 97, and crnn.vocab containing the 96-character set used by the gen-2 English CTC head.

Why GGML?

The two QVAC OCR siblings differ only in the inference engine:

| | @qvac/ocr-onnx | EasyOcr-ggml (this) |
|---|---|---|
| Inference backend | ONNX Runtime | GGML |
| Weight format | .onnx | .gguf |
| Pre/post-processing | C++ + OpenCV | C++ + OpenCV (same code) |
| Quantization | per-EP (limited) | block-quantized (Q8_0, Q4_K, …) out of the box |
| Binary size | ONNX Runtime ~30 MB+ | libggml ~1–3 MB |
| Mobile / edge fit | good | smaller, faster cold start, no EP plumbing |

Quantization trade-offs (PoC snapshot)

Measured on examples/english.png, CPU, warmup=1, runs=3 using test_ocr_pipeline.

| Variant | Detector GGUF | Recognizer GGUF | Total model size | OCR text parity vs F32 | Mean latency (ms) |
|---|---|---|---|---|---|
| F32 baseline | 80M | 15M | 95M | baseline | 23734.13 |
| Q8_0 | 80M | 7.8M | 87.8M | identical (12/12 lines) | 23930.10 |
| Q4_K | 80M | 15M | 95M | identical (12/12 lines) | 24074.04 |

Notes:

  • The current CRAFT tensor layout does not satisfy the block-quantization shape constraints under this converter path, so the detector file stays at ~80M.
  • Q4_K via the current gguf Python quantizer falls back to F32 for this model family, so there is no size win.
  • With the current bottleneck split (~80% on the detection side), quantizing only the recognizer weights does not yet improve end-to-end latency.
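Why Q8_0 roughly halves the recognizer: ggml's Q8_0 stores each block of 32 weights as one FP16 scale plus 32 int8 values, i.e. about 8.5 bits per quantized weight versus 32 for F32. A NumPy sketch of the scheme (illustrative, not ggml's code):

```python
import numpy as np

def q8_0_roundtrip(block):
    """Quantize one 32-float block to int8 + per-block scale, then dequantize."""
    assert block.size == 32                       # Q8_0 block size in ggml
    amax = float(np.max(np.abs(block)))
    scale = amax / 127.0 if amax else 1.0         # map amax onto the int8 range
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale           # dequantized approximation
```

The round-trip error per element is bounded by half the block scale, which is why Q8_0 typically preserves text parity (as the table above shows for this model).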

License

Apache-2.0 (matches upstream EasyOCR and @qvac/ocr-onnx).
