Releases: Travor278/DocFailBench

DocFailBench v0.1 Combined Public RC

09 May 10:43

Pre-release

DocFailBench is a failure-oriented benchmark for PDF-to-Markdown, OCR, and VLM document parsers.

Instead of asking whether a parsed page looks roughly similar to a reference, this release checks small, auditable facts: table cells, formulas, reading order, captions, page furniture, and optional bbox grounding.
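To illustrate what a small, auditable check could look like, here is a minimal sketch in Python. It is not the DocFailBench assertion schema: the `Assertion` class, the `contains` and `order` kinds, and the `check` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """Hypothetical assertion record; not the benchmark's real schema."""
    kind: str          # "contains" or "order" (assumed kinds, for illustration)
    values: list[str]  # strings expected in the parser's Markdown output


def check(assertion: Assertion, markdown: str) -> bool:
    """Return True if the parsed Markdown satisfies the assertion."""
    if assertion.kind == "contains":
        # Every expected string (e.g. a table cell or caption) must appear.
        return all(v in markdown for v in assertion.values)
    if assertion.kind == "order":
        # Every string must appear, and first occurrences must be in order.
        positions = [markdown.find(v) for v in assertion.values]
        return all(p >= 0 for p in positions) and positions == sorted(positions)
    raise ValueError(f"unknown assertion kind: {assertion.kind}")


parsed = (
    "# Annual Report\n\n"
    "Intro paragraph.\n\n"
    "| Year | Revenue |\n"
    "| 2023 | 4.2 |\n\n"
    "Figure 1: Growth."
)

table_cell = Assertion("contains", ["| 2023 | 4.2 |"])
reading_order = Assertion("order", ["Annual Report", "Intro paragraph", "Figure 1"])
wrong_order = Assertion("order", ["Figure 1", "Annual Report"])

print(check(table_cell, parsed), check(reading_order, parsed), check(wrong_order, parsed))
```

Each check yields a single pass or fail that a reviewer can verify by eye against the page, which is the property the release emphasizes over whole-page similarity.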

Frozen target

  • Release: DocFailBench-v0.1-combined-public-rc
  • Cases: 116
  • Assertions: 877
  • Cached parser baselines: 7
  • Recommended cases file: data/releases/docfailbench_v0_1_combined_public_rc_cases.json

Baseline snapshot

Parser          Passed   Failed   Score
Marker             621      256   0.7081
PyMuPDF bbox       612      265   0.6978
Docling            599      278   0.6830
PyMuPDF plain      589      288   0.6716
Qwen-VL API        559      318   0.6374
MinerU             496      381   0.5656
PaddleOCR          334      543   0.3808
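The Score column is consistent with passed assertions divided by total assertions (passed plus failed, 877 per parser). The release does not state the formula, so the sketch below treats it as an assumption; the `score` function and `baselines` dictionary are illustrative names, not part of the repository.

```python
def score(passed: int, failed: int) -> float:
    """Assumed scoring rule: fraction of assertions passed, rounded to 4 places."""
    total = passed + failed
    if total == 0:
        raise ValueError("no assertions to score")
    return round(passed / total, 4)


# (passed, failed) counts copied from the baseline snapshot above.
baselines = {
    "Marker": (621, 256),
    "PyMuPDF bbox": (612, 265),
    "Docling": (599, 278),
    "PyMuPDF plain": (589, 288),
    "Qwen-VL API": (559, 318),
    "MinerU": (496, 381),
    "PaddleOCR": (334, 543),
}

for parser, (passed, failed) in baselines.items():
    print(f"{parser:<14} {passed + failed:>4} {score(passed, failed):.4f}")
```

Under this assumption, every row totals 877 assertions and the computed values match the published scores to four decimal places.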

Verify cached scores

powershell -ExecutionPolicy Bypass -File scripts\run_combined_public_compare.ps1

Submit a parser

Open an issue or PR with parser version, exact command, prediction JSON, result JSON, and runtime metadata. See docs/submitting-parser-results.md.

Source PDFs are not bundled in git; to reproduce results, obtain them via the source manifests and their fetch/document URLs.