DocFailBench v0.1 Combined Public RC

DocFailBench is a failure-oriented benchmark for PDF-to-Markdown, OCR, and VLM document parsers.

Instead of asking whether a parsed page looks roughly similar to a reference, this release checks small, auditable facts: table cells, formulas, reading order, captions, page furniture, and optional bbox grounding.
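To make the idea of a small, auditable fact concrete, here is a minimal sketch in Python. It is not DocFailBench's actual assertion schema or checker: the function names, the assertion tuples, and the sample parsed text are all hypothetical and exist only to illustrate a pass/fail check on one fact at a time.

```python
# Hypothetical illustration only: these helpers and the assertion format
# are NOT taken from DocFailBench; they sketch what a per-fact check can look like.

def check_contains(parsed: str, expected: str) -> bool:
    """Pass if the expected text (e.g. a table cell value) appears in the output."""
    return expected in parsed


def check_order(parsed: str, first: str, second: str) -> bool:
    """Pass if both snippets appear and `first` comes before `second`."""
    i = parsed.find(first)
    j = parsed.find(second)
    return i != -1 and j != -1 and i < j


# Made-up parser output for a single page.
parsed_page = (
    "# Results\n\n"
    "| Model | Accuracy |\n"
    "| A | 0.91 |\n\n"
    "Figure 1: Accuracy by model.\n"
)

# Each entry is one auditable fact with a binary outcome.
outcomes = [
    check_contains(parsed_page, "0.91"),                 # table cell survived
    check_order(parsed_page, "# Results", "Figure 1"),   # reading order preserved
    check_contains(parsed_page, "Page 3 of 12"),         # page furniture expected here
]

passed = sum(outcomes)
failed = len(outcomes) - passed
print(f"passed={passed} failed={failed}")
```

Because each check is binary and tied to one fact, a failure points at a specific cell, caption, or ordering problem rather than at a vague similarity score.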

Frozen target

Release: DocFailBench-v0.1-combined-public-rc
Cases: 116
Assertions: 877
Cached parser baselines: 7
Recommended cases file: data/releases/docfailbench_v0_1_combined_public_rc_cases.json

Baseline snapshot

Parser	Passed	Failed	Score
Marker	621	256	0.7081
PyMuPDF bbox	612	265	0.6978
Docling	599	278	0.6830
PyMuPDF plain	589	288	0.6716
Qwen-VL API	559	318	0.6374
MinerU	496	381	0.5656
PaddleOCR	334	543	0.3808
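The Score column is consistent with passed assertions divided by the 877 total assertions, rounded to four decimals. The short Python sketch below recomputes it from the Passed and Failed counts in the table. The formula is inferred from the numbers above rather than taken from the benchmark's scoring code, and the variable and function names are illustrative.

```python
# Recompute the Score column from the Passed/Failed counts in the table above.
# Assumption: score = passed / (passed + failed); inferred from the published
# numbers, not copied from the benchmark's own scoring code.

TOTAL_ASSERTIONS = 877

baselines = {
    "Marker": (621, 256),
    "PyMuPDF bbox": (612, 265),
    "Docling": (599, 278),
    "PyMuPDF plain": (589, 288),
    "Qwen-VL API": (559, 318),
    "MinerU": (496, 381),
    "PaddleOCR": (334, 543),
}


def score(passed: int, failed: int) -> float:
    """Fraction of assertions that passed."""
    return passed / (passed + failed)


for parser, (passed, failed) in baselines.items():
    # Every row should account for all assertions in the frozen release.
    assert passed + failed == TOTAL_ASSERTIONS, parser
    print(f"{parser}\t{passed}\t{failed}\t{score(passed, failed):.4f}")
```

Checking that Passed plus Failed equals 877 for every row is a quick sanity test when comparing a new result file against the frozen target.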

Verify cached scores

powershell -ExecutionPolicy Bypass -File scripts\run_combined_public_compare.ps1

Submit a parser

Open an issue or PR with parser version, exact command, prediction JSON, result JSON, and runtime metadata. See docs/submitting-parser-results.md.

Source PDFs are not bundled in git; to reproduce results, retrieve them using the source manifests and fetch/document URLs.
