DocFailBench v0.1 Combined Public RC is now open for external parser submissions.
DocFailBench is a failure-oriented benchmark for PDF-to-Markdown, OCR, and VLM document parsers. Instead of reporting only page-level similarity, it checks small executable facts: table cells, formulas, reading order, captions, page furniture, and optional bbox grounding.
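To make "small executable facts" concrete, here is a minimal Python sketch of two such checks run against a parser's Markdown output: one table-cell check and one reading-order check. The function names, arguments, and sample document are hypothetical illustrations; this is not DocFailBench's actual case schema or checker implementation.

```python
import re


def markdown_table_cells(markdown: str) -> list[list[str]]:
    """Return the cells of every pipe-table row found in a Markdown string."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip separator rows such as | --- | ---: |
        if all(re.fullmatch(r":?-+:?", cell) for cell in cells):
            continue
        rows.append(cells)
    return rows


def check_table_cell(markdown: str, row_label: str, column: str, expected: str) -> bool:
    """Hypothetical fact: the cell at (row_label, column) equals `expected`."""
    rows = markdown_table_cells(markdown)
    if not rows or column not in rows[0]:
        return False
    col = rows[0].index(column)
    for row in rows[1:]:
        if row and row[0] == row_label and col < len(row):
            return row[col] == expected
    return False


def check_reading_order(markdown: str, first: str, second: str) -> bool:
    """Hypothetical fact: `first` appears in the output before `second`."""
    a, b = markdown.find(first), markdown.find(second)
    return a != -1 and b != -1 and a < b


# Invented sample output, used only to exercise the checks above.
sample = (
    "# Results\n\nIntro paragraph.\n\n"
    "| Model | Accuracy |\n| --- | ---: |\n| A | 0.91 |\n| B | 0.87 |\n\n"
    "Conclusion paragraph.\n"
)
print(check_table_cell(sample, "B", "Accuracy", "0.87"))
print(check_reading_order(sample, "Intro paragraph", "Conclusion paragraph"))
```

Each check returns a plain pass or fail, so a failed case points at one specific fact rather than at a drop in an aggregate similarity score.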
Target release
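Please use the frozen combined public RC unless you have a specific reason to target a smaller subset:
- release: DocFailBench-v0.1-combined-public-rc
- case file: data/releases/docfailbench_v0_1_combined_public_rc_cases.json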
Current baseline snapshot
| Parser | Passed | Failed | Score |
| --- | ---: | ---: | ---: |
| Marker | 621 | 256 | 0.7081 |
| PyMuPDF bbox | 612 | 265 | 0.6978 |
| Docling | 599 | 278 | 0.6830 |
| PyMuPDF plain | 589 | 288 | 0.6716 |
| Qwen-VL API | 559 | 318 | 0.6374 |
| MinerU | 496 | 381 | 0.5656 |
| PaddleOCR | 334 | 543 | 0.3808 |
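The Score column is consistent with passed / (passed + failed), and every row sums to the same 877 checks. That relationship is inferred from the numbers in the snapshot, not from a stated definition, so treat the short Python check below as a sanity test rather than the official scoring code.

```python
# (passed, failed) pairs copied from the baseline snapshot.
baseline = {
    "Marker": (621, 256),
    "PyMuPDF bbox": (612, 265),
    "Docling": (599, 278),
    "PyMuPDF plain": (589, 288),
    "Qwen-VL API": (559, 318),
    "MinerU": (496, 381),
    "PaddleOCR": (334, 543),
}


def score(passed: int, failed: int) -> float:
    """Assumed scoring rule: fraction of checks that passed, to 4 decimals."""
    return round(passed / (passed + failed), 4)


for parser, (passed, failed) in baseline.items():
    total = passed + failed
    print(f"{parser:<14} total={total} score={score(passed, failed):.4f}")
```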
What to submit
A useful submission should include:
- parser name and version,
- installation notes or environment file,
- exact command used to generate predictions,
- prediction JSON,
- result JSON from docfailbench.cli evaluate,
- hardware/OS/runtime metadata,
- model/API name and run date for hosted or moving-target parsers,
- optional raw Markdown outputs for failed cases.
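One way to capture the hardware/OS/runtime metadata from the list above is a small Python script that uses only the standard library. The field names and the `collect_run_metadata` helper are assumptions made for illustration; the project does not prescribe this layout here, so adapt the keys to whatever the full guide asks for.

```python
import json
import platform
from datetime import date


def collect_run_metadata(parser_name, parser_version, command, model=None):
    """Gather run metadata into a JSON-serializable dict (hypothetical field names)."""
    return {
        "parser": parser_name,
        "parser_version": parser_version,
        "command": command,                    # exact command used for predictions
        "os": platform.platform(),             # OS name and version string
        "machine": platform.machine(),         # CPU architecture, e.g. x86_64
        "python": platform.python_version(),   # interpreter version
        "run_date": date.today().isoformat(),  # needed for hosted or moving-target parsers
        "model": model,                        # model/API name, if any
    }


metadata = collect_run_metadata(
    parser_name="your_parser",
    parser_version="0.0.0",
    command="python -m docfailbench.cli evaluate ...",
)
print(json.dumps(metadata, indent=2))
```

GPU details are not covered by the standard library, so add them by hand if your parser depends on them.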
Recommended evaluation command:
```powershell
python -m docfailbench.cli evaluate `
  --cases data/releases/docfailbench_v0_1_combined_public_rc_cases.json `
  --predictions path/to/your_predictions.json `
  --out runs/submissions/YOUR_PARSER/combined_public_rc_results.json
```
If you add an adapter, start from examples/parser_manifest.json and run:
```powershell
python -m docfailbench.cli baseline `
  --manifest examples/parser_manifest.json `
  --parser your_parser `
  --cases data/releases/docfailbench_v0_1_combined_public_rc_cases.json `
  --out runs/submissions/your_parser/predictions.json `
  --results runs/submissions/your_parser/results.json `
  --html runs/submissions/your_parser/report.html
```
Full guide: https://github.com/Travor278/DocFailBench/blob/main/docs/submitting-parser-results.md
Review policy
Results can be listed in the README when:
- they target a frozen case file,
- predictions cover all target cases,
- parser version and run command are clear,
- no private PDFs, API keys, or proprietary raw outputs are included,
- hosted API results include endpoint family, requested model, and run date.
Maintainers may mark entries as unverified until reproduced locally.
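Before submitting, it may help to confirm the coverage requirement yourself. The Python sketch below compares two collections of case identifiers and reports any gaps. It deliberately takes plain ID lists, because how IDs are stored in the case and prediction JSON files is not described here; the `coverage_report` helper and the example IDs are hypothetical.

```python
def coverage_report(case_ids, predicted_ids):
    """Compare target case IDs with predicted IDs and list the differences."""
    expected = set(case_ids)
    got = set(predicted_ids)
    return {
        "missing": sorted(expected - got),      # target cases with no prediction
        "unexpected": sorted(got - expected),   # predictions for unknown cases
        "complete": expected <= got,            # True when every target case is covered
    }


# Invented IDs, used only to show the shape of the report.
report = coverage_report(
    case_ids=["case-001", "case-002", "case-003"],
    predicted_ids=["case-001", "case-003", "case-999"],
)
print(report)
```

A submission would meet the coverage condition when `missing` is empty. Extract the two ID lists from your own files according to the schema in the full guide.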
If you maintain a PDF parser, table extractor, OCR system, or VLM document parser, please try the benchmark and post your results here or open a PR. The failures are the point: they tell us exactly which facts broke.