Call for parser submissions: DocFailBench v0.1 Combined Public RC

DocFailBench v0.1 Combined Public RC is now open for external parser submissions.

DocFailBench is a failure-oriented benchmark for PDF-to-Markdown, OCR, and VLM document parsers. Instead of reporting only page-level similarity, it checks small executable facts: table cells, formulas, reading order, captions, page furniture, and optional bounding-box (bbox) grounding.

## Target release

Please use the frozen combined public RC unless you have a specific reason to target a smaller subset:

- Release: `DocFailBench-v0.1-combined-public-rc`
- Cases: 116
- Assertions: 877
- Cases file: `data/releases/docfailbench_v0_1_combined_public_rc_cases.json`
- HF mirror: https://huggingface.co/datasets/Travor278/DocFailBench

## Current baseline snapshot

| Parser | Passed | Failed | Score |
| --- | ---: | ---: | ---: |
| Marker | 621 | 256 | 0.7081 |
| PyMuPDF bbox | 612 | 265 | 0.6978 |
| Docling | 599 | 278 | 0.6830 |
| PyMuPDF plain | 589 | 288 | 0.6716 |
| Qwen-VL API | 559 | 318 | 0.6374 |
| MinerU | 496 | 381 | 0.5656 |
| PaddleOCR | 334 | 543 | 0.3808 |
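
The Score column is consistent with passed assertions divided by the 877 total assertions, rounded to four decimals. The sketch below recomputes it from the table as a sanity check. The formula is inferred from the published numbers rather than taken from the DocFailBench source, and the helper name `score` is illustrative only.

```python
# Recompute the Score column from the Passed/Failed counts in the table.
# Assumption: score = passed / (passed + failed). This matches the published
# values but is inferred from the table, not from the DocFailBench code.
BASELINES = {
    "Marker": (621, 256),
    "PyMuPDF bbox": (612, 265),
    "Docling": (599, 278),
    "PyMuPDF plain": (589, 288),
    "Qwen-VL API": (559, 318),
    "MinerU": (496, 381),
    "PaddleOCR": (334, 543),
}


def score(passed: int, failed: int) -> float:
    """Fraction of assertions that passed, rounded to four decimals."""
    return round(passed / (passed + failed), 4)


for name, (passed, failed) in BASELINES.items():
    print(f"{name}: {score(passed, failed):.4f}")
```

The same check applies to your own result JSON: passed plus failed should equal 877 when you target the combined public RC.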

## What to submit

A useful submission should include:

- parser name and version,
- installation notes or environment file,
- exact command used to generate predictions,
- prediction JSON,
- result JSON from `docfailbench.cli evaluate`,
- hardware/OS/runtime metadata,
- model/API name and run date for hosted or moving-target parsers,
- optional raw Markdown outputs for failed cases.

Recommended evaluation command:

```powershell
python -m docfailbench.cli evaluate `
  --cases data/releases/docfailbench_v0_1_combined_public_rc_cases.json `
  --predictions path/to/your_predictions.json `
  --out runs/submissions/YOUR_PARSER/combined_public_rc_results.json
```
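
If you are not on PowerShell, the same invocation can be assembled in Python so that quoting and line continuation do not depend on the shell. This is a minimal sketch that uses only the flags shown above. The function name `evaluate_command` is hypothetical, and the command is assumed to be run from the repository root.

```python
import shlex
import sys
from pathlib import Path

# Frozen cases file named in the "Target release" section.
CASES = "data/releases/docfailbench_v0_1_combined_public_rc_cases.json"


def evaluate_command(predictions: str, parser_name: str) -> list[str]:
    """Build the argument list for `docfailbench.cli evaluate`.

    Hypothetical helper: it only mirrors the flags from the command above
    (--cases, --predictions, --out) and does not run anything itself.
    """
    out = Path("runs") / "submissions" / parser_name / "combined_public_rc_results.json"
    return [
        sys.executable, "-m", "docfailbench.cli", "evaluate",
        "--cases", CASES,
        "--predictions", predictions,
        "--out", out.as_posix(),
    ]


# Print the command for inspection before running it.
print(shlex.join(evaluate_command("path/to/your_predictions.json", "YOUR_PARSER")))
```

To execute it, pass the list to `subprocess.run(..., check=True)`; a list of arguments avoids shell-specific escaping.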

If you add an adapter, start from `examples/parser_manifest.json` and run:

```powershell
python -m docfailbench.cli baseline `
  --manifest examples/parser_manifest.json `
  --parser your_parser `
  --cases data/releases/docfailbench_v0_1_combined_public_rc_cases.json `
  --out runs/submissions/your_parser/predictions.json `
  --results runs/submissions/your_parser/results.json `
  --html runs/submissions/your_parser/report.html
```

Full guide: https://github.com/Travor278/DocFailBench/blob/main/docs/submitting-parser-results.md

## Review policy

Results can be listed in the README when:

- they target a frozen case file,
- predictions cover all target cases,
- parser version and run command are clear,
- no private PDFs, API keys, or proprietary raw outputs are included,
- hosted API results include endpoint family, requested model, and run date.

Maintainers may mark entries as `unverified` until reproduced locally.
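
Before submitting, it can help to check the "predictions cover all target cases" condition yourself. The sketch below compares two collections of case identifiers. How those identifiers are read from the cases and prediction JSON files depends on their schemas, which are not described here, so the function takes plain IDs and its name `coverage_gaps` is hypothetical.

```python
from typing import Iterable


def coverage_gaps(
    case_ids: Iterable[str], predicted_ids: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (missing, extra) case IDs.

    missing: target cases that have no prediction.
    extra:   predictions that do not match any target case.
    Assumption: cases and predictions share a string identifier; extract it
    from your own files according to their actual schema.
    """
    cases = set(case_ids)
    predicted = set(predicted_ids)
    return sorted(cases - predicted), sorted(predicted - cases)


# Illustrative IDs only, not real DocFailBench case names.
missing, extra = coverage_gaps(["case-a", "case-b", "case-c"], ["case-a", "case-c", "case-x"])
print("missing:", missing)
print("extra:", extra)
```

An empty `missing` list is the condition that matters for listing; a non-empty `extra` list usually means the predictions were generated against a different case file.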

If you maintain a PDF parser, table extractor, OCR system, or VLM document parser, please try the benchmark and post your results here or open a PR. The failures are the point: they tell us exactly which facts broke.

