PCR Benchmark System

This directory holds the PCR benchmark suite: synthetic fixtures, expected coverage, a runner, and generated reports.

Layout

generate_synthetic_benchmark.py: regenerates benchmark fixtures under this directory.
benchmark_manifest.json: declares benchmark cases, expected tools, expected checks, and minimum/maximum finding counts.
run_benchmark.py: runs the suite and writes BENCHMARK_REPORT.md, reports/pcr.benchmark_summary.json, and reports/pcr.benchmark_summary.md.
inputs/: single-file, project, image, code, provenance, and network fixtures.
corpus/: local cross-manuscript corpus fixture.
reports/: benchmark outputs from PCR CLIs.
BENCHMARK_REPORT.md: latest human-readable benchmark summary at the benchmark root.
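The minimum/maximum finding counts in the manifest can be read as a pass/fail band for each case. The sketch below shows that comparison under loudly stated assumptions: the manifest shape (a top-level "cases" list with "id", "min_findings", and "max_findings") and the finding records (dicts with a "case" key) are hypothetical, and the real benchmark_manifest.json schema and run_benchmark.py logic may differ.

```python
import json
from collections import Counter

# Hypothetical manifest shape for illustration only; the real
# benchmark_manifest.json schema may use different keys.
EXAMPLE_MANIFEST = """
{
  "cases": [
    {"id": "single_file_digits", "min_findings": 1, "max_findings": 5},
    {"id": "clean_project", "min_findings": 0, "max_findings": 0}
  ]
}
"""


def check_counts(manifest: dict, findings: list[dict]) -> dict[str, bool]:
    """Return {case_id: True if the finding count falls inside [min, max]}."""
    # Count findings per case; cases with no findings get 0.
    counts = Counter(f["case"] for f in findings)
    results = {}
    for case in manifest["cases"]:
        n = counts.get(case["id"], 0)
        results[case["id"]] = case["min_findings"] <= n <= case["max_findings"]
    return results


if __name__ == "__main__":
    manifest = json.loads(EXAMPLE_MANIFEST)
    findings = [{"case": "single_file_digits"}, {"case": "single_file_digits"}]
    print(check_counts(manifest, findings))
```

A band rather than an exact count tolerates small detector changes while still flagging a case that goes silent (below the minimum) or floods (above the maximum).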

Run

From the repo root:

python3 benchmark/run_benchmark.py

To skip external Crossref/OpenAlex/PubPeer/NCBI calls:

python3 benchmark/run_benchmark.py --no-network

To regenerate fixtures before running:

python3 benchmark/run_benchmark.py --regenerate

Coverage

The suite covers:

raw data rules: digit distribution, high-similarity rows/columns, column relationships, rare categories, and ordinal concentration
summary-statistic crosscheck
R scrutiny, R statcheck, and R rsprite2
p-value collection checks
reference parsing, external metadata lookup, and citation claim extraction
papermill light/network signals
image duplicate, copy-move, and metadata review
code scan/rerun and unsupported-code recording
data trace crosscheck
provenance record/verify
local corpus screening
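As one illustration of what a digit-distribution rule can look like, a common screening technique compares the last digits of recorded values against a uniform 0-9 distribution with a chi-square statistic. This is a generic sketch, not PCR's implementation; the function name and the choice of terminal digits (rather than, for example, Benford first digits) are assumptions.

```python
from collections import Counter

DIGITS = "0123456789"


def terminal_digit_chi_square(values: list[str]) -> float:
    """Chi-square statistic of last digits against a uniform 0-9 distribution.

    Values are passed as strings so recorded trailing zeros (e.g. "12.50")
    are preserved. Hypothetical helper for illustration only.
    """
    stripped = (v.strip() for v in values)
    last = [s[-1] for s in stripped if s and s[-1] in DIGITS]
    if not last:
        raise ValueError("no terminal digits to test")
    counts = Counter(last)
    expected = len(last) / 10  # equal share per digit under uniformity
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in DIGITS)
```

With 9 degrees of freedom, a statistic above roughly 16.92 corresponds to p < 0.05. Under the Interpretation section below, that is a lead for review rather than evidence of fabrication, and small samples make the test unreliable.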

Network coverage uses inputs/project_external and expects evidence from Crossref, OpenAlex, PubPeer, and NCBI. Network failures should be interpreted separately from detector regressions because external APIs can be unavailable, rate-limited, require credentials, or return changed metadata.
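One way to keep network failures apart from detector regressions is to triage each failed case by the kind of error it hit. The sketch assumes a hypothetical failure record holding an optional HTTP status and an exception class name; the status mapping (401/403 for credentials, 429 for rate limiting, 5xx for unavailability) follows standard HTTP semantics, but how run_benchmark.py actually records failures is not specified here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical failure record; run_benchmark.py may store failures differently.
@dataclass
class CaseFailure:
    case_input: str                    # e.g. "inputs/project_external"
    http_status: Optional[int] = None  # status from an external API, if any
    error_type: Optional[str] = None   # exception class name, if any


# Exception class names treated as connectivity problems (assumption).
NETWORK_ERRORS = {"URLError", "TimeoutError", "ConnectionError"}


def triage(failure: CaseFailure) -> str:
    """Label a failure as a network problem or a possible detector regression."""
    status = failure.http_status
    if status in (401, 403):
        return "network: credentials required"
    if status == 429:
        return "network: rate-limited"
    if status is not None and 500 <= status < 600:
        return "network: service unavailable"
    if failure.error_type in NETWORK_ERRORS:
        return "network: connection failed"
    return "detector: investigate as possible regression"
```

Changed metadata will not surface as an HTTP error, so a case that fails only because Crossref, OpenAlex, PubPeer, or NCBI now returns different records still lands in the "detector" bucket and needs a manual comparison against the expected evidence.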

Interpretation

Findings are risk signals, not misconduct verdicts. Records marked info report operational status, dependency status, skip reasons, or coverage notes rather than findings. Weak-signal capabilities such as image forensics, raw-table digit distribution/column relationships, and papermill similarity should be judged on whether they surface useful review leads, not on whether they prove a conclusion.
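When tallying results, it can help to set info records aside before counting risk signals. The sketch assumes each record is a dict with a "severity" field whose value is "info" for operational notes; that field name, the "warning" label, and the example messages are hypothetical and not taken from PCR's report format.

```python
def split_records(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate info records (status/coverage notes) from risk-signal findings."""
    info = [r for r in records if r.get("severity") == "info"]
    signals = [r for r in records if r.get("severity") != "info"]
    return info, signals


# Illustrative records only.
records = [
    {"severity": "info", "message": "external lookup skipped (--no-network)"},
    {"severity": "warning", "message": "terminal digits deviate from uniform"},
]
info, signals = split_records(records)
print(len(info), len(signals))  # 1 1
```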
