Commit 25d3011

Release 1.2.0 audit enhancements
1 parent 72d96ed commit 25d3011

36 files changed

Lines changed: 1284 additions & 214 deletions

CHANGELOG.md

Lines changed: 7 additions & 0 deletions

@@ -1,5 +1,12 @@
 # Changelog
 
+## v1.2.0
+
+- Added public report export support with schema metadata and CLI coverage.
+- Expanded project audit checks for references, image signals, routing, raw data, and summary-stat crosschecks.
+- Improved benchmark fixtures, benchmark reports, and documentation for the updated audit behavior.
+- Added development tooling configuration for pytest, ruff, mypy, and type stubs.
+
 ## v1.1.0
 
 - Renamed the project to `pre-check-research` and standardized the `pcr` CLI/output prefix.

README.md

Lines changed: 2 additions & 2 deletions

@@ -75,7 +75,7 @@ The report is designed to support a defensible review process. It does not repla
 | **P-values** | `p_value_collection` | Domain validity (p outside [0,1]), just-significant clustering |
 | **Statistical text** | `statcheck` | APA/NHST in-text statistic vs reported p-value consistency |
 | **Images** | `image_audit` | Internal duplicates (aHash/dHash/pHash), rotated/flipped copies, copy-move triage, western blot/gel review |
-| **References** | `reference_audit` | DOI/PMID parsing, Crossref/OpenAlex/NCBI metadata queries, citation claim extraction |
+| **References** | `reference_audit` | DOI/PMID parsing, Crossref/OpenAlex/PubPeer/NCBI metadata queries, citation claim extraction |
 | **Code** | `code_audit`, sandbox | Pattern scanning (hardcoded paths, exclusion clues), Python/R script rerun with output capture |
 | **Corpus** | `corpus_signals` | Cross-manuscript text similarity (simhash, Jaccard), reference overlap, papermill phrase signals |
 | **Provenance** | `provenance` | SHA-256 file hashing, append-only JSONL ledger, verify/diff change detection |
@@ -283,7 +283,7 @@ Built-in example projects for testing and demonstration:
 ## Privacy and Security
 
 - **All computation is local.** No data is uploaded to external services.
-- External lookups (Crossref, OpenAlex, NCBI) only query public identifiers (DOI, PMID) and can be disabled with `--no-external-lookups`.
+- External lookups (Crossref, OpenAlex, PubPeer, NCBI) only query public identifiers (DOI, PMID) and can be disabled with `--no-external-lookups`.
 - Code reruns execute in temporary project copies with timeouts and minimal environment variables. This is not a strong security sandbox — treat unknown code accordingly.
 - SHA-256 provenance ledgers are append-only and never transmit file contents.
 

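The README states that external lookups send only public identifiers (DOI, PMID) and can be switched off with `--no-external-lookups`. The sketch below illustrates that contract. It rests on stated assumptions: the regular expressions and the `lookup_targets` helper are hypothetical and do not come from `reference_audit`, whose parser is not part of this diff. The helper extracts identifiers from text and returns nothing when lookups are disabled, so no outbound request could be built.

```python
import re

# Hypothetical patterns for illustration only; pcr's reference_audit parser is not shown in this diff.
# DOI shape: "10.<4-9 digit registrant>/<suffix>".
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b")


def lookup_targets(text: str, external_lookups: bool = True) -> dict[str, list[str]]:
    """Collect the public identifiers that a lookup step would query.

    Returns empty lists when external lookups are disabled, mirroring the intent
    of --no-external-lookups (the real CLI flag wiring is not modeled here).
    """
    if not external_lookups:
        return {"doi": [], "pmid": []}
    # Naively strip trailing sentence punctuation that the DOI character class also accepts.
    dois = {m.group(0).rstrip(".,;)") for m in DOI_RE.finditer(text)}
    pmids = set(PMID_RE.findall(text))
    return {"doi": sorted(dois), "pmid": sorted(pmids)}


if __name__ == "__main__":
    sample = "See doi:10.1038/nature12373. Also PMID: 23456789 (https://doi.org/10.1371/journal.pone.0000001)."
    print(lookup_targets(sample))
    print(lookup_targets(sample, external_lookups=False))
```

Stripping trailing punctuation is a deliberate simplification. A DOI that legitimately ends in `)` would be truncated, so a production parser needs a stricter rule.
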
benchmark/BENCHMARK.md

Lines changed: 2 additions & 2 deletions

@@ -20,7 +20,7 @@ From the repo root:
 python3 benchmark/run_benchmark.py
 ```
 
-To skip external Crossref/OpenAlex/NCBI calls:
+To skip external Crossref/OpenAlex/PubPeer/NCBI calls:
 
 ```bash
 python3 benchmark/run_benchmark.py --no-network
@@ -36,7 +36,7 @@ python3 benchmark/run_benchmark.py --regenerate
 
 The suite covers raw data rules, including digit distribution, high-similarity rows/columns, column relationships, rare categories, and ordinal concentration; summary-stat crosscheck; R scrutiny; R statcheck; R rsprite2; p-value collection checks; reference parsing; external metadata lookup; citation claim extraction; papermill light/network signals; image duplicate/copy-move/metadata review; code scan/rerun; unsupported code recording; data trace crosscheck; provenance record/verify; and local corpus screening.
 
-Network coverage uses `inputs/project_external` and expects evidence from Crossref, OpenAlex, and NCBI. Network failures should be interpreted separately from detector regressions because external APIs can be unavailable, rate-limited, or return changed metadata.
+Network coverage uses `inputs/project_external` and expects evidence from Crossref, OpenAlex, PubPeer, and NCBI. Network failures should be interpreted separately from detector regressions because external APIs can be unavailable, rate-limited, require credentials, or return changed metadata.
 
 ## Interpretation
 

benchmark/BENCHMARK_REPORT.md

Lines changed: 14 additions & 14 deletions

@@ -16,7 +16,7 @@ Conclusion: The core detection pipeline is stably covered by automated benchmark
 - Raw data: Covers duplicate/highly similar rows and columns, fixed steps, high-frequency values, missing-concentrated-by-group, terminal digit distribution, inter-column relationships, and non-continuous variable anomalies; clean controls maintain 0 risk signals.
 - Summary statistics: Covers SE/SD/N, CI, percent/count, p/t/df, p-value domain, and R scrutiny/SPRITE feasibility checks.
 - In-text statistics: Covers R statcheck p-value consistency checks on APA/NHST expressions.
-- Literature & network: Covers DOI/PMID parsing, Crossref/OpenAlex/NCBI metadata queries, and citation claim extraction.
+- Literature & network: Covers DOI/PMID parsing, Crossref/OpenAlex/PubPeer/NCBI metadata queries, and citation claim extraction.
 - Images: Covers image discovery, internal duplicates, local copy-move, metadata quality, and Western blot/gel review checklist.
 - Code & project: Covers Python/R script reruns, Stata/SPSS/SAS read-only prompts, cross-material data reconciliation, project manifest, provenance version chain, and local corpus screening.
 
@@ -36,18 +36,18 @@ Not executed (--no-network used).
 
 | Case | Type | Pass | Seconds | Risk Signals | Info | Missing Tools | Missing Checks |
 |---|---:|---:|---:|---:|---|---|
-| raw_suspicious | single_run | Yes | 1.16 | 16 | 0 | | |
-| raw_clean_control | single_run | Yes | 1.046 | 0 | 0 | | |
-| summary_suspicious | single_run | Yes | 2.083 | 17 | 2 | | |
-| p_values_suspicious | single_run | Yes | 1.012 | 2 | 0 | | |
-| apa_stats_suspicious | single_run | Yes | 2.193 | 2 | 0 | | |
-| paper_refs_and_claims_offline | single_run | Yes | 1.011 | 0 | 4 | | |
-| analysis_suspicious | single_run | Yes | 1.378 | 1 | 1 | | |
-| analysis_manual_unsupported | single_run | Yes | 1.012 | 0 | 3 | | |
-| figures_project | project | Yes | 1.205 | 11 | 13 | | |
-| project_full | project | Yes | 2.555 | 12 | 19 | | |
-| corpus_screen | corpus | Yes | 2.355 | 4 | 0 | | |
-| provenance_change | provenance_change | Yes | 2.067 | 1 | 5 | | |
+| raw_suspicious | single_run | Yes | 1.284 | 16 | 0 | | |
+| raw_clean_control | single_run | Yes | 1.147 | 0 | 0 | | |
+| summary_suspicious | single_run | Yes | 2.279 | 17 | 2 | | |
+| p_values_suspicious | single_run | Yes | 1.039 | 2 | 0 | | |
+| apa_stats_suspicious | single_run | Yes | 2.156 | 2 | 0 | | |
+| paper_refs_and_claims_offline | single_run | Yes | 1.072 | 0 | 4 | | |
+| analysis_suspicious | single_run | Yes | 1.42 | 1 | 1 | | |
+| analysis_manual_unsupported | single_run | Yes | 1.054 | 0 | 3 | | |
+| figures_project | project | Yes | 1.201 | 11 | 13 | | |
+| project_full | project | Yes | 2.471 | 12 | 19 | | |
+| corpus_screen | corpus | Yes | 2.104 | 4 | 0 | | |
+| provenance_change | provenance_change | Yes | 2.072 | 1 | 5 | | |
 | external_refs_online | project_network | Yes | 0.0 | 0 | 0 | | |
 
 ## Tool Coverage
@@ -93,5 +93,5 @@ Not executed (--no-network used).
 ## Interpretation Boundaries
 
 The high/medium/low levels in this report are benchmark risk signals, not conclusions of academic misconduct, fabrication, or fraud. `info` records are run statuses, dependency states, skip reasons, or coverage notes; they do not count toward risk conclusions.
-Network test cases depend on real-time availability, certificate chains, and rate limiting of Crossref, OpenAlex, and NCBI. If network cases fail, first check HTTP/SSL/rate-limit information in evidence before concluding it is a detector regression.
+Network test cases depend on real-time availability, certificate chains, credentials, and rate limiting of Crossref, OpenAlex, PubPeer, and NCBI. If network cases fail, first check HTTP/SSL/rate-limit information in evidence before concluding it is a detector regression.
 All weak-signal tools are only for surfacing human review directions. Final review should return to original data, scripts, image source files, literature metadata, and audit logs.
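
The updated guidance asks reviewers to rule out HTTP, SSL, rate-limit, and (new in this release) credential problems before they treat a failed network case as a detector regression. The following is a minimal triage sketch. It assumes evidence is available as plain strings. The keyword patterns and `triage_network_failure` are illustrative assumptions and do not reflect pcr's actual evidence format or logic.

```python
import re

# Illustrative patterns only (assumption): pcr's real evidence strings may be worded differently.
NETWORK_PATTERNS = {
    "http": re.compile(r"\bHTTP\s*(?:error\s*)?[45]\d\d\b", re.IGNORECASE),
    "ssl": re.compile(r"\b(?:SSL|TLS|certificate)\b", re.IGNORECASE),
    "rate_limit": re.compile(r"\b(?:429|rate[- ]limit(?:ed)?|too many requests)\b", re.IGNORECASE),
    "credentials": re.compile(r"\b(?:401|403|unauthori[sz]ed|forbidden|api key|token)\b", re.IGNORECASE),
    "connectivity": re.compile(r"\b(?:timed? ?out|connection (?:refused|reset)|name resolution)\b", re.IGNORECASE),
}


def triage_network_failure(evidence: list[str]) -> tuple[str, list[str]]:
    """Classify a failed network case from its evidence lines.

    Returns ("network", reasons) when any line looks like an availability, SSL,
    rate-limit, or credential problem; otherwise ("possible_regression", []),
    meaning the detector itself deserves a closer look.
    """
    reasons = sorted(
        {name for line in evidence for name, pattern in NETWORK_PATTERNS.items() if pattern.search(line)}
    )
    if reasons:
        return "network", reasons
    return "possible_regression", []
```

Keyword matching can only reduce triage effort. It cannot prove that a detector is healthy, so a "network" verdict should still prompt a rerun once the service is reachable.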

benchmark/benchmark_manifest.json

Lines changed: 2 additions & 2 deletions

@@ -110,8 +110,8 @@
 "kind": "project_network",
 "input": "inputs/project_external",
 "expected_tools": ["reference_audit", "citation_claim_check", "provenance_hash"],
-"expected_checks": ["DOI title mismatch", "PMID title mismatch", "DOI external metadata unverifiable", "PMID metadata verification"],
-"expected_external_services": ["crossref", "openalex", "ncbi"],
+"expected_checks": ["DOI title mismatch", "PMID title mismatch", "DOI external metadata absent", "PMID metadata verification"],
+"expected_external_services": ["crossref", "openalex", "pubpeer", "ncbi"],
 "min_risk_findings": 3
 }
 ]
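
The `project_network` case now expects evidence from four services, including `pubpeer`, and expects the renamed check `DOI external metadata absent`. Below is a sketch of how a runner could compare these expectations with a report. It assumes, hypothetically, that each finding is a dict with `service`, `check`, and `level` keys; the real report schema is not shown in this commit. Only `high`/`medium`/`low` findings count toward `min_risk_findings`, which matches the benchmark report's note that `info` records do not count as risk signals.

```python
# Hypothetical finding schema: {"service": ..., "check": ..., "level": ...}.
RISK_LEVELS = {"high", "medium", "low"}  # "info" records are excluded from risk counts


def missing_expectations(case: dict, findings: list[dict]) -> dict:
    """Report which manifest expectations a run's findings did not satisfy."""
    seen_services = {f.get("service") for f in findings}
    seen_checks = {f.get("check") for f in findings}
    risk_count = sum(1 for f in findings if f.get("level") in RISK_LEVELS)
    return {
        "services": [s for s in case.get("expected_external_services", []) if s not in seen_services],
        "checks": [c for c in case.get("expected_checks", []) if c not in seen_checks],
        "risk_shortfall": max(0, case.get("min_risk_findings", 0) - risk_count),
    }


# Expectations copied from the updated project_network case in this diff.
PROJECT_NETWORK = {
    "expected_checks": [
        "DOI title mismatch",
        "PMID title mismatch",
        "DOI external metadata absent",
        "PMID metadata verification",
    ],
    "expected_external_services": ["crossref", "openalex", "pubpeer", "ncbi"],
    "min_risk_findings": 3,
}

if __name__ == "__main__":
    # Hypothetical findings for illustration: no PubPeer evidence was recorded.
    findings = [
        {"service": "crossref", "check": "DOI title mismatch", "level": "medium"},
        {"service": "ncbi", "check": "PMID title mismatch", "level": "medium"},
        {"service": "openalex", "check": "DOI external metadata absent", "level": "low"},
        {"service": "ncbi", "check": "PMID metadata verification", "level": "info"},
    ]
    print(missing_expectations(PROJECT_NETWORK, findings))
```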

benchmark/generate_synthetic_benchmark.py

Lines changed: 1 addition & 2 deletions

@@ -1,7 +1,6 @@
 from __future__ import annotations
 
 import json
-import math
 import shutil
 from pathlib import Path
 
@@ -283,7 +282,7 @@ def write_ground_truth() -> None:
 "known_limitations": [
 "R tool output depends on statcheck/scrutiny/rsprite2 parsing of column names and text formats.",
 "Image copy-move is a weak signal; low-texture or regularly repeating graphics may produce false positives/negatives.",
-"Project-level external reference queries are disabled in this benchmark; Crossref/OpenAlex/NCBI network reliability is not tested.",
+"Project-level external reference queries are disabled in this benchmark; Crossref/OpenAlex/PubPeer/NCBI network reliability is not tested.",
 ],
 },
 ensure_ascii=False,

benchmark/ground_truth.json

Lines changed: 2 additions & 2 deletions

@@ -23,6 +23,6 @@
 "known_limitations": [
 "R tool output depends on statcheck/scrutiny/rsprite2 parsing of column names and text formats.",
 "Image copy-move is a weak signal; low-texture or regularly repeating graphics may produce false positives/negatives.",
-"Project-level external reference queries are disabled in this benchmark; Crossref/OpenAlex/NCBI network reliability is not tested."
+"Project-level external reference queries are disabled in this benchmark; Crossref/OpenAlex/PubPeer/NCBI network reliability is not tested."
 ]
-}
+}

benchmark/reports/pcr.benchmark_summary.json

Lines changed: 12 additions & 12 deletions

@@ -7,7 +7,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 1.16,
+"seconds": 1.284,
 "json_path": "benchmark/reports/pcr.raw_suspicious.json",
 "markdown_path": "benchmark/reports/pcr.raw_suspicious.md",
 "risk_findings": 16,
@@ -38,7 +38,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 1.046,
+"seconds": 1.147,
 "json_path": "benchmark/reports/pcr.raw_clean_control.json",
 "markdown_path": "benchmark/reports/pcr.raw_clean_control.md",
 "risk_findings": 0,
@@ -56,7 +56,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 2.083,
+"seconds": 2.279,
 "json_path": "benchmark/reports/pcr.summary_suspicious.json",
 "markdown_path": "benchmark/reports/pcr.summary_suspicious.md",
 "risk_findings": 17,
@@ -90,7 +90,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 1.012,
+"seconds": 1.039,
 "json_path": "benchmark/reports/pcr.p_values_suspicious.json",
 "markdown_path": "benchmark/reports/pcr.p_values_suspicious.md",
 "risk_findings": 2,
@@ -113,7 +113,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 2.193,
+"seconds": 2.156,
 "json_path": "benchmark/reports/pcr.apa_stats_suspicious.json",
 "markdown_path": "benchmark/reports/pcr.apa_stats_suspicious.md",
 "risk_findings": 2,
@@ -135,7 +135,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 1.011,
+"seconds": 1.072,
 "json_path": "benchmark/reports/pcr.paper_refs_and_claims_offline.json",
 "markdown_path": "benchmark/reports/pcr.paper_refs_and_claims_offline.md",
 "risk_findings": 0,
@@ -162,7 +162,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 1.378,
+"seconds": 1.42,
 "json_path": "benchmark/reports/pcr.analysis_suspicious.json",
 "markdown_path": "benchmark/reports/pcr.analysis_suspicious.md",
 "risk_findings": 1,
@@ -186,7 +186,7 @@
 "kind": "single_run",
 "ok": true,
 "returncode": 0,
-"seconds": 1.012,
+"seconds": 1.054,
 "json_path": "benchmark/reports/pcr.analysis_manual_unsupported.json",
 "markdown_path": "benchmark/reports/pcr.analysis_manual_unsupported.md",
 "risk_findings": 0,
@@ -211,7 +211,7 @@
 "kind": "project",
 "ok": true,
 "returncode": 0,
-"seconds": 1.205,
+"seconds": 1.201,
 "json_path": "benchmark/reports/pcr.figures_project.json",
 "markdown_path": "benchmark/reports/pcr.figures_project.md",
 "risk_findings": 11,
@@ -255,7 +255,7 @@
 "kind": "project",
 "ok": true,
 "returncode": 0,
-"seconds": 2.555,
+"seconds": 2.471,
 "json_path": "benchmark/reports/pcr.project_full.json",
 "markdown_path": "benchmark/reports/pcr.project_full.md",
 "risk_findings": 12,
@@ -308,7 +308,7 @@
 "kind": "corpus",
 "ok": true,
 "returncode": 0,
-"seconds": 2.355,
+"seconds": 2.104,
 "json_path": "benchmark/reports/pcr.corpus_screen.json",
 "markdown_path": "benchmark/reports/pcr.corpus_screen.md",
 "risk_findings": 4,
@@ -332,7 +332,7 @@
 "kind": "provenance_change",
 "ok": true,
 "returncode": 0,
-"seconds": 2.067,
+"seconds": 2.072,
 "json_path": "benchmark/reports/pcr.provenance_change.json",
 "markdown_path": ".",
 "risk_findings": 1,

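In `pcr.benchmark_summary.json`, only the `seconds` fields change. The `risk_findings` values visible in the diff context stay the same. The snippet below tabulates per-case timing drift using the before and after values copied from this diff. The 20% flag threshold is an arbitrary assumption chosen for illustration and is not a project setting. With these numbers, no case is flagged. The largest absolute change is `corpus_screen` at about -0.25 s (roughly -11%), and the 12 local cases together take about 0.22 s longer.

```python
# Per-case wall-clock seconds copied from this diff of
# benchmark/reports/pcr.benchmark_summary.json (before and after the commit).
OLD = {
    "raw_suspicious": 1.16, "raw_clean_control": 1.046, "summary_suspicious": 2.083,
    "p_values_suspicious": 1.012, "apa_stats_suspicious": 2.193,
    "paper_refs_and_claims_offline": 1.011, "analysis_suspicious": 1.378,
    "analysis_manual_unsupported": 1.012, "figures_project": 1.205,
    "project_full": 2.555, "corpus_screen": 2.355, "provenance_change": 2.067,
}
NEW = {
    "raw_suspicious": 1.284, "raw_clean_control": 1.147, "summary_suspicious": 2.279,
    "p_values_suspicious": 1.039, "apa_stats_suspicious": 2.156,
    "paper_refs_and_claims_offline": 1.072, "analysis_suspicious": 1.42,
    "analysis_manual_unsupported": 1.054, "figures_project": 1.201,
    "project_full": 2.471, "corpus_screen": 2.104, "provenance_change": 2.072,
}


def timing_deltas(old: dict[str, float], new: dict[str, float], rel_threshold: float = 0.20):
    """Return (case, delta_s, relative_change, flagged) rows for cases present in both runs.

    rel_threshold is an illustrative cut-off (assumption), not a project setting.
    """
    rows = []
    for case in sorted(old.keys() & new.keys()):
        delta = new[case] - old[case]
        rel = delta / old[case]
        rows.append((case, round(delta, 3), round(rel, 3), abs(rel) > rel_threshold))
    return rows


if __name__ == "__main__":
    for case, delta, rel, flagged in timing_deltas(OLD, NEW):
        print(f"{case:32} {delta:+.3f} s {rel:+.1%}{'  CHECK' if flagged else ''}")
    print(f"total: {sum(OLD.values()):.3f} s -> {sum(NEW.values()):.3f} s")
```

Sub-second drift like this is normal run-to-run noise for short wall-clock measurements. A relative threshold avoids over-reacting to small absolute changes in the fastest cases.
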
benchmark/reports/pcr.benchmark_summary.md

Lines changed: 14 additions & 14 deletions

@@ -16,7 +16,7 @@ Conclusion: The core detection pipeline is stably covered by automated benchmark
 - Raw data: Covers duplicate/highly similar rows and columns, fixed steps, high-frequency values, missing-concentrated-by-group, terminal digit distribution, inter-column relationships, and non-continuous variable anomalies; clean controls maintain 0 risk signals.
 - Summary statistics: Covers SE/SD/N, CI, percent/count, p/t/df, p-value domain, and R scrutiny/SPRITE feasibility checks.
 - In-text statistics: Covers R statcheck p-value consistency checks on APA/NHST expressions.
-- Literature & network: Covers DOI/PMID parsing, Crossref/OpenAlex/NCBI metadata queries, and citation claim extraction.
+- Literature & network: Covers DOI/PMID parsing, Crossref/OpenAlex/PubPeer/NCBI metadata queries, and citation claim extraction.
 - Images: Covers image discovery, internal duplicates, local copy-move, metadata quality, and Western blot/gel review checklist.
 - Code & project: Covers Python/R script reruns, Stata/SPSS/SAS read-only prompts, cross-material data reconciliation, project manifest, provenance version chain, and local corpus screening.
 
@@ -36,18 +36,18 @@ Not executed (--no-network used).
 
 | Case | Type | Pass | Seconds | Risk Signals | Info | Missing Tools | Missing Checks |
 |---|---:|---:|---:|---:|---|---|
-| raw_suspicious | single_run | Yes | 1.16 | 16 | 0 | | |
-| raw_clean_control | single_run | Yes | 1.046 | 0 | 0 | | |
-| summary_suspicious | single_run | Yes | 2.083 | 17 | 2 | | |
-| p_values_suspicious | single_run | Yes | 1.012 | 2 | 0 | | |
-| apa_stats_suspicious | single_run | Yes | 2.193 | 2 | 0 | | |
-| paper_refs_and_claims_offline | single_run | Yes | 1.011 | 0 | 4 | | |
-| analysis_suspicious | single_run | Yes | 1.378 | 1 | 1 | | |
-| analysis_manual_unsupported | single_run | Yes | 1.012 | 0 | 3 | | |
-| figures_project | project | Yes | 1.205 | 11 | 13 | | |
-| project_full | project | Yes | 2.555 | 12 | 19 | | |
-| corpus_screen | corpus | Yes | 2.355 | 4 | 0 | | |
-| provenance_change | provenance_change | Yes | 2.067 | 1 | 5 | | |
+| raw_suspicious | single_run | Yes | 1.284 | 16 | 0 | | |
+| raw_clean_control | single_run | Yes | 1.147 | 0 | 0 | | |
+| summary_suspicious | single_run | Yes | 2.279 | 17 | 2 | | |
+| p_values_suspicious | single_run | Yes | 1.039 | 2 | 0 | | |
+| apa_stats_suspicious | single_run | Yes | 2.156 | 2 | 0 | | |
+| paper_refs_and_claims_offline | single_run | Yes | 1.072 | 0 | 4 | | |
+| analysis_suspicious | single_run | Yes | 1.42 | 1 | 1 | | |
+| analysis_manual_unsupported | single_run | Yes | 1.054 | 0 | 3 | | |
+| figures_project | project | Yes | 1.201 | 11 | 13 | | |
+| project_full | project | Yes | 2.471 | 12 | 19 | | |
+| corpus_screen | corpus | Yes | 2.104 | 4 | 0 | | |
+| provenance_change | provenance_change | Yes | 2.072 | 1 | 5 | | |
 | external_refs_online | project_network | Yes | 0.0 | 0 | 0 | | |
 
 ## Tool Coverage
@@ -93,5 +93,5 @@ Not executed (--no-network used).
 ## Interpretation Boundaries
 
 The high/medium/low levels in this report are benchmark risk signals, not conclusions of academic misconduct, fabrication, or fraud. `info` records are run statuses, dependency states, skip reasons, or coverage notes; they do not count toward risk conclusions.
-Network test cases depend on real-time availability, certificate chains, and rate limiting of Crossref, OpenAlex, and NCBI. If network cases fail, first check HTTP/SSL/rate-limit information in evidence before concluding it is a detector regression.
+Network test cases depend on real-time availability, certificate chains, credentials, and rate limiting of Crossref, OpenAlex, PubPeer, and NCBI. If network cases fail, first check HTTP/SSL/rate-limit information in evidence before concluding it is a detector regression.
 All weak-signal tools are only for surfacing human review directions. Final review should return to original data, scripts, image source files, literature metadata, and audit logs.