docs(CHANGELOG): record reverse-engineering batch (reporting guidelines 32->36 + probes/exemplars/tables/appraisal)

Yoojin-nam · claude · Yoojin-nam · commit ae2f998453e2 · 2026-06-15T13:44:46.000+09:00
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,15 @@
 
 ## [Unreleased]
 
+### Added
+
+- **Reverse-engineering batch — adjacent clinical-research scaffolds (reporting guidelines 32 → 36).** A scored, gap-register-driven loop (`reverse_engineer/`) added guideline-grounded scaffolds for clinical-AI areas the project did not yet cover, each authored under the license firewall (`distill.py`): own-words **educational summaries** of the guideline item *intent* (no verbatim wording from copyrighted/NC sources), with `_corpus/` raw sources gitignored. Four new vendored reporting checklists in `skills/check-reporting/references/checklists/` — **TRIPOD-LLM** (studies using large language models; numbered to the official 19-item scheme), **CONSORT-AI** + **SPIRIT-AI** (AI clinical-trial reports + protocols; close a pre-existing `MISSING_CHECKLIST_CONTRACT_VIOLATION` where both were routed/aliased but unvendored), and **DECIDE-AI** (early-stage live clinical evaluation of AI decision-support). Each wired end-to-end (alias map, fail-fast test, `LICENSES.md` row, Step 1 auto-detect row, `skill.yml` card) and CI-gated by `validate_catalog_consistency.py` (32 → 36).
+- **METRICS radiomics appraisal reference (`skills/check-reporting/references/appraisal_tools/METRICS.md`)** — a methodological-quality / risk-of-bias tool (EuSoMII; Kocak et al. *Insights Imaging* 2024, CC BY 4.0), 9 categories / 30 weighted condition-dependent items. Deliberately placed under `appraisal_tools/` (NOT counted under `references/checklists/`) so it does **not** inflate the reporting-guideline count — the repo keeps appraisal tools distinct from reporting checklists (`critical_item_floor.md`). Wired as the load-on-demand reference behind the Step 4f appraisal cross-check; reporting-guideline count stays 36.
+- **New domain-probe module + AI-extension subsections** (`skills/peer-review/references/domain-probes/`, vendored byte-identical into `/self-review`). New module **`diagnostic_accuracy.md` (D1–D6)** for DTA primary studies + multi-reader multi-case (MRMC) reader studies (verification/spectrum/blinding bias, indeterminate handling, fully-crossed/washout design, reader+case variance). Plus AI reporting-flow subsections on three existing modules: **TRIPOD+AI (T1–T4)** on `survival_prognostic.md`, **CONSORT-AI/SPIRIT-AI (A1–A5)** on `rct_trial.md`, and **decision-impact (DI1–DI5, DECIDE-AI axis)** on `ai_overclaiming.md`. `MODULES` tuple 7 → 8; routed from both peer-review (new Phase 2I) and self-review.
+- **Figure-anatomy exemplars (`skills/make-figures/references/exemplar_plots/`)** — four new synthetic, citation-free anatomy models: **`decision_curve.md`** (net-benefit / DCA), **`mrmc_roc.md`** (MRMC reader-study ROC with per-reader + reader-averaged curves and reader+case CIs), **`bland_altman.md`** (agreement: bias + ±1.96·SD limits with CIs, proportional-bias check, not-a-correlation discipline), and **`confusion_matrix.md`** (raw + row/column-normalized, class-imbalance caveat).
+- **Table-type standards (`skills/analyze-stats/references/table-standards/table-types/`)** — **`incremental_value.md`** (added value beyond a baseline: paired ΔAUC + DeLong CI, NRI event/non-event split, IDI, net benefit) and **`reader_study.md`** (MRMC per-reader + reader-averaged performance with Obuchowski–Rockette/DBM reader+case CIs, per-patient vs per-lesion unit).
+- **Structured-abstract exemplar (`skills/write-paper/references/exemplar_abstract.md`)** — completes the write-paper exemplar set (intro/methods/results/discussion already shipped); mandates a primary estimate with 95% CI + denominator and a failure-modes section (no estimate-free "significant", no body↔abstract number mismatch). Wired into Phase 6.
+
 ### Changed
 
 - **Recommendation-calibration gate (Phase 2F) extended to review articles + fixable/unfixable tier-domination** (`skills/peer-review/SKILL.md`). Phase 2F (previously "AI/Method Papers" only) now also fires for **Review / narrative / primer** articles and adds three rules: (1) **fixable vs unfixable tier-domination** — when repairable defects (extraction errors, missing supplementary material, mislabeled table) coexist with unrepairable ones (poolability of incommensurable studies, broken construct, invalid evaluation instrument), the unfixable class governs the recommendation; (2) **review/narrative escalation** — for a review article the distinct contribution (novelty/synthesis/domain-specificity) *is* the product, so weak-novelty/no-distinct-contribution is **unfixable-in-current-form** ("add a contribution" = a different paper) and escalates one tier toward Reject rather than defaulting to the revision/Reconsider tier; (3) **confidential-note Reject-grade self-grep** — deferring the value/priority judgment to the editorial board is itself a Reject-grade tell, not a neutral hand-off. QC items 11/12 and the final checklist updated. Sourced from a review-article (LLM-hallucination primer) decision self-audit in which the reviewer recommended a Reconsider tier and the editor rejected — the 6th lenient-calibration recurrence; diagnosis remains calibration discipline, not a new type-probe.