feat(sync-submission): add cross-document N consistency gate (Phase 5) (#26)
Adds scripts/cross_document_n_check.py and Phase 5 documentation. Detects
cohort-size drift (k included / k excluded / N patients) across the
submission package — manuscript, abstract, PROSPERO record, cover letter,
supplementary, INDEX. The optional --pool-lock option anchors the check to a
frozen FINAL_POOL_LOCK.yaml.
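A minimal sketch of how a locked baseline can change the verdict is shown below. It assumes claims have already been extracted as `(category, value, source)` tuples and that the lock file has been parsed into a plain `{category: int}` dict. The commit does not show the real `FINAL_POOL_LOCK.yaml` schema or the script's internals, so `check_claims` and its inputs are hypothetical.

```python
from collections import defaultdict


def check_claims(claims, locked=None):
    """Hypothetical drift check with an optional locked baseline.

    claims: iterable of (category, value, source_document) tuples.
    locked: optional {category: int}, e.g. counts parsed from a
            FINAL_POOL_LOCK.yaml (schema assumed, not taken from the script).
    Returns a list of P0 finding strings.
    """
    values = defaultdict(set)     # category -> distinct integers seen
    sources = defaultdict(list)   # (category, value) -> documents asserting it
    for category, value, source in claims:
        values[category].add(value)
        sources[(category, value)].append(source)

    findings = []
    for category in sorted(values):
        seen = values[category]
        if locked is not None and category in locked:
            # With a lock, the locked count is authoritative: any other value
            # is drift, even if every document agrees with every other one.
            expected = locked[category]
            for value in sorted(seen - {expected}):
                findings.append(
                    f"{category}: {value} in {sources[(category, value)]} != locked {expected}"
                )
        elif len(seen) > 1:
            # Without a lock, only internal disagreement can be detected.
            findings.append(f"{category}: conflicting values {sorted(seen)}")
    return findings


claims = [("patients", 1234, "manuscript"), ("patients", 1234, "abstract")]
print(check_claims(claims))                     # -> []
print(check_claims(claims, {"patients": 1240}))
# -> ["patients: 1234 in ['manuscript', 'abstract'] != locked 1240"]
```

The lock matters because a package can be internally consistent and still wrong: if every document carries the same stale count, only a comparison against the frozen pool catches it.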
Motivation: multi-document N drift is a recurring desk-reject pattern in
SR-MA submissions. A clean docx audit does not catch numeric disagreement
between abstract prose and supplementary extraction sheets.
Includes synthetic fixture tests (consistent, drift, pool-lock mismatch,
explicit --files). All 4 pass.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
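The four fixture scenarios named in the commit message (consistent, drift, pool-lock mismatch, explicit --files) could be exercised roughly as follows. This is a sketch only: `check_package`, its `files` and `locked` parameters, the regex, and the fixture texts are hypothetical stand-ins, since the commit does not show the real tests or the script's interface.

```python
import re
import tempfile
import unittest
from pathlib import Path

# Hypothetical minimal checker; not the real script's API or patterns.
CLAIM_RE = re.compile(r"\b(\d+)\s+(patients|included|excluded)\b")


def check_package(root, files=None, locked=None):
    """Return the sorted list of drifting categories in a package directory.

    files:  optional explicit file list (stands in for an explicit --files).
    locked: optional {category: int} baseline (stands in for --pool-lock).
    """
    paths = [Path(root) / f for f in files] if files else sorted(Path(root).glob("*.md"))
    values = {}
    for path in paths:
        for num, noun in CLAIM_RE.findall(path.read_text()):
            values.setdefault(noun, set()).add(int(num))
    drifting = set()
    for category, seen in values.items():
        if locked and category in locked:
            if seen != {locked[category]}:
                drifting.add(category)
        elif len(seen) > 1:
            drifting.add(category)
    return sorted(drifting)


class FixtureTests(unittest.TestCase):
    def make_package(self, **docs):
        # Each keyword becomes <name>.md in a fresh temporary directory.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        for name, text in docs.items():
            Path(tmp.name, f"{name}.md").write_text(text)
        return tmp.name

    def test_consistent(self):
        root = self.make_package(abstract="12 included, 500 patients", body="500 patients")
        self.assertEqual(check_package(root), [])

    def test_drift(self):
        root = self.make_package(abstract="500 patients", supplementary="505 patients")
        self.assertEqual(check_package(root), ["patients"])

    def test_pool_lock_mismatch(self):
        # Documents agree with each other but not with the frozen lock.
        root = self.make_package(abstract="500 patients", body="500 patients")
        self.assertEqual(check_package(root, locked={"patients": 510}), ["patients"])

    def test_explicit_files(self):
        # A stale draft in the folder causes drift only when it is scanned.
        root = self.make_package(abstract="500 patients", body="500 patients", old_draft="480 patients")
        self.assertEqual(check_package(root), ["patients"])
        self.assertEqual(check_package(root, files=["abstract.md", "body.md"]), [])
```

Run it with `python -m unittest` against the file that contains it. Writing fixtures to a temporary directory keeps each scenario isolated and exercises the same file-reading path a real package scan would use.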
skills/sync-submission/SKILL.md: 56 additions, 0 deletions
@@ -72,8 +72,64 @@ The registry is a project-local YAML mapping author identifiers (full names, nat
 - Gate 5: before freeze, confirm portal free-text fields (cover letter, data availability, acknowledgements, abstract, author contributions) match the manuscript body.
 - Gate 6 (double-blind journals): before freeze, export the portal's blinded review PDF and grep for all author identifiers across the entire upload set — manuscript, supplementary, cover letter, registry record PDFs (PROSPERO/ClinicalTrials), portal Letter-field text. A clean manuscript blind does not imply a clean portal blind.
 - Gate 7 (text-only docx rebuilds): never use `pandoc --reference-doc=manuscript.docx` for response/cover/supplementary text-only docx — the reference docx ships its embedded media (figure files) into the new docx, bloating size 50–100×. Use plain `pandoc input.md -o output.docx` for text-only artifacts.
+- Gate 8 (Phase 5 cross-document N consistency): before freeze, run `scripts/cross_document_n_check.py` over the manuscript bundle (abstract, body, PROSPERO record, cover letter, supplementary, INDEX, PRISMA flow caption). Any N category with >1 distinct integer value is a P0 drift. When a `FINAL_POOL_LOCK.yaml` is present, supply `--pool-lock` to make the locked counts the authoritative baseline. See "Phase 5 — Cross-document N consistency" below.
 - Gate 10 (Phase 7 v_(N+1) docx regeneration): when building a new submission from a frozen prior version, run `scripts/verify_package_integrity.py --assert-vN-docx-changed --vN-docx <prev>.docx --new-docx <next>.docx`. Identical MD5 = unmodified seed copy = block submission. Defense-in-depth — required even when the upstream pipeline appears to have regenerated the docx.
 
+## Phase 5 — Cross-document N consistency
+
+Multi-document cohort-size drift is a high-frequency desk-reject pattern.
+Manuscript abstracts, body prose, PROSPERO records, supplementary extraction
+sheets, and PRISMA flow captions all repeat the same `k included` / `k excluded`
+/ `N patients` totals — and any disagreement between them is read by reviewers
+as either a data-integrity failure or a late-edit failure. Either reading
+ends the round.
+
+`scripts/cross_document_n_check.py` scans the submission package, extracts
+every "N <noun>" claim by category (patients, cases, included, excluded,
+nodules, tumors, studies_total), and groups them by category. A category with
+more than one distinct integer value is a P0 drift.
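The extract-and-group step described in the Phase 5 text can be sketched as below, assuming claims are written as a plain `<integer> <noun>` phrase in text already pulled out of each document. The regex and the `CATEGORY_NOUNS` map are hypothetical and not the script's actual patterns. This sketch would miss spelled-out numbers and phrasings such as "12 studies were included", which a production checker likely needs to handle.

```python
import re
from collections import defaultdict

# Hypothetical noun -> category map; nouns not listed keep their own name
# (e.g. "included", "excluded", "patients").
CATEGORY_NOUNS = {"studies": "studies_total"}

# Matches "<integer> <noun>", allowing thousands separators ("1,234 patients").
CLAIM_RE = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+"
    r"(patients|cases|included|excluded|nodules|tumors|studies)\b",
    re.IGNORECASE,
)


def extract_claims(doc_name, text):
    """Return (category, value, doc_name) for every "N <noun>" match in text."""
    claims = []
    for match in CLAIM_RE.finditer(text):
        value = int(match.group(1).replace(",", ""))
        noun = match.group(2).lower()
        claims.append((CATEGORY_NOUNS.get(noun, noun), value, doc_name))
    return claims


def find_drift(documents):
    """Group claims by category and keep categories with >1 distinct value.

    documents: {document name: plain text}.
    Returns {category: {value: [documents asserting that value]}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for name, text in documents.items():
        for category, value, source in extract_claims(name, text):
            grouped[category][value].append(source)
    return {cat: dict(vals) for cat, vals in grouped.items() if len(vals) > 1}


package = {
    "abstract.md": "We included 12 studies (1,234 patients); 40 excluded at full text.",
    "supplementary.md": "Pooled: 12 studies, 1,243 patients.",
}
print(find_drift(package))
# -> {'patients': {1234: ['abstract.md'], 1243: ['supplementary.md']}}
```

Keeping the document names under each value turns a P0 finding directly into an edit list: the output says which file carries which number, not only that the category disagrees.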