Commit 707d0a6

Yoojin-nam and claude authored
feat(sync-submission): add cross-document N consistency gate (Phase 5) (#26)
Adds scripts/cross_document_n_check.py and Phase 5 documentation. Detects cohort-size drift (k included / k excluded / N patients) across the submission package: manuscript, abstract, PROSPERO record, cover letter, supplementary, INDEX. Optional --pool-lock anchors to a frozen FINAL_POOL_LOCK.yaml.

Motivation: multi-document N drift is a recurring desk-reject pattern in SR-MA submissions. A clean docx audit does not catch numeric disagreement between abstract prose and supplementary extraction sheets.

Includes synthetic fixture tests (consistent, drift, pool-lock mismatch, explicit --files). All 4 pass.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent bd300e2 commit 707d0a6

4 files changed

Lines changed: 675 additions & 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -55,3 +55,4 @@ skills/**/HANDOFF_*.md
 
 # Gemini/Antigravity AI assistant scratchpad (per-session, not for repo)
 .gemini/
+_pr_drafts/
```

skills/sync-submission/SKILL.md

Lines changed: 56 additions & 0 deletions
@@ -72,8 +72,64 @@ The registry is a project-local YAML mapping author identifiers (full names, nat
- Gate 5: before freeze, confirm portal free-text fields (cover letter, data availability, acknowledgements, abstract, author contributions) match the manuscript body.
- Gate 6 (double-blind journals): before freeze, export the portal's blinded review PDF and grep for all author identifiers across the entire upload set — manuscript, supplementary, cover letter, registry record PDFs (PROSPERO/ClinicalTrials), portal Letter-field text. A clean manuscript blind does not imply a clean portal blind.
- Gate 7 (text-only docx rebuilds): never use `pandoc --reference-doc=manuscript.docx` for response/cover/supplementary text-only docx — the reference docx ships its embedded media (figure files) into the new docx, bloating size 50–100×. Use plain `pandoc input.md -o output.docx` for text-only artifacts.
- Gate 8 (Phase 5 cross-document N consistency): before freeze, run `scripts/cross_document_n_check.py` over the manuscript bundle (abstract, body, PROSPERO record, cover letter, supplementary, INDEX, PRISMA flow caption). Any N category with >1 distinct integer value is a P0 drift. When a `FINAL_POOL_LOCK.yaml` is present, supply `--pool-lock` to make the locked counts the authoritative baseline. See "Phase 5 — Cross-document N consistency" below.
- Gate 10 (Phase 7 v_(N+1) docx regeneration): when building a new submission from a frozen prior version, run `scripts/verify_package_integrity.py --assert-vN-docx-changed --vN-docx <prev>.docx --new-docx <next>.docx`. Identical MD5 = unmodified seed copy = block submission. Defense-in-depth — required even when the upstream pipeline appears to have regenerated the docx.

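Gates 7 and 10 both come down to inspecting the built files. The following is a minimal standard-library sketch, not the logic of `verify_package_integrity.py`; the function names `docx_media_entries`, `md5_of`, and `docx_unchanged` are hypothetical. A .docx is a ZIP archive whose embedded images live under `word/media/`, so a text-only rebuild (Gate 7) should list none, and Gate 10 reduces to comparing MD5 digests of the previous and new docx.

```python
import hashlib
import zipfile
from pathlib import Path


def docx_media_entries(docx: Path) -> list[str]:
    """List embedded media parts; a text-only docx (Gate 7) should return []."""
    with zipfile.ZipFile(docx) as zf:
        return [name for name in zf.namelist() if name.startswith("word/media/")]


def md5_of(path: Path) -> str:
    """Hex MD5 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def docx_unchanged(prev_docx: Path, new_docx: Path) -> bool:
    """Gate 10: identical MD5 means the new docx is an unmodified seed copy."""
    return md5_of(prev_docx) == md5_of(new_docx)
```

A non-empty `docx_media_entries(...)` on a response or cover letter is the signature of a `--reference-doc` rebuild, and `docx_unchanged(...)` returning `True` is the block-submission condition.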
## Phase 5 — Cross-document N consistency

Multi-document cohort-size drift is a high-frequency desk-reject pattern.
Manuscript abstracts, body prose, PROSPERO records, supplementary extraction
sheets, and PRISMA flow captions all repeat the same totals (`k included`,
`k excluded`, `N patients`), and reviewers read any disagreement between them
as either a data-integrity failure or a late-edit failure. Either reading ends
the round.

`scripts/cross_document_n_check.py` scans the submission package, extracts
every "N <noun>" claim, and groups the claims by category (patients, cases,
included, excluded, nodules, tumors, studies_total). A category with
more than one distinct integer value is a P0 drift.
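
The extract-and-group step can be sketched as follows. This is a minimal illustration, not the script's implementation: the two regular expressions, the function names `extract_claims` and `find_drifts`, and the in-memory `{filename: text}` input are assumptions, and the sketch covers only a subset of the categories listed (no `studies_total`).

```python
import re
from collections import defaultdict

# Hypothetical patterns; the real script's vocabulary and rules may differ.
NOUN_RE = re.compile(r"\b(\d[\d,]*)\s+(patients|cases|nodules|tumors)\b", re.IGNORECASE)
STUDY_RE = re.compile(
    r"\b(\d[\d,]*)\s+(?:studies|articles)\s+(?:were\s+)?(included|excluded)\b",
    re.IGNORECASE,
)


def extract_claims(files: dict[str, str]) -> dict[str, list[dict]]:
    """Group every integer-noun claim by category, keeping file/line provenance."""
    claims: dict[str, list[dict]] = defaultdict(list)
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for rx in (NOUN_RE, STUDY_RE):
                for m in rx.finditer(line):
                    value = int(m.group(1).replace(",", ""))  # "1,204" -> 1204
                    category = m.group(2).lower()
                    claims[category].append({"file": name, "line": lineno, "value": value})
    return claims


def find_drifts(claims: dict[str, list[dict]]) -> list[dict]:
    """Flag any category whose claims carry more than one distinct integer."""
    drifts = []
    for category, locations in sorted(claims.items()):
        values = sorted({loc["value"] for loc in locations})
        if len(values) > 1:
            drifts.append({"category": category, "values": values, "locations": locations})
    return drifts
```

Two documents reporting 63 and 64 included studies produce a single `included` drift with `values == [63, 64]`; matching patient totals in the same files produce none.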

```bash
python "${CLAUDE_SKILL_DIR}/scripts/cross_document_n_check.py" \
  --root . \
  --out qc/cross_document_n.json
```

When the project has frozen a `2_Data/FINAL_POOL_LOCK.yaml` from `/meta-analysis`
Phase 3f.5, pass it as the authoritative anchor:

```bash
python "${CLAUDE_SKILL_DIR}/scripts/cross_document_n_check.py" \
  --root . \
  --pool-lock 2_Data/FINAL_POOL_LOCK.yaml \
  --out qc/cross_document_n.json
```
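
With a lock supplied, a second check compares each extracted value against the locked count. The sketch below assumes the lock has already been parsed (for example with PyYAML's `yaml.safe_load`) into a flat `{category: count}` mapping; that shape and the `lock_violations` helper are hypothetical, since the real `FINAL_POOL_LOCK.yaml` schema is not shown here. Claims are represented as `{category: [{"file", "line", "value"}, ...]}`, mirroring the `locations` entries of the JSON report.

```python
def lock_violations(claims: dict[str, list[dict]], lock: dict[str, int]) -> list[dict]:
    """Return every claim whose value disagrees with the locked count for its category.

    `lock` is a hypothetical flat {category: count} view of FINAL_POOL_LOCK.yaml.
    Categories absent from the lock are not checked.
    """
    violations = []
    for category, locked in lock.items():
        for loc in claims.get(category, []):
            if loc["value"] != locked:
                violations.append({**loc, "category": category, "locked": locked})
    return violations
```

The lock catches a failure that the distinct-value test cannot see on its own: every document agreeing on the same wrong number.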

Example `qc/cross_document_n.json` output, with one drift in the `included` category:

```json
{
  "submission_safe": false,
  "drift_count": 1,
  "drifts": [
    {
      "category": "included",
      "values": [63, 64],
      "locations": [
        {"file": "abstract.md", "line": 4, "value": 63, "context": "..."},
        {"file": "supplementary/s1.md", "line": 12, "value": 64, "context": "..."}
      ],
      "severity": "MAJOR"
    }
  ],
  "lock_violations": []
}
```

Treat `submission_safe: false` as a halt. Resolve drift by tracing each
location to its data artifact (extraction sheet, PRISMA cascade TSVs) and
correcting the document(s) that disagree with the locked count, or with the
data artifact when no lock is supplied.

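The halt can be enforced in CI. A minimal sketch, assuming only the report fields shown in the example output (`submission_safe`, `drifts`, `lock_violations`); the `gate` function name is hypothetical. A report that lacks `submission_safe` is treated as unsafe, so the gate fails closed.

```python
import json
import sys
from pathlib import Path


def gate(report_path: Path) -> int:
    """Return a process exit code: 0 when the report is submission-safe, 1 otherwise."""
    report = json.loads(report_path.read_text(encoding="utf-8"))
    if report.get("submission_safe") is True:
        return 0
    # Print each drift with its locations so the offending documents are easy to find.
    for drift in report.get("drifts", []):
        print(f"DRIFT {drift['category']}: values {drift['values']}", file=sys.stderr)
        for loc in drift.get("locations", []):
            print(f"  {loc['file']}:{loc['line']} -> {loc['value']}", file=sys.stderr)
    # The lock-violation schema is not shown in the example, so print entries as-is.
    for violation in report.get("lock_violations", []):
        print(f"LOCK {violation}", file=sys.stderr)
    return 1
```

Call it as `sys.exit(gate(Path("qc/cross_document_n.json")))` after the check has run.
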
## Phase 7 — v_(N+1) docx regeneration gate

When a v_N submission package was frozen and a v_(N+1) is being built
