
Commit 2fb4ce6

feat(sync-submission): add Phase 4 cover-letter free-text drift gate (#34)
Adds scripts/cover_letter_drift_check.py and a Phase 4 gate to /sync-submission. The script measures body word count, abstract word count, reference count, and table/figure count from the manuscript and compares them against the numeric claims in the cover letter. Body word counts allow a 5% tolerance ("approximately N words"); abstract word counts allow ±5 words; reference, table, and figure counts must match exactly.

Cover letters are submission-portal sidecar artifacts that the docx scanners in this skill do not touch. When a manuscript branches v_N → v_(N+1) (word-limit retarget, abstract restructure, late reference batch), the cover letter is routinely left at the v_N counts.

Cross-project observation (anonymized): a CK-line manuscript was compressed to ~3,036 body words and a 319-word abstract during the alignment round, but the cover letter still said "approximately 3,790 words", "250 words", and "12 verified references". All three claims surfaced only through a manual grep at the portal-upload stage.

The script never edits the cover letter. Drift resolution is left to the manuscript build pipeline so that the cover letter remains a deliberately authored artifact.
1 parent 80db846 commit 2fb4ce6

2 files changed

Lines changed: 497 additions & 0 deletions


skills/sync-submission/SKILL.md

Lines changed: 46 additions & 0 deletions
@@ -72,10 +72,56 @@ The registry is a project-local YAML mapping author identifiers (full names, nat
- Gate 5: before freeze, confirm portal free-text fields (cover letter, data availability, acknowledgements, abstract, author contributions) match the manuscript body.
- Gate 6 (double-blind journals): before freeze, export the portal's blinded review PDF and grep for all author identifiers across the entire upload set: manuscript, supplementary, cover letter, registry record PDFs (PROSPERO/ClinicalTrials), and portal Letter-field text. A clean manuscript blind does not imply a clean portal blind.
- Gate 7 (text-only docx rebuilds): never use `pandoc --reference-doc=manuscript.docx` for text-only response, cover, or supplementary docx files, because the reference docx ships its embedded media (figure files) into the new docx and bloats the file size 50–100×. Use plain `pandoc input.md -o output.docx` for text-only artifacts.
- Gate 5b (Phase 4 cover-letter free-text drift): before freeze, run `scripts/cover_letter_drift_check.py` to verify that the cover letter's word-count, reference-count, and table/figure-count claims still match the manuscript. Cover letters routinely go stale across v_N → v_(N+1) branching and are not covered by any docx-level audit. See "Phase 4 — Cover-letter free-text drift" below.
- Gate 8 (Phase 5 cross-document N consistency): before freeze, run `scripts/cross_document_n_check.py` over the manuscript bundle (abstract, body, PROSPERO record, cover letter, supplementary, INDEX, PRISMA flow caption). Any N category with more than one distinct integer value is a P0 drift. When a `FINAL_POOL_LOCK.yaml` is present, supply `--pool-lock` to make the locked counts the authoritative baseline. See "Phase 5 — Cross-document N consistency" below.
- Gate 9 (Phase 6 intra-manuscript scope drift): run `scripts/scope_drift_check.py` against the manuscript (and optionally the PROSPERO record). Numeric anchors (AUC, OR/HR/RR, sensitivity/specificity) that appear in Limitations or Discussion but are absent from Methods and Results are P0 SCOPE_DRIFT. A PROSPERO ↔ Methods disagreement on synthesis method is P0 PROSPERO_DRIFT.
- Gate 10 (Phase 7 v_(N+1) docx regeneration): when building a new submission from a frozen prior version, run `scripts/verify_package_integrity.py --assert-vN-docx-changed --vN-docx <prev>.docx --new-docx <next>.docx`. An identical MD5 means the new docx is an unmodified seed copy, so block submission. This is defense in depth: the check is required even when the upstream pipeline appears to have regenerated the docx.
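Gate 10 reduces to a digest comparison between the prior and the new docx. The Python below is a minimal standard-library sketch of that idea only, not the implementation of `verify_package_integrity.py`; the helper names `md5_of` and `docx_was_regenerated` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def docx_was_regenerated(prev_docx: Path, new_docx: Path) -> bool:
    """Return False when both files share an MD5, i.e. the new docx is an unmodified seed copy."""
    return md5_of(prev_docx) != md5_of(new_docx)
```

A freeze script would block submission whenever `docx_was_regenerated(...)` returns False for the prior and new docx paths.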

## Phase 4 — Cover-letter free-text drift

Cover letters live outside the submission docx files, but editors read them side by side with the manuscript. Their `## Article details` block (body word count, abstract word count, reference count, table/figure count) is a sidecar copy of manuscript facts that no docx-level audit checks, and it routinely goes stale when a manuscript branches v_N → v_(N+1) (word-limit retarget, abstract restructure, late reference batch).

`scripts/cover_letter_drift_check.py` measures the manuscript's actual counts and compares them with the cover letter's numeric claims:

```bash
python "${CLAUDE_SKILL_DIR}/scripts/cover_letter_drift_check.py" \
  --manuscript manuscript.md \
  --cover-letter cover_letter.md \
  --refs refs.bib \
  --out qc/cover_letter_drift.json
```
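The skill does not show how the script reads claims or counts references. As a rough illustration only, the claim-extraction and reference-counting steps could look like the Python sketch below. It assumes claims phrased like "approximately 3,790 words", "250 words", and "12 verified references", and a BibTeX `refs.bib`; the patterns, `extract_claims`, and `count_bib_entries` are all hypothetical. Body word counting is left out because it depends on how the manuscript delimits its sections.

```python
import re

# Hypothetical phrasings, modelled on "approximately 3,790 words", "250 words",
# and "12 verified references"; real cover letters may word these differently.
CLAIM_PATTERNS = {
    "body_words": re.compile(r"approximately\s+(\d[\d,]*)\s+words", re.IGNORECASE),
    "abstract_words": re.compile(r"abstract\D{0,40}?(\d[\d,]*)\s+words", re.IGNORECASE),
    "references": re.compile(r"(\d[\d,]*)\s+(?:verified\s+)?references", re.IGNORECASE),
}

# BibTeX entry headers such as "@article{" or "@Book(", skipping the
# non-entry @string, @preamble and @comment commands.
BIB_ENTRY = re.compile(
    r"^[ \t]*@(?!string\b|preamble\b|comment\b)\w+[ \t]*[{(]",
    re.IGNORECASE | re.MULTILINE,
)


def extract_claims(cover_letter: str) -> dict[str, int]:
    """Return the first integer captured by each claim pattern that matches."""
    claims: dict[str, int] = {}
    for field, pattern in CLAIM_PATTERNS.items():
        match = pattern.search(cover_letter)
        if match:
            claims[field] = int(match.group(1).replace(",", ""))
    return claims


def count_bib_entries(bib_text: str) -> int:
    """Count BibTeX entries in the text of a .bib file."""
    return len(BIB_ENTRY.findall(bib_text))
```

Regex extraction is brittle by design here: each pattern takes the first match, so a letter that says "approximately" about the abstract before the body would be misread.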
Body word counts are matched within a 5% tolerance, to accommodate "approximately N words" phrasing. Abstract word counts tolerate a difference of ±5 words. Reference, table, and figure counts must match exactly.
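The tolerance rules can be expressed as a small comparison function. This Python sketch is an assumption-laden illustration, not the script's code: rounding the 5% body tolerance down (which gives 151 words for a measured 3,036) and labelling every drift "MAJOR" are guesses, and `tolerance_for` and `find_drifts` are hypothetical names.

```python
def tolerance_for(field: str, truth: int) -> int:
    """Allowed |claim - truth| for a field."""
    if field == "body_words":
        return truth * 5 // 100  # 5% of the measured count, rounded down (assumed)
    if field == "abstract_words":
        return 5  # ±5 words
    return 0  # references, tables, figures: exact match


def find_drifts(truth: dict[str, int], claims: dict[str, int]) -> list[dict]:
    """Compare each cover-letter claim with the measured value for the same field."""
    drifts = []
    for field, claim in claims.items():
        if field not in truth:
            continue  # nothing measured to compare against
        actual = truth[field]
        tolerance = tolerance_for(field, actual)
        diff = abs(claim - actual)
        if diff > tolerance:
            drifts.append({
                "field": field,
                "truth": actual,
                "cover_letter_claim": claim,
                "severity": "MAJOR",  # assumed; the real script may grade severity
                "note": f"|claim - truth| = {diff} > tolerance {tolerance}",
            })
    return drifts
```

Under these rules a submission is safe only when `find_drifts` returns an empty list.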
Example output (`qc/cover_letter_drift.json`):

```json
{
  "submission_safe": false,
  "truth": {"body_words": 3036, "abstract_words": 319, "references": 12,
            "tables": 3, "figures": 4},
  "claims": {"body_words": 3790, "abstract_words": 250, "references": 12},
  "drifts": [
    {"field": "body_words", "truth": 3036, "cover_letter_claim": 3790,
     "severity": "MAJOR",
     "note": "|claim - truth| = 754 > tolerance 151"},
    {"field": "abstract_words", "truth": 319, "cover_letter_claim": 250,
     "severity": "MAJOR",
     "note": "|claim - truth| = 69 > tolerance 5"}
  ]
}
```
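Because `submission_safe` is a single boolean, a freeze step can gate on the report directly. The Python below is a hypothetical wrapper (`drift_gate` is not part of the skill) that assumes only the report fields shown in the example: `submission_safe`, plus the `field`, `truth`, `cover_letter_claim`, and `severity` keys of each drift.

```python
import json
import sys
from pathlib import Path


def drift_gate(report_path: Path) -> int:
    """Print each recorded drift and return 0 if the report is submission-safe, else 1."""
    report = json.loads(report_path.read_text(encoding="utf-8"))
    for drift in report.get("drifts", []):
        print(
            f"[{drift.get('severity', '?')}] {drift['field']}: "
            f"cover letter says {drift['cover_letter_claim']}, "
            f"manuscript has {drift['truth']}",
            file=sys.stderr,
        )
    # Treat a missing or non-boolean flag as unsafe rather than silently passing.
    return 0 if report.get("submission_safe") is True else 1
```

A freeze script could call `sys.exit(drift_gate(Path("qc/cover_letter_drift.json")))` so that any drift fails the run.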
Drift resolution: regenerate the cover letter from the manuscript at v_(N+1) build time. The script never edits the cover letter; that is left to the manuscript build pipeline, so the cover letter stays a deliberately authored artifact.
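One way for a build pipeline to regenerate the numbers while keeping the rest of the letter hand-written is to render only the numeric block from measured counts and splice it into the authored letter. This Python sketch is hypothetical: `render_article_details` and its bullet wording are assumptions, and the `truth` keys simply mirror the report's `truth` object.

```python
def render_article_details(truth: dict[str, int]) -> str:
    """Render a cover-letter '## Article details' block from measured counts.

    The bullet wording is hypothetical; use whatever phrasing the journal expects.
    """
    lines = [
        "## Article details",
        "",
        f"- Main text: approximately {truth['body_words']:,} words",
        f"- Abstract: {truth['abstract_words']} words",
        f"- References: {truth['references']}",
        f"- Tables: {truth['tables']}; figures: {truth['figures']}",
    ]
    return "\n".join(lines) + "\n"
```

Generating just this block keeps the authored prose under human control while the counts always come from the v_(N+1) manuscript.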
## Phase 5 — Cross-document N consistency

Multi-document cohort-size drift is a high-frequency desk-reject pattern.
