feat(meta-analysis): add Phase 3f.5 FINAL_POOL_LOCK.yaml screening freeze (#30)

Yoojin-nam · claude · web-flow · commit 0819efe5e267 · 2026-05-23T15:46:50.000+09:00
Adds Phase 3f.5 ("Pool composition lock") and ships a template at
templates/FINAL_POOL_LOCK.yaml.template.

Once round-3 adjudication freezes, the lock becomes the single
source of truth for include_count, exclude_count, mixed_count, and
the canonical UID list. A SHA-256 hash over the sorted UID list
provides tamper evidence.

Downstream artifacts (extraction TSV, manuscript prose, PRISMA flow
caption, supplementary INDEX, cover letter) reference the lock
instead of re-deriving counts. Companions:
- PR T1-1 sync-submission Phase 5 --pool-lock
- PR T1-6 meta-analysis Phase 4 entry gate
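
A minimal sketch of what the Phase 4 entry gate could check, assuming the
lock has already been parsed into a dict with the template's `include_uids`
and `mixed_uids` keys. The function name `check_phase4_entry` and its return
shape are hypothetical and are not taken from the companion PR.

```python
def check_phase4_entry(lock, extraction_uids):
    """Compare extraction UIDs against the lock and return a list of problems.

    Hypothetical helper: `lock` is a dict parsed from FINAL_POOL_LOCK.yaml and
    `extraction_uids` is the UID column read from the extraction TSV.
    An empty list means the gate passes.
    """
    # The gate requires the extraction set to equal include_uids ∪ mixed_uids.
    expected = set(lock["include_uids"]) | set(lock["mixed_uids"])
    actual = set(extraction_uids)

    problems = []
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    if missing:
        problems.append(f"locked but not extracted: {missing}")
    if unexpected:
        problems.append(f"extracted but not locked: {unexpected}")
    return problems


# Illustrative data only; these UIDs are made up.
lock = {"include_uids": ["PMID:1", "PMID:2"], "mixed_uids": ["PMID:3"]}
print(check_phase4_entry(lock, ["PMID:1", "PMID:3"]))
```

Returning a list of problems rather than raising on the first mismatch lets
a caller report every missing and unexpected UID in one pass.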

Motivation: in another project, five documents disagreed on the
INCLUDE and EXCLUDE counts after a late adjudication was applied to
some artifacts and not others.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/skills/meta-analysis/SKILL.md b/skills/meta-analysis/SKILL.md
@@ -216,6 +216,51 @@ ID sets. The Markdown consensus document remains the human explanation.
 
 **Precedent incident (a PRISMA-DTA meta-analysis revision):** a late-revision manuscript shipped with k_qualitative = 32 / k_narrative-only = 10 / k_FT-excluded = 46. ID-set reconciliation (performed only after an adversarial audit at post-Stage 4 QC) revealed true counts 24/2/54. An early-draft prose total ("30 → 32 after FLAG consensus") had been carried forward without ever being reconciled against the screening TSV intersected with the consensus spreadsheet; four downstream artifacts echoed the same wrong total. This gate would have caught the drift at the Phase 5 hand-off.
 
+#### 3f.5 Pool composition lock (MANDATORY at adjudication freeze)
+
+After Phase 3f reconciliation passes, freeze the pool composition into a
+single source-of-truth YAML so every downstream artifact (extraction TSV,
+manuscript prose counts, PRISMA flow caption, supplementary INDEX, cover
+letter free-text) can be checked against it.
+
+Why this lock exists
+^^^^^^^^^^^^^^^^^^^^
+
+Cross-project precedent (anonymized): an LLM reporting-quality SR carried
+five documents that disagreed on INCLUDE (63 vs 64) and EXCLUDE
+(108/109/111). Three EXCLUDE rows existed in the extraction sheet without
+matching INCLUDE. The drift traced to a late round-3 adjudication whose
+result was applied to some artifacts and not others — there was no single
+canonical post-freeze count to reference.
+
+How to lock
+^^^^^^^^^^^
+
+1. Copy the template:
+   ```bash
+   cp "${CLAUDE_SKILL_DIR}/templates/FINAL_POOL_LOCK.yaml.template" \
+       2_Data/FINAL_POOL_LOCK.yaml
+   ```
+2. Fill in counts and UID lists from the reconciliation in Phase 3f.
+3. Compute the SHA-256 integrity hash over the sorted, newline-joined UIDs from all three lists.
+4. Commit the lock to git BEFORE starting Phase 4 extraction.
+
+Downstream gates
+^^^^^^^^^^^^^^^^
+
+- `/meta-analysis` Phase 4 entry: extraction TSV's UID set MUST equal
+  `include_uids` ∪ `mixed_uids` from the lock. See Phase 4 entry gate.
+- `/sync-submission` Phase 5
+  (`scripts/cross_document_n_check.py --pool-lock`): every numeric claim
+  in manuscript / abstract / supplementary that maps to a locked
+  category must match the locked value.
+- Manuscript prose: NEVER re-derive `k included` from extraction TSV at
+  manuscript build time. Always reference `final_pool_n` from the lock.
+
+If a late post-freeze decision changes the pool, treat it as a formal
+PROSPERO amendment: file the amendment, re-freeze the lock as a new
+file (`FINAL_POOL_LOCK_v2.yaml`), and propagate to every artifact.
+
 ### Phase 4: Data Extraction
 
 **Goal**: Create standardized extraction forms and extract 2x2 or effect size data.
diff --git a/skills/meta-analysis/templates/FINAL_POOL_LOCK.yaml.template b/skills/meta-analysis/templates/FINAL_POOL_LOCK.yaml.template
@@ -0,0 +1,70 @@
+# FINAL_POOL_LOCK.yaml — frozen pool composition for an SR/MA
+#
+# Created at Phase 3f.5 (round-3 adjudication freeze) by /meta-analysis.
+# All downstream artifacts (extraction TSV, manuscript prose counts,
+# PRISMA flow caption, supplementary INDEX, cover letter free-text)
+# must agree with these values exactly. /sync-submission Phase 5
+# `scripts/cross_document_n_check.py --pool-lock` enforces this.
+#
+# Why a lock file
+# ---------------
+# Cross-project precedent (anonymized): an LLM reporting-quality SR carried
+# five documents that disagreed on INCLUDE (63 vs 64) and EXCLUDE
+# (108/109/111). Three EXCLUDE rows existed in the extraction sheet
+# without matching INCLUDE. The drift traced to a late round-3 adjudication
+# whose result was applied to some artifacts and not others.
+#
+# The lock file is the single source of truth. Once the freeze line is
+# crossed, NEVER re-derive the counts from raw artifacts in a downstream
+# script — always reference the lock.
+
+# ---------------------------------------------------------------------------
+# Metadata
+# ---------------------------------------------------------------------------
+
+# ISO-8601 date when the pool was frozen.
+freeze_date: "YYYY-MM-DD"
+
+# Round at which freeze occurred — typically "round_3_adjudication".
+freeze_stage: "round_3_adjudication"
+
+# Provenance: which screening sheet anchored this lock and who adjudicated it.
+provenance:
+  screening_artifact: "2_Screening/round3_adjudication.tsv"
+  adjudicator: "first_reviewer"
+  ai_assisted_round: false   # set true if AI pre-screening was used per SKILL.md Phase 3c
+
+# ---------------------------------------------------------------------------
+# Counts (canonical numbers — NEVER edit without re-freezing)
+# ---------------------------------------------------------------------------
+
+# Studies in the final pool (Phase 4 extraction candidate set).
+final_pool_n: 0
+
+# Total INCLUDE decisions across rounds (post-adjudication).
+include_count: 0
+
+# Total EXCLUDE decisions (full-text excluded).
+exclude_count: 0
+
+# Mixed (eligible for some outcomes, excluded for others).
+mixed_count: 0
+
+# ---------------------------------------------------------------------------
+# Identifier sets
+# ---------------------------------------------------------------------------
+
+# UID lists. Use stable IDs (PMID, DOI, or screening-sheet record ID).
+include_uids: []
+exclude_uids: []
+mixed_uids: []
+
+# ---------------------------------------------------------------------------
+# Integrity hash
+# ---------------------------------------------------------------------------
+
+# SHA-256 of the sorted include_uids + exclude_uids + mixed_uids list,
+# joined with newlines. Provides tamper-evidence: any single UID edit
+# changes the hash. Recompute with:
+#   python -c 'import hashlib; ids = sorted(open("..._uids.txt").read().splitlines()); print(hashlib.sha256("\n".join(ids).encode()).hexdigest())'
+sha256: ""
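
The recompute step described in the template comment can be sketched as a
small function. This is an illustration under stated assumptions: UIDs are
strings, the three lists are meant to be disjoint (the duplicate check is an
addition, not something the template states), and `pool_lock_sha256` is a
hypothetical name.

```python
import hashlib


def pool_lock_sha256(include_uids, exclude_uids, mixed_uids):
    """Return the SHA-256 hex digest of all UIDs, sorted and newline-joined."""
    all_uids = sorted([*include_uids, *exclude_uids, *mixed_uids])

    # Assumption: a UID belongs to exactly one category, so duplicates
    # across the combined list indicate a data-entry error.
    if len(set(all_uids)) != len(all_uids):
        raise ValueError("a UID appears more than once across the three lists")

    return hashlib.sha256("\n".join(all_uids).encode("utf-8")).hexdigest()


# Illustrative data only; these UIDs are made up.
print(pool_lock_sha256(["PMID:2", "PMID:1"], ["PMID:3"], []))
```

Because the hash covers the combined list, moving a UID from one list to
another leaves the digest unchanged. A check that needs to detect
recategorization would have to hash each list separately or compare the
locked counts as well.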