
feat(self-review,analyze-stats): design/power statistic provenance guard #38

Open
Yoojin-nam wants to merge 1 commit into main from feat/design-power-provenance



@Yoojin-nam
Contributor

Summary

  • Adds a guard for design/power statistics (minimum detectable effect, a-priori/post-hoc power, required sample size, a-priori planning effect sizes). These are computed, not extracted, so they have no CSV row or source-paper Table for the existing Phase 2.5a source-fidelity audit to trace — a blind spot through which a non-reproducible value can pass multiple review rounds.
  • self-review: new Phase 2.5a-2 — every reported MDE / power / sample-size value must be reproduced by committed code (method + inputs stated); independent recompute; classify as calculation error vs method/provenance drift; cross-manuscript method-consistency check; enforced gate.
  • analyze-stats: rule 9 — design/power statistics must be emitted by the committed script (never hand-computed in a side tool), using one consistent method family (e.g. exact noncentral-t).
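
The classification step in Phase 2.5a-2 could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `Recompute` type, the `classify` function, the tolerance, and the method labels are all hypothetical names introduced here. It assumes the reviewer already has the value emitted by the committed script plus independent recomputes under other method families.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recompute:
    """One recomputed design statistic and the method that produced it (hypothetical type)."""
    method: str   # e.g. "normal-approx" or "exact-noncentral-t"
    value: float


def classify(reported, committed, alternatives, tol=1e-6):
    """Classify a reported MDE / power / sample-size value.

    reported     -- the value stated in the manuscript
    committed    -- what the committed script emits
    alternatives -- independent recomputes under other method families
    """
    # Case 1: the committed code reproduces the reported value.
    if abs(reported - committed.value) <= tol:
        return "reproduced"
    # Case 2: a different method reproduces it, so the number may be right
    # but its provenance is not the committed script.
    for alt in alternatives:
        if abs(reported - alt.value) <= tol:
            return f"method/provenance drift ({alt.method})"
    # Case 3: no method reproduces it.
    return "calculation error"


# Illustrative values only: 63 vs 64 per group for the same design inputs.
committed = Recompute("normal-approx", 63)
others = [Recompute("exact-noncentral-t", 64)]
print(classify(64, committed, others))
print(classify(50, committed, others))
```

Only the "reproduced" outcome would pass the gate in this sketch; both other outcomes require either fixing the number or committing the method that actually produced it.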

Motivation

Generalises a real failure mode: a hand-entered minimum detectable effect that no standard method reproduced survived several review rounds because no committed script computed it. Separately, correct sample sizes were produced by an exact-t tool while the committed script used a normal approximation, so the values were right but had no reproducible provenance. The precedent in the skill is synthetic.
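
To make the second case concrete, here is a hedged sketch of a committed script that emits a required sample size together with its method and inputs. It uses the normal approximation for a two-sided, two-sample comparison of means and only the Python standard library. The function names and the record layout are assumptions for illustration, not part of the skill.

```python
import json
import math
from statistics import NormalDist


def n_per_group_normal(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample test, normal approximation.

    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


def provenance_record(d, alpha=0.05, power=0.80):
    """Emit the value with the method and inputs that produced it (hypothetical layout)."""
    return {
        "statistic": "n_per_group",
        "method": "normal-approx, two-sample, two-sided",
        "inputs": {"d": d, "alpha": alpha, "power": power},
        "value": n_per_group_normal(d, alpha, power),
    }


print(json.dumps(provenance_record(0.5), indent=2))
```

For d = 0.5, alpha = 0.05, power = 0.80 this approximation gives 63 per group, whereas the commonly tabulated exact-t answer for the same inputs is 64. A manuscript reporting 64 alongside this script would therefore carry a plausible number that the committed code does not reproduce.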

Test plan

  • bash scripts/validate_skills.sh — ALL CHECKS PASSED (0 failures, 0 meta-doc PII)
  • New content is generic/synthetic — no project names, manuscript IDs, author names, or personal paths
  • Gate added to self-review Gates table; reciprocal rule added to analyze-stats Statistical Reporting Rules

🤖 Generated with Claude Code

…e guard

Design and power statistics (minimum detectable effect, a-priori/post-hoc power,
required sample size) are computed rather than extracted, so they have no CSV or
source-paper row for the existing source-fidelity audit to trace — a common blind
spot through which a non-reproducible value can survive multiple review rounds.

- self-review: new Phase 2.5a-2 requires every reported MDE / power / sample-size
  value to be reproduced by committed code (method + inputs stated), with an
  independent recompute, a cross-manuscript method-consistency check, and an
  enforced gate.
- analyze-stats: rule 9 — design/power statistics must be emitted by the committed
  script (never hand-computed in a side tool) using one consistent method family.

validate_skills.sh passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
