feat(self-review): supplement/multi-file hygiene detector + promised-stat gate (detectors 30→31) #187
Merged
…stat gate (detectors 30→31)

Promotes the review-harvest "supplement leaks are technical-check-fatal but never linted" cluster. Existing gates lint manuscript.md only; the rendered supplement, a separately-built tables file, and caption files reach reviewers unlinted.

New detector — skills/self-review/scripts/check_supplement_hygiene.py: lints a LIST of rendered reader-facing artifacts (--supplement, repeatable) for:
- SUPP_INTERNAL_LABEL: § / §L internal section or SAP labels
- SUPP_PLACEHOLDER: Table SX / S-N / [Authors] / figure-glob / build-dir path
- SUPP_BUILD_MARKER: [VERIFY] / TODO / FIXME / "Remove this line if"
- SUPP_RESPONSE_FRAMING: "Per Section Editor #2" / "Response to Reviewers"
- SUPP_PLANNING_RESIDUE: "Designed by:" / "Expected … Numbers" / "to be executed"
- and, with --manuscript, SUPP_XREF_UNRESOLVED: a body "Supplementary Table/Figure N" callout with no matching supplement section (renumber drift / silently-skipped section).
All Major.

Extend — check_artifact_coverage.py: new PROMISED_STAT_NO_VALUE verdict + a --supplement corpus arg. Fires (conservatively) when a bound/ceiling/de-confounded statistic (AUC, c-statistic, sensitivity…) is promised with a reporting verb but never given a numeric value anywhere in the manuscript or supplement.

Cross-artifact (#25): the main↔supplement↔figure-source drift class is already covered by sync-submission's check_cross_artifact_stale.py. Phase 2.5f now documents re-running it after any audit/reframe (the gap was wiring, not a missing tool).

Catalog: register check_supplement_hygiene under reporting_compliance, regenerate metadata/detectors_catalog.json (30→31, reporting 7→8), bump metadata/catalog_counts.json + MEDSCI_AUDIT.md; validate_catalog_consistency passes.

Tests: new test_supplement_hygiene.sh (dirty→all 6 verdicts, clean→0, xref resolve, usage error); test_artifact_coverage.sh extended with promised-stat positive/negative (value-present suppresses). Both wired into validate.yml.

No skill count change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
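The verdict families above reduce to line-level pattern matching over each rendered artifact plus one cross-document lookup for callouts. The sketch below is a hypothetical approximation, not the code in check_supplement_hygiene.py: the regexes, the `Finding` type, the `check` helper, and the assumption that supplement sections are headings like `## Supplementary Table 2` are all illustrative.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical approximations of the verdict families; the real detector's
# patterns are not shown in this PR.
PATTERNS: dict[str, re.Pattern[str]] = {
    "SUPP_INTERNAL_LABEL": re.compile(r"§L?\s*\d+"),
    "SUPP_PLACEHOLDER": re.compile(
        r"\bTable SX\b|\bS-N\b|\[Authors\]|figS_\*|\b1_Search/"
    ),
    "SUPP_BUILD_MARKER": re.compile(r"\[VERIFY|\bTODO\b|\bFIXME\b|Remove this line if"),
    "SUPP_RESPONSE_FRAMING": re.compile(
        r"Per Section Editor #\d+|Response to Reviewers|Reviewer \d+ Comment \d+"
    ),
    "SUPP_PLANNING_RESIDUE": re.compile(
        r"Designed by:|Expected\b.{0,40}\bNumbers|to be executed"
    ),
}

# Body callouts, and (assumed) supplement section headings such as
# "## Supplementary Table 2".
CALLOUT = re.compile(r"Supplementary (Table|Figure) (\d+)")
SECTION = re.compile(r"^#+\s*Supplementary (Table|Figure) (\d+)", re.MULTILINE)


@dataclass(frozen=True)
class Finding:
    artifact: str
    line: int
    verdict: str
    snippet: str
    severity: str = "Major"  # every verdict in this sketch is Major


def lint_artifact(name: str, text: str) -> list[Finding]:
    """Scan one rendered artifact line by line for leak patterns."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for verdict, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append(Finding(name, lineno, verdict, match.group(0)))
    return findings


def unresolved_xrefs(manuscript: str, supplements: dict[str, str]) -> list[Finding]:
    """Flag body callouts that no supplement artifact has a section for."""
    sections = {
        (m.group(1), int(m.group(2)))
        for text in supplements.values()
        for m in SECTION.finditer(text)
    }
    findings = []
    for lineno, line in enumerate(manuscript.splitlines(), start=1):
        for m in CALLOUT.finditer(line):
            if (m.group(1), int(m.group(2))) not in sections:
                findings.append(
                    Finding("manuscript", lineno, "SUPP_XREF_UNRESOLVED", m.group(0))
                )
    return findings


def check(supplements: dict[str, str], manuscript: str | None = None) -> list[Finding]:
    """Lint every artifact; run the xref pass only when a manuscript is given."""
    findings = [f for name, text in supplements.items() for f in lint_artifact(name, text)]
    if manuscript is not None:
        findings += unresolved_xrefs(manuscript, supplements)
    return findings
```

Keying artifacts by name mirrors the repeatable --supplement flag: each rendered file is linted on its own, while the xref pass pools section headings across all of them, so a callout resolves if any artifact carries the matching section.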
Branch updated from 05e15b7 to 08cecc1.
What
Promotes the review-harvest cluster "supplement leaks are technical-check-fatal, but the gates only lint manuscript.md." Adds one detector, extends another, and documents the cross-artifact re-run.

New detector — `check_supplement_hygiene.py` (#9 / #12 / #6 / #8)

Lints a list of rendered reader-facing artifacts (`--supplement`, repeatable: supplement, separately-built tables, captions). All verdicts are Major:

| Verdict | Catches |
|---|---|
| `SUPP_INTERNAL_LABEL` | `§` / `§L` internal section / SAP labels |
| `SUPP_PLACEHOLDER` | `Table SX`, `S-N`, `[Authors]`, figure-path glob `figS_*.{png,pdf}`, build-dir path `1_Search/` |
| `SUPP_BUILD_MARKER` | `[VERIFY…]`, `TODO`, `FIXME`, `Remove this line if…` |
| `SUPP_RESPONSE_FRAMING` | `Per Section Editor #2`, `Response to Reviewers`, `Reviewer 2 Comment 4` |
| `SUPP_PLANNING_RESIDUE` | `Designed by:`, `Expected … Numbers`, `to be executed` |
| `SUPP_XREF_UNRESOLVED` | (with `--manuscript`) a body `Supplementary Table/Figure N` callout with no matching supplement section: renumber drift or a silently skipped section |

Extend — `check_artifact_coverage.py` (#11)

New `PROMISED_STAT_NO_VALUE` verdict plus a `--supplement` corpus arg. Conservatively fires when a bound/ceiling/de-confounded statistic (AUC, c-statistic, sensitivity…) is promised with a reporting verb but never given a numeric value anywhere in the manuscript or supplement (the "described but never quantified" reviewer catch). A reported value suppresses it (tested).

Cross-artifact drift (#25)
Already covered by `sync-submission/check_cross_artifact_stale.py` (main↔supplement↔figure-source labeled-value drift). The gap was wiring, so Phase 2.5f now documents re-running it after any audit/reframe, not only once.

Catalog (30 → 31)
`check_supplement_hygiene` is registered under `reporting_compliance`; `detectors_catalog.json` is regenerated (reporting 7→8); `catalog_counts.json` and `MEDSCI_AUDIT.md` are bumped. `validate_catalog_consistency.py` is green. No skill-count change, so the skills/marketplace catalogs are untouched. Additive, so no release.

Tests / CI
- `test_supplement_hygiene.sh`: dirty → all 6 verdicts; clean → 0 (no FP); xref resolves Table 2 and flags Figure 9; usage error when no `--supplement` is given.
- `test_artifact_coverage.sh` extended: promised-stat positive fires; value-present negative suppresses.
- Both wired into `validate.yml`.
- `validate_skills.sh`, `gen_detectors_catalog_json.py --check`, `validate_catalog_consistency.py`, `check_locale_inventory.py`, and the skills/marketplace/detectors catalog tests all pass.

🤖 Generated with Claude Code
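The conservative `PROMISED_STAT_NO_VALUE` rule, including the value-present suppression that `test_artifact_coverage.sh` exercises, can be approximated as a promise pass followed by a corpus-wide value lookup. This is a hypothetical sketch, not the logic in `check_artifact_coverage.py`: the statistic list, the qualifier and reporting-verb vocabularies, the 30-character value window, and the function name `promised_stat_no_value` are all assumptions.

```python
from __future__ import annotations

import re

# Hypothetical vocabularies; the real rule's term lists are not shown in the PR.
STATS = ("AUC", "c-statistic", "sensitivity")
QUALIFIER = re.compile(r"\b(?:bound|ceiling|de-confounded)\b", re.IGNORECASE)
REPORT_VERB = re.compile(
    r"\b(?:report|reports|reported|present|presents|presented)\b", re.IGNORECASE
)
# A "value" is a decimal (0.81, .81) or a percentage (81%), so a bare table
# number such as "Supplementary Table 2" does not count as a reported value.
NUMBER = r"(?:\d*\.\d+|\d+(?:\.\d+)?\s*%)"


def _term(name: str) -> str:
    return rf"\b{re.escape(name)}\b"


def _sentences(text: str) -> list[str]:
    # Split after terminal punctuation followed by whitespace; "0.81" survives.
    return re.split(r"(?<=[.!?])\s+", text)


def has_value(name: str, corpus: str) -> bool:
    """True if the statistic is followed by a number within ~30 characters."""
    pattern = rf"{_term(name)}[^\n]{{0,30}}?{NUMBER}"
    return re.search(pattern, corpus, re.IGNORECASE) is not None


def promised_stat_no_value(manuscript: str, supplement: str = "") -> list[str]:
    """Statistics promised with a qualifier + reporting verb but never quantified."""
    corpus = manuscript + "\n" + supplement
    flagged = set()
    for sentence in _sentences(corpus):
        # Conservative: a sentence counts as a promise only if it carries both a
        # bound/ceiling/de-confounded qualifier and a reporting verb.
        if not (QUALIFIER.search(sentence) and REPORT_VERB.search(sentence)):
            continue
        for name in STATS:
            if re.search(_term(name), sentence, re.IGNORECASE) and not has_value(name, corpus):
                flagged.add(name)
    return sorted(flagged)
```

Requiring both cue families in one sentence keeps the rule quiet by design; the value window and the decimal-or-percent shape are the knobs that trade false positives against missed catches.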