Releases: Aperivue/medsci-skills

MedSci Skills v5.0.0

28 Jun 13:54
f332681

Changed

  • v5.0.0 — storefront repositioning for the medical-AI model-engineering lane. A material
    distribution change, not a label bump: the model-engineering lane (built additively across
    v4.x Phases 1–4 plus the Phase 5 breadth below) now has its own storefront home and the repo's
    identity is widened to cover it.
    • New model_engineering storefront category ("Model Engineering & Validation") and
      medsci-modeling marketplace plugin, carved out of "Data & Study Design" (medsci-data).
      The 6 lane skills — architecture-zoo, model-scaffold, model-validation, model-card,
      model-evaluation, mllm-eval — now group under their own catalog filter and installable
      plugin (/plugin now lists nine category plugins). Both catalog generators
      (gen_skills_catalog_json.py category mapping/order, gen_marketplace_json.py plugin
      name/description) and their self-tests cover the new category.
    • README + ROADMAP repositioned to the end-to-end identity: MedSci Skills is an end-to-end
      research tool for physician and medical-engineering researchers to design → scaffold →
      validate → publish — for the clinical manuscript and the medical-AI model alike. "Clinical AI
      model research engineering is in scope" is now explicit, while "not a general AI-scientist
      platform" (and not a diagnostic tool or autonomous author) is kept; the lane integrates
      MONAI / nnU-Net and never reimplements them or runs anything autonomously.
    • Counts unchanged (51 skills / 41 detectors / 38 reporting guidelines); CI stays torch-free.

Added

  • Medical-AI model-engineering lane — Phase 5 (build-lane breadth). Expands the existing
    /model-scaffold and /architecture-zoo skills; no new skills/detectors/probes (counts
    unchanged: 51 skills / 41 detectors / 38 guidelines), torch-free CI.
    • /model-scaffold now generates 5 task types (was segmentation-only): --task
      segmentation (U-Net), classification (small multi-label CNN; swap in a timm
      backbone), detection (torchvision Faster R-CNN + FPN), synthesis (Pix2Pix U-Net
      generator + PatchGAN), ssl (SimCLR encoder + projection head, NT-Xent). Every task keeps
      the reproducibility guarantees by construction — the patient-level seed-locked split is
      task-independent, and each emitted train.py / evaluate.py passes check_training_hygiene
      (all RNGs seeded, cuDNN deterministic, train-only loader, eval() + no_grad()). The
      challenge + regression test now verify all 5 tasks (split + hygiene + valid Python, network-free).
    • /architecture-zoo adds the detection.md and synthesis.md family cards (R-CNN family /
      Faster R-CNN+FPN / Mask R-CNN / RetinaNet / YOLO / DETR; Pix2Pix / CycleGAN / SPADE / diffusion
      / VAE / fastMRI), each with the source paper, when-to-use, medical use, reference implementation,
      validation setup, and matching scaffold template; the decision-tree index now routes to them.

MedSci Skills v4.11.0

28 Jun 12:58
fba1394

Added

  • find-journal: acceptance-feasibility axis. A Phase 2.5 pre-flight
    (assess_acceptance_readiness.py, deterministic + reproducible challenge card)
    scans a manuscript for design-ceiling / unfixable-defect / importance-risk /
    claim-mismatch signals and a ceiling verdict (advisory risk band, never a
    probability). Adds two-axis ranking (scope fit × acceptance feasibility) with
    explicit mismatch surfacing, an Acceptance Signals profile schema
    (references/acceptance_signals_schema.md, populated for European Radiology, AJR,
    KJR, RYAI, Investigative Radiology), a reject-fallback cascade plan, and a
    desk-reject vs post-review distinction in Post-Rejection Mode. Helper named
    assess_* (not a detector-catalog member); counts unchanged (additive). (#215)
  • Medical-AI model-engineering lane — Phase 1 (validation MVP). First slice of the v5.0
    "design → scaffold → validate → publish medical-AI model research" lane, led by the
    validation/reporting half (the build/scaffold half lands in a later phase). Clinician-anchored,
    torch-free, additive.
    • New skill /model-validation (Layer D, advisory + deterministic audit) — design or audit
      the clinical-validation study for an engineer-built medical-imaging model (segmentation /
      classification / detection): patient-level split disjointness + the data-leakage taxonomy,
      tuning-on-test, internal vs genuine external validation, comparator design, single-run vs
      multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing handoff
      to /calc-sample-size, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Integrates with
      MONAI / nnU-Net — does not replace them. Skills 45 → 46.
    • New reviewer domain-probe model_development.md (MD0–MD8) (/peer-review + /self-review,
      vendored byte-identical) — partition/leakage mechanics, tuning/threshold/model-selection on the
      test set, the internal-vs-external-validation conflation, seed/run variance, test-set event
      count, metric selection, reproducibility/provenance, and reference-standard/label quality.
      Domain-probe modules 15 → 16. Grounded in the leakage taxonomy (Kapoor & Narayanan, Patterns
      2023), Varoquaux & Cheplygina (npj Digit Med 2022), CLAIM 2024, and Metrics Reloaded
      (Maier-Hein & Reinke et al., Nat Methods 2024).
    • New deterministic detector check_split_leakage.py (/model-validation) — proves (by set
      arithmetic on the emitted split_assignment.csv, not heuristics) that no patient crosses
      train/val/test, and that the split records a reproducible seed. Verdicts PATIENT_OVERLAP
      (Major), MISSING_SEED (Major), SINGLE_PARTITION (Minor); train/validation/holdout synonyms
      collapse so a labelling variant never trips it. Stdlib-only, network-free, with a reproducible
      challenge card + CI-wired regression test. Integrity detectors 36 → 37.
  • Medical-AI model-engineering lane — Phase 2 (build/scaffold). Completes the
    build → validate chain in-repo, staged after Phase 1's verification contract. Clinician-anchored
    (a reproducible research scaffold generator that integrates MONAI / nnU-Net, not a replacement);
    default CI stays torch-free.
    • New skill /model-scaffold (Layer B) — scaffold.py stamps out a runnable PyTorch
      segmentation training repo (configurable U-Net, dataset.py, losses.py, train.py,
      evaluate.py, config.yaml, requirements.txt, REPRODUCIBILITY.md, methods_stub.md) with
      the reproducibility guarantees baked in by construction: a patient-level seed-locked split
      written as an auditable artifact (splits/split_assignment.csv + split_seed.txt, disjoint by
      construction so it clears /model-validation's check_split_leakage), all-RNG seeding + cuDNN
      determinism, a train-only loader, and eval() + no_grad() inference. No fabricated numbers
      ([VERIFY] placeholders). Skills 46 → 47.
    • New deterministic detector check_training_hygiene.py (/model-scaffold) — conservative
      AST linter (flag-not-prove, the training-code analogue of check_generated_code): all RNGs
      seeded, cuDNN deterministic, eval() + no_grad() inference, no training on a non-train split.
      Verdicts SEED_INCOMPLETE / MISSING_EVAL_MODE / TRAIN_ON_NONTRAIN_SPLIT (Major),
      CUDNN_NONDETERMINISTIC / EVAL_SHUFFLE (Minor). Integrity detectors 37 → 38.
    • scaffold_challenge executes the build → validate chain network-free: scaffold a repo →
      deterministic split matches the frozen expected + is patient-disjoint (proven inline) → passes
      check_training_hygiene → a self-skipping torch tier (forward shape + gradients + reproducible
    loss when torch is installed; when torch is absent it reports SKIP, which is never counted as CI coverage of runnability).
    • New skill /architecture-zoo (Layer D, advisory) — the choose front end of the lane: maps a
      research question (task + modality / dimensionality + labelled-data scale + class imbalance) to a
      paper-grounded architecture shortlist via a decision tree, then per-architecture cards with core
      idea, when-to-use, medical-imaging use, reference implementation, the typical validation/experiment
      setup, and the matching /model-scaffold template. Seeds the classification (ResNet / DenseNet /
      EfficientNet / Inception / ViT / Swin / DeiT), segmentation (U-Net / 3-D U-Net / V-Net / Attention
      & Residual U-Net / nnU-Net / SegResNet / Swin-UNETR / Mask R-CNN), and foundation/SSL (SAM / MedSAM /
      MedSAM2 / TotalSegmentator / SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo) families. Every
      recommendation names its source paper; it teaches archetypes, not a live SOTA leaderboard. Skills
      47 → 48.
  • Medical-AI model-engineering lane — Phase 3 (reporting). The documentation seam of the lane,
    after validation (Phase 1) and build (Phase 2). Clinician-anchored, additive.
    • New skill /model-card (Layer C) — generate the documentation an engineer-built model must
      carry: a Model Card (Mitchell et al., FAccT 2019), a dataset Datasheet (Gebru et al.,
      CACM 2021), and a METRIC-informed data-quality pass (Schwabe et al., npj Digit Med 2024),
      filled from user-supplied facts — never fabricated (intended use, out-of-scope use, training data,
      per-subgroup performance, caveats, provenance, consent, licence). Templates live in references/
      and are uncounted (documentation standards, not clinical reporting checklists — same treatment
      as appraisal_tools/METRICS.md), so reporting_guidelines is unchanged. Skills 48 → 49.
    • New deterministic detector check_model_card_complete.py (/model-card) — verifies every
      required Model Card / Datasheet section is present and non-empty (not missing, not an unfilled
      [NEEDS INPUT] placeholder). Verdicts MISSING_SECTION / EMPTY_REQUIRED_SECTION (Major); a
      presence check, not a truth check. reporting_compliance family. Integrity detectors 38 → 39.
    • Reproducible challenge (check_model_card_complete_challenge, synthetic complete + incomplete
      fixtures) + CI-wired regression test (8 cases).
  • Medical-AI model-engineering lane — Phase 4 (evaluation + MLLM). The evaluation half, completing
    the choose → build → validate → evaluate → report chain. Clinician-anchored, additive.
    • New skill /model-evaluation (Layer B) — compute task-correct held-out metrics for a trained
      imaging model (segmentation: Dice + a boundary metric HD95/NSD per structure; classification: AUROC
      + AUPRC + sensitivity/specificity with bootstrap CIs at the deployment prevalence; detection: FROC/
      mAP with a stated IoU criterion) + calibration + subgroup slices, emitting a per-case table for
      /analyze-stats. check_metric_reporting.py gates the metric choice against Metrics Reloaded
      (Maier-Hein & Reinke et al., Nat Methods 2024) / CLAIM 2024 (PIXEL_ACCURACY_SEG /
      NO_BOUNDARY_METRIC / ACCURACY_ONLY / DETECTION_METRIC_MISSING / CI_MISSING). data_preparation
      family. Skills 49 → 50.
    • New skill /mllm-eval (Layer B) — a model-agnostic (closed API or open weights) evaluation
      harness for an LLM/MLLM on a clinical task (report generation, VQA, extraction): adjudicated
      reference standard, clinical-efficacy metrics (RadGraph-F1 / CheXbert-F1 beyond BLEU/ROUGE),
      faithfulness/hallucination, pretraining-contamination, prompt sensitivity, reader study.
      check_mllm_eval_completeness.py gates the plan (NGRAM_ONLY / FAITHFULNESS_MISSING /
      REFERENCE_STANDARD_MISSING / CONTAMINATION_UNADDRESSED / READER_STUDY_MISSING / …).
      reporting_compliance family. Skills 50 → 51.
    • New reviewer domain-probe mllm_evaluation.md (ME0–ME8) (/peer-review + /self-review,
      vendored byte-identical) — the reviewer-side audit of an LLM/MLLM clinical evaluation. Grounded in
      RadCliQ (Yu et al., Patterns 2023), RadGraph (Jain et al., NeurIPS 2021), CheXbert (Smit et al.
      2020), MedVH / Med-HALT, MI-CLEAR-LLM. Domain-probe modules 16 → 17. Integrity detectors 39 → 41.
    • Uncounted appraisal ref appraisal_tools/METRICS_RELOADED.md (metric-selection guidance; not a
      counted reporting checklist). Reproducible challenges + CI-wired regression tests for both detectors.
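The set-arithmetic idea behind check_split_leakage.py can be sketched in a few lines of stdlib Python. This is an illustrative re-creation, not the detector's code. The column names patient_id and split, the synonym table, and the function name audit_split are assumptions. Patient IDs are collected per canonical partition, and any non-empty pairwise intersection is reported as a leak.

```python
import csv
import io

# Hypothetical synonym table: labelling variants collapse to one canonical partition.
CANONICAL = {
    "train": "train", "training": "train",
    "val": "val", "valid": "val", "validation": "val",
    "test": "test", "holdout": "test", "hold-out": "test",
}


def audit_split(csv_text, seed_text):
    """Return (verdict, severity, detail) findings for a split-assignment CSV."""
    members = {}  # canonical partition -> set of patient IDs
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["split"].strip().lower()
        part = CANONICAL.get(label, label)
        members.setdefault(part, set()).add(row["patient_id"].strip())

    findings = []
    parts = sorted(members)
    # Pairwise set intersection: a shared patient ID is a proof of leakage, not a heuristic.
    for i, a in enumerate(parts):
        for b in parts[i + 1:]:
            shared = members[a] & members[b]
            if shared:
                findings.append(("PATIENT_OVERLAP", "Major", f"{a}/{b}: {sorted(shared)}"))
    if not (seed_text or "").strip():
        findings.append(("MISSING_SEED", "Major", "no recorded split seed"))
    if len(parts) < 2:
        findings.append(("SINGLE_PARTITION", "Minor", f"only {parts}"))
    return findings
```

Because "validation" and "val" (or "holdout" and "test") collapse before the intersection, a labelling variant alone cannot produce a finding.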

MedSci Skills v4.10.0

28 Jun 09:24
32294e3

Added

  • Three new reviewer domain-probe modules (/peer-review + /self-review, vendored
    byte-identical), reverse-engineered from high-IF CC-BY papers under the reverse_engineer/
    license firewall: mendelian_randomization.md (MR1–MR8: the three IV assumptions, a
    pleiotropy-robust sensitivity suite rather than IVW alone, Steiger/direction, sample overlap,
    non-linear-MR caution, drug-target colocalization); polygenic_risk_score.md (PG1–PG8:
    ancestry transferability/portability, base/target leakage, incremental value over the clinical
    model, screening detection-rate-vs-discrimination, target-population calibration);
    network_meta_analysis.md (NM1–NM8: transitivity, global+local incoherence, SUCRA/P-score
    over-interpretation, CINeMA/GRADE-NMA certainty, component-NMA additivity). Domain-probe modules
    12 → 15.
  • Observational probe O17 (observational_confounding.md) — agnostic many-exposure-scan
    multiplicity (ExWAS / EWAS / MWAS): correction matched to claim against the honest test-count
    denominator, independent replication as the real safeguard, correlated-exposure conservatism,
    selective top-hit reporting.
  • Two reporting-guideline checklists (/check-reporting): STROBE-MR (Mendelian
    randomization) and PGS-RS / PRS-RS (polygenic-score risk prediction), with study-type
    routing + aliases. Reporting guidelines 36 → 38.
  • Four /analyze-stats analysis guides: multiple-testing/high-dimensional screening,
    Mendelian randomization, polygenic risk score, and network meta-analysis.
  • /clean-data implausible-value & cross-field validity rules reference — organ-system
    compatible-with-life bounds + cross-field logical-consistency rules (temporal ordering,
    derived-vs-source, sex-/state-specific), flag-not-auto-fix.
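The O17 point about the honest test-count denominator can be made concrete with one common correction, the Benjamini–Hochberg false-discovery-rate step-up. This minimal sketch is an illustration, not the probe's wording. The function name bh_reject and its n_tests parameter are hypothetical. n_tests is the number of exposures actually scanned, which may exceed the number of p-values reported. The sketch assumes any unreported p-values are larger than the reported ones, which holds when only the top hits are shown.

```python
def bh_reject(pvalues, n_tests=None, q=0.05):
    """Benjamini-Hochberg step-up; n_tests is the honest count of exposures scanned."""
    m = len(pvalues) if n_tests is None else n_tests
    if m < len(pvalues):
        raise ValueError("n_tests cannot be smaller than the number of p-values supplied")

    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # p_(k) <= k * q / m, rearranged to avoid a division
        if pvalues[i] * m <= rank * q:
            k_max = rank  # step-up: keep the largest rank that passes

    rejected = [False] * len(pvalues)
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected
```

The same five p-values that yield two discoveries when m = 5 yield none when the scan actually tested 100 exposures. That is the selective-top-hit failure the probe targets.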

Changed

  • Clinician-friendly update reminders. The classroom installers
    (install-macos.command / install-windows.cmd / install-windows.ps1) now enable the in-app
    "update available" notice and the one-click Desktop updater by default (turnkey path; disable
    with --disable-update-notify or MEDSCI_NO_UPDATE_CHECK=1). For the npx/manual paths the
    installer prints a one-time nudge showing how to turn reminders on (--enable-update-notify),
    and the README Quick Start recommends it. New read-only update.session_hook_enabled() gates the
    nudge; the npx/manual paths stay opt-in (no silent SessionStart hook).
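The reminder defaults described above amount to a small precedence rule. The following is a minimal sketch under stated assumptions. The function name update_notice_enabled and the channel strings are hypothetical. The ordering, in which an explicit opt-out beats an explicit opt-in and both beat the per-channel default, is an assumption rather than documented install.py behavior.

```python
import os


def update_notice_enabled(channel, cli_flags=(), env=None):
    """Hypothetical precedence: explicit opt-out > explicit opt-in > per-channel default."""
    env = os.environ if env is None else env
    # Either documented opt-out switch turns the notice off.
    if env.get("MEDSCI_NO_UPDATE_CHECK") == "1" or "--disable-update-notify" in cli_flags:
        return False
    if "--enable-update-notify" in cli_flags:
        return True
    # Classroom installers default to on; npx and manual installs stay opt-in.
    return channel == "classroom"
```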

MedSci Skills v4.9.0

25 Jun 22:19
4538d0c

Added

  • Duplicate-bibliography gate — new check_reference_duplication.py
    (/manage-refs, also usable from /sync-submission) reads the BUILT artifact
    (.docx via stdlib zipfile, or a rendered .md/.txt) and fires
    DUP_REF_HEADING / REF_NUMBER_RESTART / REF_SIGNATURE_DUP (Major) when the
    reference list is duplicated. Catches the hybrid failure where a manuscript
    carries both inline [@key] citations and a hand-typed ## References list and
    is built with pandoc --citeproc: the build renders the hand list and a
    citeproc bibliography (often after the legends), so the same reference appears
    twice; check_xref does not see it. Author-anchored (first-author, year)
    signature detection works on Word auto-numbered lists. Validated against a real
    built docx with the duplicate (caught) and its single-list fix (clean).
    Stdlib-only; PII-free fixtures + test_reference_duplication.sh.

  • Cross-script binning-consistency gate — new check_binning_consistency.py
    (/self-review, Phase 2.5b) parses analysis source (R/Python) and emits
    BINNING_DRIFT (Major) when one derived categorical (age band, BMI category,
    eGFR stage, risk tier) is binned with ≥2 different (breaks, right-closure)
    signatures across files. The same cohort then splits differently per script:
    per-stratum Ns drift between a primary table and a sensitivity table while the
    grand total still reconciles, so a row-sum check passes but a stratum can
    spuriously cross a threshold. Motivated by a screening cohort that binned age
    right=FALSE in the primary script vs right=TRUE in a threshold sensitivity
    script — fractional ages shifted hundreds of participants and produced a
    spurious "reached" stratum. Stdlib-only; PII-free fixtures +
    test_binning_consistency.sh.

    Together these two gates take the analysis-integrity detector suite 34 → 36
    (citation family 6 → 7, data-preparation 5 → 6); skills and reporting guidelines
    unchanged. Additive and backward-compatible.

  • Float citation-order gate — new check_citation_order.py (/self-review)
    flags numbered floats not cited in ascending order of first appearance, per series
    independently (main Tables, main Figures, Supplementary Tables, Supplementary
    Figures). It scans only the narrative body (auto-excluding the Figure Legends /
    back-matter so an in-order legends block cannot mask an out-of-order body) and
    tolerates plural lists ("Tables S4, S5"), ranges, and non-float sensitivity-spec
    labels ("S1–S6"). CITATION_ORDER (Major) is a pre-peer-review desk/technical-check
    item editorial offices "unsubmit" for; CITATION_GAP (Minor) flags non-contiguous
    numbering. Motivated by a journal technical-check unsubmit where main Table 3 was
    cited before Tables 1–2 and the supplementary tables were cited wildly out of order
    (S4, S9, S16, S12, …). Wired into /self-review's technical-check pass; synthetic
    positive/negative fixtures + regression test. Analysis-integrity detectors
    33 → 34 (Reporting compliance family 8 → 9); skills 45 and reporting guidelines
    36 unchanged. Additive and backward-compatible.

  • Percentage-decimal style check + KJR technical-check conventions: /self-review's
    check_classical_style.py gains a PERCENT_DECIMALS verdict (Minor, report-only)
    flagging percentages reported to >1 decimal place ("35.14%"), which several journals
    (e.g. KJR) require at one decimal at technical check; regression fixture + test added.
    The KJR journal profile (write-paper detail + find-journal compact) gains a
    Technical-Check Conventions section enumerating the deterministic pre-review desk
    items that "unsubmit" a manuscript: ascending float citation order, demographics in
    Materials and Methods, one-decimal percentages, double spacing, Acknowledgments/Funding/
    Author-Contributions on the Title Page only, reporting checklist cited as "Supplementary
    Material 1", IRB number in Methods even when blinded, and ICMJE forms only after
    acceptance. No detector-count change (existing detector extended; profiles updated, not
    added). Motivated by a 2026-06 KJR technical-check unsubmit.

  • Audit-dump leak gate — new check_checklist_dump_leak.py (/sync-submission)
    scans every .md/.docx/.pdf in a submission directory for the residue of a
    /check-reporting or /self-review internal audit report (compliance_pct,
    fixable_by_ai, check_reporting_version, Auto-fix:, [PARTIAL→auto-fixed],
    suggested_fix, Action Items, _pipeline_log, NON-AUTHORITATIVE). Any hit is
    a P0 leak: these tooling tokens must never reach a reviewer. Motivated by a
    near-miss where a prior project's STROBE_checklist_v4.pdf was actually the
    check-reporting dump, reused by filename and compiled into the reviewer-visible
    proof (exposing auto-fix notes, raw JSON, and a stale old title). Wired into
    preflight_gate.py as a P0 check over the journal asset directory; writes
    qc/checklist_dump_leak.json. /check-reporting reports now also open with a
    NOT-FOR-SUBMISSION banner so the working audit is self-identifying.
    Analysis-integrity detectors 32 → 33; skills 45 and reporting guidelines 36
    unchanged. Additive and backward-compatible.

  • Frontmatter schema gate (Agent Skills cross-platform portability) — new
    scripts/check_frontmatter_schema.py + CI step strictly yaml.safe_loads every
    skills/*/SKILL.md frontmatter and enforces the published Agent Skills spec: valid
    YAML, name ≤64 chars / lowercase-hyphen / no reserved claude/anthropic token,
    description present / ≤1024 chars / no XML angle brackets. The repo's own generators
    use a tolerant line-based reader, so a frontmatter block that is not valid YAML could
    pass every prior gate yet be rejected by a strict-YAML consumer (the agentskills.io
    directory validator or another agent platform). Self-test (tests/test_frontmatter_schema.sh)
    covers each violation class. This is a repo-CI validator, not a counted detector.
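The core of the float citation-order rule can be sketched as follows. This is a simplified illustration, not check_citation_order.py. The function name and regex are hypothetical. The sketch covers only main Tables and Figures and reads plural lists joined by commas or "and". It ignores ranges and supplementary "S" labels, and it does not exclude a Figure Legends section, so it should be given the narrative body only.

```python
import re

# Matches "Table 3", "Tables 1 and 2", "Figures 2, 3" (main series only).
FLOAT_RE = re.compile(r"\b(Table|Figure)s?\s+(\d+(?:\s*(?:,|and)\s*\d+)*)")


def check_citation_order(body):
    """Return (verdict, series, first-appearance order) findings for the narrative body."""
    first_seen = {"Table": [], "Figure": []}
    for match in FLOAT_RE.finditer(body):
        series = match.group(1)
        for num in re.findall(r"\d+", match.group(2)):
            n = int(num)
            if n not in first_seen[series]:
                first_seen[series].append(n)  # record first appearance only

    findings = []
    for series, order in first_seen.items():
        # Each series is checked independently, as in the rule described above.
        if order != sorted(order):
            findings.append(("CITATION_ORDER", series, order))
        if order and sorted(order) != list(range(1, max(order) + 1)):
            findings.append(("CITATION_GAP", series, order))
    return findings
```

For example, a body that mentions Table 3 before Tables 1 and 2 yields a CITATION_ORDER finding with first-appearance order [3, 1, 2].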

Changed

  • Skill-boundary documentation — a diagnostic pass confirmed the 45 skills are
    deliberately specialized (no consolidation warranted), but several boundaries were
    easy to confuse. README's "Skills Work Together" now carries a Skill boundaries
    block spelling out the reference pipeline (search-lit → lit-sync → manage-refs →
    verify-refs), the language pass order (humanize → polish-language → academic-aio),
    manuscript-type selection (write-paper / review-paper / revise), author-vs-reviewer
    (self-review / peer-review), project entry (intake-project / orchestrate), study
    design (design-study perceptual ceiling gate / design-ai-benchmarking), and content
    vs template (write-protocol / fill-protocol). /revise now documents the manual
    fallback when /analyze-stats or /make-figures is unavailable (emit a checklist, hold
    responses as BLOCKED — pending analysis/figure, never invent numbers). Docs only.

  • /analyze-stats observational-design precondition — Phase 2 (Analysis Plan) now opens
    with a WARN-level precondition: before planning an observational analysis (cohort,
    case-control, cross-sectional, registry, survey), confirm a literature-grounded
    variable_operationalization.md (from /define-variables) or equivalent codebook-backed
    definition table exists; if not, warn and recommend /define-variables first so
    exposure/outcome/covariate definitions and cutoffs are citation-backed rather than invented
    ad hoc from the data dictionary. WARN, not a hard block (proceed on explicit confirmation;
    stricter projects can treat it as a hard stop). Mirrors the precondition /write-protocol
    already enforces before drafting Methods, closing the one observational-pipeline skill that
    lacked it. Guidance only — non-breaking, no new code gate.

  • /meta-analysis progressive disclosure (token hygiene) — the two inline "Empirical
    Lessons" sections (16 dated SR-MA peer-review lessons, ~45 lines) moved verbatim to
    load-on-demand references/empirical_lessons.md, with an explicit "load before Phase 4
    extraction-form design and before Phase 8 submission" pointer and a Reference Files
    entry — matching the skill's own established pattern (15 existing reference files). The
    largest SKILL.md in the bundle drops 804 → 775 lines (less context loaded on every
    activation); the lessons stay discoverable via the reference list. Content byte-preserved
    (no rewrite, no renumber — a pre-existing duplicate "9." label is carried over and noted in
    the reference file). No skill/detector count change.

  • De-drift the sync-submission YAML front-matter splitter: check_wordcount_cap.py
    and cover_letter_drift_check.py each carried their own _strip_yaml_front_matter, marked
    "keep in sync" but already drifted (list vs tuple return; subtly different unclosed-fence
    handling). Extracted one canonical split_yaml_front_matter() into a private
    scripts/_yaml_frontmatter.py (leading underscore → not counted as a detector) imported by
    both — the helper ships in the same skill's scripts/ dir, so it stays self-contained when
    vendored/installed. Behavior-preserving (verified normal / no-front-matter / unclosed cases
    + the wired test_wordcount_cap and test_preflight_gate subprocess-import path). No
    skill/detector count change.
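A single front-matter splitter of the kind described can be quite small. The sketch below is hypothetical. It is named split_front_matter so that it is not mistaken for the repo's split_yaml_front_matter(). It returns a (front_matter, body) tuple and, as one possible convention, treats an unclosed fence as no front matter at all. That unclosed-fence handling is exactly where the two drifted copies disagreed.

```python
def split_front_matter(text):
    """Split a leading '---' YAML block from the body; return (front_matter, body)."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != "---":
        return "", text  # no opening fence: everything is body
    for i in range(1, len(lines)):
        if lines[i].rstrip("\r\n") == "---":
            # Front matter excludes both fences; the body starts after the closing fence.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # unclosed fence: treat the whole text as body (one possible convention)
```

Returning a tuple in every branch avoids the list-vs-tuple drift noted above.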

Fixed

  • Public-doc count reconciliation: README.md (MedSci-Audit suite line) and
    CITATION.cff (abstract) cited stale catalog totals from before the detectors above
    merged (28 detectors / 32 EQUATOR guidelines). Reconciled to the disk SSOT
    (metadata/catalog_counts.json): 36 analysis-integrity detectors / 36 reporting
    guidelines. Added a What's New "Unreleased" block to README.md so the public
    progression no longer implies v4.8 is current. No code or count change — the SSOT was
    already correct; o...

MedSci Skills v4.8.0

24 Jun 12:01
2e7ad1a

The review-harvest batch: deterministic detector hardening promoted from real-manuscript review
cycles — four false-positive fixes, two new gates, nine reviewer-side domain probes, and a
design-stage gate. Additive and backward-compatible — no skill, CLI, or output-path change;
skills 45 and reporting guidelines 36 unchanged; analysis-integrity detectors 30 → 32.

Added

  • Reader-facing supplement / multi-file hygiene gate — new check_supplement_hygiene.py
    (/self-review) lints the rendered supplement, a separately-built tables file, and caption files
    (not just manuscript.md) for the technical-check-fatal residue that hides there: §/§L internal
    labels, unfilled placeholders (Table SX, [Authors], figure-path globs, build-dir paths), build
    markers ([VERIFY]/TODO), response-to-reviewers framing, planning residue, and body↔supplement
    cross-reference numbers that don't resolve. check_artifact_coverage.py gains
    PROMISED_STAT_NO_VALUE + a --supplement corpus (a bound/ceiling/de-confounded statistic promised
    but never given a number anywhere). (#187)
  • Power-aware null-interpretation gate — new check_null_calibration.py (/self-review)
    flags a headline negative/equivalence claim ("no synergy", "not associated") that carries no
    minimum-detectable-effect, power, equivalence-margin/TOST, or CI-compatibility statement. Plus a
    reusable rating_monotonicity.py template (/analyze-stats) that catches a folded
    confidence-weighted (call × confidence) → AUC encoding, and a /design-study design-stage ceiling
    gate for perceptual/reader-AI studies (6 ceiling-breakers set before data lock). (#188)
  • Nine reviewer-side domain probes across the shared peer-review/self-review modules: SR/MA
    small-k enrollment-overlap, mixed-denominator pooling, prospective-registration chronology, and
    boundary-degenerate proportions (P14–P17); observational selection-on-availability and
    serial-imaging lesion-tracking (O15/O16); diagnostic exclusion-flow ↔ prose + modality-safety (D8);
    AI arm-task-vs-deployment-workflow (AO6); and a survival apparent-vs-optimism deterministic tell
    (S7). (#186)
  • Integrity detector count: 30 → 32.
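The power-aware null-interpretation gate follows a simple pattern: find a negative or equivalence claim, then look for a calibration cue. This per-sentence regex sketch is a loose simplification of check_null_calibration.py. The phrase lists and the function name are illustrative assumptions. The real detector's scope may differ, for example by restricting itself to headline claims or by accepting a cue stated elsewhere in the manuscript.

```python
import re

# Illustrative phrase lists only; a real detector would use a curated vocabulary.
NULL_CLAIM = re.compile(
    r"\b(no (?:significant )?(?:difference|association|synergy|effect)"
    r"|not associated|did not differ)\b",
    re.IGNORECASE,
)
CALIBRATION_CUE = re.compile(
    r"\b(power|minimum detectable|equivalence margin|TOST|95% CI|confidence interval)",
    re.IGNORECASE,
)


def flag_uncalibrated_nulls(text):
    """Return sentences that assert a null/equivalence result with no calibration cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if NULL_CLAIM.search(s) and not CALIBRATION_CUE.search(s)]
```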

Fixed

  • Four detector false positives that fired Major on legitimate (often recommended) patterns:
    check_generated_code no longer flags a hex-color palette (the colorblind-safe WONG palette
    make-figures recommends) as hand-typed tabular data; check_classical_style fires the § AI-tell
    only on a section cross-reference, not on author-footnote daggers; check_scope_coherence clears
    CROSS_SECTIONAL_PROGNOSTIC when the prognostic token sits inside a negation/deferral frame; and
    check_cohort_arithmetic no longer mis-binds the RATE_BACKCALC numerator to a tier label's digit
    or a decimal's fraction. Each ships a regression fixture; three previously-unwired test suites are
    now CI-wired. (#185)
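The check_classical_style fix separates a section cross-reference from the § used as an author-footnote symbol. A minimal sketch follows. It assumes a cross-reference is "§" followed by an optional space and a dotted section number. The regex and function name are hypothetical, not the detector's own.

```python
import re

# A section cross-reference: "§" + optional space + dotted number, e.g. "§2.3" or "§ 4".
SECTION_XREF = re.compile(r"§\s?\d+(?:\.\d+)*")


def section_sign_tells(text):
    """Return § section cross-references; a bare § footnote mark is ignored."""
    return SECTION_XREF.findall(text)
```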

Changed

  • Release pipeline now also publishes to npm (idempotent, with npm provenance via OIDC), so the
    npx medsci-skills@latest install channel no longer drifts behind the GitHub release. The step runs
    only when the NPM_TOKEN repo secret is set, skips if that version is already on npm (re-running a
    tag is safe), and runs after the GitHub Release so an npm hiccup never blocks it. No product change.
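The idempotent publish guard can be sketched as a pure decision function. The sketch rests on two assumptions. First, the version check queries the public npm registry's per-version document at https://registry.npmjs.org/<package>/<version>, which returns 404 for an unpublished version of an unscoped package. Second, the function names are hypothetical, and the real workflow step may use the npm CLI instead. The registry lookup is injectable so the logic can be tested offline.

```python
import os
import urllib.error
import urllib.request


def registry_has_version(package, version, timeout=10):
    """True if the public npm registry already serves package@version (unscoped names only)."""
    url = f"https://registry.npmjs.org/{package}/{version}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error: fail loudly rather than guess


def publish_decision(package, version, env=None, has_version=registry_has_version):
    """Return 'publish' or a skip reason; re-running the same tag is a no-op."""
    env = os.environ if env is None else env
    if not env.get("NPM_TOKEN"):
        return "skip: NPM_TOKEN not set"
    if has_version(package, version):
        return "skip: already published"
    return "publish"
```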

MedSci Skills v4.7.0

22 Jun 09:17
7ea0b91

The self-update foundation: physician-researchers stay current without GitHub, git, or a
terminal — via a transactional crash-safe installer, a verified one-click updater, a hardened
release pipeline, and an opt-in update notice. Additive and backward-compatible — no skill, CLI,
or output-path change; skills 45 and reporting guidelines 36 unchanged. All four pieces are
network-mocked-tested and run on Ubuntu + macOS + Windows CI.

Added

  • Transactional, crash-recoverable installer + per-target state. install.py now installs each
    target through a durable journal state machine (installers/medsci_txn.py,
    prepared → old_moved → new_installed → committed, atomic-write + fsync): an interrupted install
    is recovered on the next run (roll back an incomplete transaction, forward-clean a committed one,
    fail closed on a corrupt journal). It keeps a per-target installed manifest at
    ~/.medsci-skills/targets/<target>/ with a per-skill SHA-256 inventory — a skill you modified
    is snapshotted to ~/.medsci-skills/backups/<ts>/ before an update, legacy collisions are backed up
    there (never inside the skills dirs, never auto-deleted), and only MedSci-owned skills are pruned
    (your/third-party skills are untouched). Adds canonical-home containment path-safety, a
    disk-space preflight, two deterministic tracked manifests
    (metadata/distribution_manifest.json ownership/version + metadata/distribution_files.json
    payload inventory) with a CI --check gate, and a Windows/macOS CI matrix. (#177)
  • One-click self-updater (installers/update.py). Fetches the latest classroom release and
    re-installs through the transactional installer — no GitHub UI, git, or terminal. Resolves the
    release via api.github.com only and fails closed if the API has no sha256 digest; verifies
    the download's sha256 == the API digest, the asset name, and the tag; and never extractall()s
    the archive: it extracts per entry, rejecting path traversal (POSIX + Windows), symlink/hardlink/junction,
    case-insensitive duplicates, and zip-bombs, and enforcing the distribution_files.json allowlist +
    per-file hash. Installs the updater to ~/.medsci-skills/updater/ (survives deleting the download
    folder); install.py --check-update reports availability via semver with a clock-sane 24h cache;
    optional consented --desktop-launcher. Thin .command/.cmd launchers wrap it; a privacy notice
    (docs/update_privacy.md) states the honest scope. (#178)
  • Release-pipeline supply-chain hardening. release.yml now gates on a version-consistency check
    (the pushed tag must equal CITATION.cff == package.json == metadata/distribution_manifest.json
    and the tracked inventory must match the tree); injects a verified provenance.json
    {schema_version, tag, version, git_sha, built_at} into each classroom ZIP via
    build_classroom_release.py --tag/--git-sha/--built-at; attests the ZIPs' build provenance
    (actions/attest-build-provenance); runs on a protected release environment (required-reviewer
    approval); and — via the new scripts/check_release_zip.py — verifies each ZIP round-trips through
    the updater's own safe-extract + provenance validation before publishing, so a release can never
    ship a ZIP the self-updater would reject (locked by installers/tests/test_release_zip.sh).
    provenance.json stays a control file (excluded from the safe-extract inventory). SECURITY.md
    gains a "Release integrity & revocation" section; docs/maintainer_workflow.md documents the
    protected-environment setup. (#179)
  • Opt-in update notice for Claude Code (off by default). install.py --enable-update-notify
    merges a SessionStart hook (installers/session_update_check.py) into ~/.claude/settings.json
    that prints a one-line "update available" systemMessage at session start; --disable-update-notify
    removes only that hook (keying on the home-anchored script path, so it never touches a foreign hook).
    The hook does not read the SessionStart stdin (no cwd/transcript/session id), has no
    telemetry/analytics/unique-id, uses the shared clock-sane 24h cache + a 4 s timeout, stays silent on
    any error (never blocks a session), honors MEDSCI_NO_UPDATE_CHECK=1, and installs nothing — it
    only notifies. A version check resolves the latest tag without the OS-specific download asset
    (resolve_latest_tag), so the notice works on Linux too; the settings merge is idempotent, preserves
    foreign hooks/settings, and refuses to clobber an unparseable settings.json. Tested offline
    (installers/tests/test_session_hook.py, 38 cases). (#180)
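The per-entry checks listed for the self-updater can be illustrated with the standard-library zipfile module. This is a minimal sketch under stated assumptions, not the code in installers/update.py: validate_zip, UnsafeArchive, the size caps, and returning content in memory instead of writing to disk are all hypothetical. Hardlink/junction handling is omitted because a ZIP member only carries a Unix mode, from which this sketch detects symlinks.

```python
import hashlib
import io
import stat
import zipfile
from pathlib import PurePosixPath, PureWindowsPath
from typing import Dict

MAX_TOTAL_BYTES = 200 * 1024 * 1024  # hypothetical cap on total uncompressed size
MAX_RATIO = 100                      # hypothetical per-entry compression-ratio cap


class UnsafeArchive(Exception):
    """Raised as soon as any member fails a check; nothing is returned partially."""


def _is_unsafe_name(name: str) -> bool:
    # Absolute POSIX paths, backslashes, Windows drive/UNC roots and ".." segments.
    if name.startswith("/") or "\\" in name:
        return True
    win = PureWindowsPath(name)
    if win.drive or win.root:
        return True
    return ".." in PurePosixPath(name).parts


def validate_zip(data: bytes, allowlist: Dict[str, str]) -> Dict[str, bytes]:
    """Check every member, then return {name: content}; never calls extractall()."""
    out: Dict[str, bytes] = {}
    seen_lower = set()
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            name = info.filename
            if _is_unsafe_name(name):
                raise UnsafeArchive(f"path traversal: {name}")
            if info.is_dir():
                continue
            # The high 16 bits of external_attr hold the Unix st_mode, if any.
            if stat.S_ISLNK(info.external_attr >> 16):
                raise UnsafeArchive(f"symlink: {name}")
            if name.lower() in seen_lower:
                raise UnsafeArchive(f"case-insensitive duplicate: {name}")
            seen_lower.add(name.lower())
            total += info.file_size
            if total > MAX_TOTAL_BYTES or info.file_size > MAX_RATIO * max(info.compress_size, 1):
                raise UnsafeArchive(f"zip-bomb limits exceeded: {name}")
            if name not in allowlist:
                raise UnsafeArchive(f"not in allowlist: {name}")
            blob = zf.read(info)
            if hashlib.sha256(blob).hexdigest() != allowlist[name]:
                raise UnsafeArchive(f"sha256 mismatch: {name}")
            out[name] = blob
    return out
```

Checking every member before anything is used means one bad entry rejects the whole archive, which matches the fail-closed posture the release note describes.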

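The "clock-sane 24h cache" plus semver comparison shared by --check-update and the SessionStart hook might look roughly like the sketch below. The function names, the JSON cache layout, and the choice to treat a timestamp from the future as stale are assumptions for illustration; fetch_latest stands in for whatever resolves the latest tag (resolve_latest_tag in the release note) and is injected so the sketch runs offline.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional, Tuple

CACHE_TTL_S = 24 * 60 * 60  # 24 h, as stated in the release note


def parse_semver(tag: str) -> Tuple[int, int, int]:
    # "v5.0.0" -> (5, 0, 0); pre-release/build suffixes are ignored in this sketch.
    core = tag.lstrip("vV").split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = (int(p) for p in core.split("."))
    return major, minor, patch


def cache_is_fresh(checked_at: float, now: float) -> bool:
    # A negative age (clock moved backwards, or a tampered cache) counts as stale.
    age = now - checked_at
    return 0 <= age < CACHE_TTL_S


def update_available(installed: str, cache_path: Path,
                     fetch_latest: Callable[[], str],
                     now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    latest = None
    try:
        cached = json.loads(cache_path.read_text())
        if cache_is_fresh(float(cached["checked_at"]), now):
            latest = cached["latest"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or corrupt cache: fall through to a fresh lookup
    if latest is None:
        latest = fetch_latest()
        cache_path.write_text(json.dumps({"checked_at": now, "latest": latest}))
    return parse_semver(latest) > parse_semver(installed)
```

Comparing integer tuples rather than strings keeps "v4.10.0" newer than "v4.9.9", and rejecting a negative cache age means a backwards clock jump costs one extra lookup instead of suppressing checks indefinitely.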
Trust boundary (honest scope)

  • Running a release's bundled installer is remote code execution within the GitHub trust boundary.
    The digest and the build-provenance attestation detect transport / asset tampering — they do
    not defend against a compromised publisher account or a malicious official release. See
    SECURITY.md and docs/update_privacy.md.

MedSci Skills v4.6.0

21 Jun 08:53

A maintainability, governance, and review-depth release. Integrity detectors 28 → 30; domain probes 11 → 12; skills 45 and reporting guidelines 36 unchanged. No skill rename, CLI, or output-path change — additive and backward-compatible.

Added

  • Fairness / equity / subgroup-performance domain probe (equity_fairness.md, EQ0–EQ6). Vendored byte-identical into /peer-review and /self-review (MODULES 11 → 12). Fires only when a manuscript claims cross-population performance or presents subgroup analyses as a fairness argument: disaggregated subgroup metrics (not aggregate-only), error-rate-vs-discrimination parity and base-rate dependence, a named fairness estimand + between-group gap test, development-cohort representativeness, subgroup EPV/power, and equity-aware framing aligned to TRIPOD+AI / DECIDE-AI / CONSORT-AI. (#170)
  • AI-disclosure + data/code-availability detector (sync-submission/check_disclosure_availability.py). An AI-use disclosure must carry four tokens — version + access channel + date/date-range + responsible party (the tool name only triggers the check) — plus Data/Code Availability presence with a repository/DOI where the journal expects one, keyed by journal_availability_policy.json. (#171)
  • Structured-summary-box conformance detector (academic-aio/check_summary_box.py). Key Points bullet count + one-claim-per-bullet, Research-in-context's three sub-blocks, and plain-language word band, journal-keyed via summary_box_specs.json — catches the wrong-format box a production technical check rejects. (#171)
  • Skill maturity taxonomy (official / experimental / community). A required, additive skill.yml v2.2 field (schema_version stays 2), enforced by validate_skill_contracts.py and surfaced in skills_catalog.json; all 45 current skills are official. (#174)
  • Governance & answer-engine docs: ROADMAP.md (priorities + explicit out-of-scope), MAINTAINERS.md (clinical authority stays with the founder), SECURITY.md (vulnerability reporting + medical-scope boundary), docs/maintainer_workflow.md (review + release checklist), docs/faq.md (AEO/GEO), and two new issue templates (installation problem, detector request). (#173)
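A rough sketch of the four-token AI-disclosure rule: a tool name only triggers the check, after which version, access channel, date, and responsible party must each be present. The trigger list and every regex below are hypothetical stand-ins; the shipped check_disclosure_availability.py and its journal_availability_policy.json keying are not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical trigger list: any of these tool names makes the disclosure rule apply.
TOOL_TRIGGER = re.compile(r"\b(?:ChatGPT|GPT-4o?|Claude|Gemini|Copilot)\b", re.I)

# Hypothetical patterns for the four required tokens.
TOKEN_PATTERNS = {
    "version": re.compile(r"\b\d+(?:\.\d+)+\b|\bGPT-\d\w*", re.I),
    "access_channel": re.compile(r"\b(?:web interface|chat interface|API|desktop app)\b", re.I),
    "date": re.compile(
        r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+\d{4}\b"
        r"|\b\d{4}-\d{2}(?:-\d{2})?\b"
    ),
    "responsible_party": re.compile(
        r"\b(?:the authors?|we)\b[^.]*\b(?:review|verif|responsib)\w*", re.I
    ),
}


def check_disclosure(text: str) -> Dict[str, object]:
    # No tool named: the rule does not apply.
    if not TOOL_TRIGGER.search(text):
        return {"triggered": False, "missing": []}
    missing: List[str] = [k for k, pat in TOKEN_PATTERNS.items() if not pat.search(text)]
    return {"triggered": True, "missing": missing}
```

Naming the tool alone ("ChatGPT was used for language editing.") triggers the check and reports all four tokens as missing, which is the gap the detector is meant to surface.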

Changed

  • Positioning leads with the compliance moat. README hero subline and the marketplace source description (MARKETPLACE_DESCRIPTION) now lead with reporting-guideline + risk-of-bias compliance, reference verification, and deterministic integrity gates rather than skill count. README gains a "What is MedSci Skills?" answer block, a "Start here: 3 workflows" section, and a "Validation status" section (available vs CI-gated vs E1-evaluated). A stale "32 EQUATOR" hero count was corrected to "36 reporting guidelines and risk-of-bias tools". (#173, #174)
  • write-paper Phase 7 token diet (pilot). The three integrity-audit sub-steps (7.3a/7.3b/7.3c) moved to references/phase7_integrity_audits.md behind a control-flow-preserving pointer; measured −10,238 chars (~2,559 tokens) per invocation, loaded on demand only when Phase 7 runs. (#172)

Documentation

  • CONTRIBUTING.md and the PR template add a medical-claim → founder-review gate and an official/experimental/community classification line; IMPACT.md adds an "Interpretation of metrics" caveat block ("early community interest, not widespread adoption"). (#173)

Validation / Evidence

  • New deterministic scripts each ship a network-free challenge/regression test wired into CI. MEDSCI_AUDIT.md detector-count claims corrected (it had drifted to 27/28) and a DETECTOR_CLAIM_FILES gate added to validate_catalog_consistency.py (anchored current-total patterns, never historical evaluation numbers) so the total cannot silently drift again. A regression test for the routing-asset gate (tests/test_routing_assets.sh) covers the references/ pointer that guards the Phase-7 extraction. (#169, #171)

MedSci Skills v4.5.0

20 Jun 12:10

Added

  • Self-review domain-probe batch (SR/MA + DTA + prediction-model) + submission asset-anon abs-path gate. Five new review probes promoted from field cycles, plus one deterministic submission check. sr_ma.md: P12 risk-of-bias table row-sum ↔ figure-matrix reconciliation (each NOS ★/JBI Y row must equal its printed total; the traffic-light figure's data matrix must match the supplementary table; SSOT = the primary appraisal form, not a plotting-script constant) and P13 included-study ↔ reference-list completeness (every characteristics-table study must be a numbered reference; source citations from PubMed efetch, not hand-kept notes; disambiguate same author/year by technique + sample size). diagnostic_accuracy.md: D7 index-test-as-enrollment-criterion circularity (escalate past Major when an inclusion threshold is the index test under study). clinical_prediction_model.md: CP5 intended-use horizon leakage (claim-timepoint adjectives vs each predictor's availability timepoint) and CP6 validation-nomenclature conflation (development/CV vs held-out/external test). Probes are vendored byte-identical to peer-review. sync-submission/scripts/check_asset_anonymization.py: new scan class 4 — a word/*.xml attribute (e.g. a pandoc-embedded image's <pic:cNvPr descr="…">) carrying an absolute home-dir path (/Users/…, /home/…) is a username leak invisible to a rendered-text scan; flagged as docx_embedded_abs_path (leak severity), with a regression test fixture. No version bump — probe/reference + detector additions.

  • /clean-data + /analyze-stats — reverse-coded-item / negative-alpha detector (integrity detectors 27 → 28). A multi-item Likert scale with a negatively-worded item must recode it (min+max) - x before the scale total or Cronbach's alpha is computed; left un-recoded, the item correlates negatively with the rest of the scale and alpha collapses (often negative). A negative alpha is a coding bug, not a "multidimensional construct" — defending it as such loses a review round. New stdlib-only skills/clean-data/scripts/check_reverse_coding.py computes per-item corrected item-total (item-rest) correlations + the raw Cronbach's alpha and returns REVERSE_CODING_LIKELY (alpha < 0) / REVERSE_CODING_SUSPECT (negative item-rest, alpha ≥ 0) / OK, exit 1 under --strict. skills/analyze-stats/references/templates/likert_summary.py is hardened to print item-rest correlations, flag negative ones as reverse-code suspects, warn loudly on a negative alpha, and apply the recode via a new --reverse-items flag before scoring/alpha. Ships a synthetic fixture (a 3-item scale with one reverse item → raw α = −1.71, plus a clean aligned scale) + CI-wired regression test (skills/clean-data/tests/test_reverse_coding.sh). Detector mapped to the data_preparation family; metadata/detectors_catalog.json regenerated; catalog_counts.json::integrity_detectors 27 → 28. Motivation: a medical-education pilot whose Trust scale shipped at α = −0.57 (one reverse item un-recoded) and consumed a major-revision round before 6 - x restored α = 0.58.

  • Test backfill (cont.) — fill-protocol + fulltext-retrieval regression tests (Tier 1 complete). skills/fill-protocol/tests/test_fill_form.sh builds a synthetic Word template at runtime (python-docx: 2-column key/value table + numbered section headers + title paragraph), runs fill_form.py with a content YAML exercising table_kv/section_replace/paragraph_replace, and asserts the values landed in the reopened docx, the title placeholder was replaced, and an absent label is reported [MISS] — no committed binary fixture. skills/fulltext-retrieval/tests/test_pdf_to_md.py stubs pymupdf4llm before import (the module exits on a missing dep) and pins the dependency-free helpers parse_page_range (ranges/lists/whitespace) and clean_markdown (collapse 4+ newlines, rstrip lines, single trailing newline, idempotent) — no heavy PyMuPDF dependency added to CI. Both use deps already present (python-docx/pyyaml; stdlib). No skill/version change — test infrastructure only.

  • Test backfill (cont.) — fill-icmje-coi + academic-aio regression tests. Three more deterministic, network-free tests wired into CI. skills/fill-icmje-coi/tests/test_fill_icmje_coi.sh clones the shipped synthetic seed for two authors and asserts the documented contract per output docx (14 checked boxes, 13 "None" disclosures, new title/date substituted, author name present, zero placeholder leakage; stdlib zipfile path). skills/academic-aio/tests/test_validate_schema.sh checks the JSON-LD validator (valid ScholarlyArticle passes; wrong @context, unknown @type, missing required field, malformed DOI each fail). skills/academic-aio/tests/test_batch_metadata_audit.sh checks the repo/HF-card auditor (clean repo passes --fail-on-issue; missing README/CITATION/LICENSE fails; report-only mode stays exit 0; a PHI-shaped string in an HF card is flagged). All fixtures synthetic. No skill/version change — test infrastructure only.

  • Test backfill — Tier 0 CI-wiring + deidentify PHI-scan regression test. Ten skill regression tests that existed on disk but were never gated are now wired into .github/workflows/validate.yml, so a silent break fails CI: make-figures (legend reconcile), clean-data (structural-zero), lit-sync (poll logic), meta-analysis (pool consistency), generate-codebook, present-paper (speaker-notes markdown), version-dataset (manifest/verify), manage-refs (vN-docx cross-ref), and polish-language (consistency-linter challenge). New skills/deidentify/tests/test_deidentify_scan.sh asserts the exact PHI-classification contract (PHI/REVIEW_NEEDED/SAFE counts + rrn phi_type) on the three committed fixtures — the CSV scan path is stdlib-only and network-free, and the test file is Hangul-free (column-specific asserts read the fixture header at runtime). CI now installs pandas/numpy/python-pptx/python-docx up front (was: pandas installed after the gates, which would silently skip the dep-guarded tests); version-dataset gains a pandas skip-guard for local robustness. No skill/version change — test infrastructure only.

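Scan class 4 of check_asset_anonymization.py (v4.5.0 above) targets paths that never appear in rendered text, only in word/*.xml attributes such as descr on a pandoc-embedded <pic:cNvPr>. A minimal sketch of that idea follows; scan_docx_abs_paths, both regexes, and the returned dict shape are assumptions, and only macOS (/Users/…) and Linux (/home/…) home directories are matched.

```python
import re
import zipfile
from typing import Dict, List

# Hypothetical pattern: a home directory plus the username segment that leaks.
HOME_PATH = re.compile(r"/(?:Users|home)/[^/\"'<>\s]+")
# Naive XML attribute matcher: name="value".
ATTR = re.compile(r'([\w:.-]+)="([^"]*)"')


def scan_docx_abs_paths(docx) -> List[Dict[str, str]]:
    """docx may be a path or a binary file object; only word/*.xml parts are scanned."""
    findings = []
    with zipfile.ZipFile(docx) as zf:
        for part in zf.namelist():
            if not (part.startswith("word/") and part.endswith(".xml")):
                continue
            xml = zf.read(part).decode("utf-8", errors="replace")
            for attr, value in ATTR.findall(xml):
                m = HOME_PATH.search(value)
                if m:
                    findings.append({"part": part, "attr": attr,
                                     "match": m.group(0), "class": "docx_embedded_abs_path"})
    return findings
```

A rendered-text scan would never see these values, which is why the check reads the package XML directly.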
MedSci Skills v4.4.0

19 Jun 15:57

Added

  • /peer-review + /self-review — Image-Synthesis / Cross-Modality Generation probe module (IS1–IS4) + reviewer-side reference-integrity spot-check. New domain-probe module image_synthesis.md (vendored byte-identical into /self-review; MODULES 10 → 11, sync gate updated) for studies that synthesize one imaging modality from another (MRI→PET / MRI→CT / non-contrast→contrast / low-dose→full-dose) and claim the output carries functional/molecular information or substitutes for the unavailable target. IS1 determinism/information-ceiling (the synthetic image is a deterministic function of the source, so a same-reader "source + synthetic > source alone" gain is a presentation/interpretability effect absent a direct source→label baseline); IS2 target-derived-preprocessing / undescribed slice-selection leakage (a lesion mask drawn on the target modality guiding slice selection or training makes "function inferred from structure" circular — undescribed provenance is itself a Major #1 candidate); IS3 global-vs-lesion-level quantitative agreement (whole-organ SUVR agreement does not establish lesion-level fidelity); IS4 mechanistic/proxy-signal plausibility (name what the source physically measures vs the target's biology — high image similarity is not evidence an unmeasured signal was recovered). Routed from a new peer-review Phase 2K + Phase 3 QC item 15 + Phase 5 routing line, and a /self-review routing-table row. Per Phase 2F, IS2/IS4 are typically unfixable-in-current-form and govern the recommendation toward Reject-leaning. Companion reviewer-side reference-integrity spot-check added to the Phase 2 issue checklist + Phase 3 QC item 16 (all original-research reviews): spot-check the load-bearing Introduction/Discussion citations used as evidence the method/premise works — a paper cited for a different task, a duplicate reference, a wrong year/author — phrasing unconfirmed suspicions "please verify" (the reviewer-side mirror of the authoring citation-safety discipline). 
    Motivation: a decision-audit of a cross-modality MRI→synthetic-PET reader-study review where the three structurally distinct synthesis failure modes were split across reviewers and the reference-list errors went uncaught on the reviewer side.
  • /author-strategy — trajectory-archetype classification (optional, explainable multi-label heuristic). Adds an opt-in capability that classifies a queried author's PubMed trajectory into abstract career archetypes (A1 infrastructure builder, A2 methodology rule-maker, A3 clinical→AI hybrid, A4 SR/MA volume engine, A5 large-consortium participation pattern, A6 clinical-subspecialty device/technique depth, plus a computed A3+A6 composite). The rubric is a single canonical data file (references/trajectory_archetypes.yaml); the narrative references/trajectory_archetypes.md is generated from it by render_archetype_doc.py (--check gate). Each label carries a 0–1 score (computable-signal-weight denominator; unavailable signals — h-index/citation/venue-tier — are excluded and surfaced as [VERIFY], never fabricated), a confidence band capped per archetype, and evidence drawn from the author's own PMIDs (evidence_pmids for per-paper signals, evidence_summary for corpus-level); a negative rule suppresses a label to insufficient evidence. A disambiguation gate precedes classification: fetch_pubmed.py writes a corpus_manifest.json cryptographically bound to the CSV (csv_sha256 + pmid_set_hash) and classify_archetypes.py refuses to run unless review_status: approved and the hashes match — a surname alone never resolves an author, and --approve is a human gate. Target-author attribution (ORCID/affiliation/initials/position) is split into a stdlib-only pubmed_parse.py and never borrows a co-author's metadata on a same-surname collision; author position is reported as a first/middle/last/unknown positional heuristic (not leadership metadata), and analyze_patterns.py's "Leadership rate" is renamed "First/last positional rate". The output header states the labels are explainable heuristics, not objective classifications. Ships name-free synthetic fixtures + a CI-gated regression test (A14). Skill count unchanged — an enhancement, not a new skill.
  • /verify-refs — OpenAlex tertiary index (conference-proceedings / non-DOI recovery). PubMed covers only biomedical literature and CrossRef's proceedings coverage is uneven, so NeurIPS / ICLR / ACL-style citations — common in medical-AI manuscripts — fall through both and were marked UNVERIFIED. After the PubMed and CrossRef tiers, verify_refs.py now consults OpenAlex (https://api.openalex.org, free, no API key) only when no authoritative author list was obtained yet (a reference already resolved by PubMed/CrossRef incurs no extra call). It resolves by DOI when present, otherwise by a token-similarity-guarded title search so a fabricated title cannot earn a spurious OK. This is the free analogue of the second index (e.g. Scopus) that journal portals run alongside CrossRef. Because OpenAlex display names carry no structured family/given field and mix First Last with Last, First forms, OpenAlex-sourced authors support an existence check plus a tolerant first-author membership check but never drive the strict positional or author-count MISMATCH (reserved for PubMed efetch / CrossRef); an OpenAlex miss is UNVERIFIED, never FABRICATED. New --no-openalex flag restricts verification to PubMed + CrossRef. Ships a network-free regression test (tests/test_openalex_tier.sh, monkeypatched http_json, CI gate A8b). Motivation: a medical-AI reference list where two NeurIPS citations validated on Scopus but not CrossRef in a journal portal's reference check.
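The token-similarity guard on the OpenAlex title search is what stops a fabricated title from matching a loosely related work. The release note does not state the metric or threshold, so this sketch assumes Jaccard overlap of lowercase alphanumeric tokens with a hypothetical 0.8 cutoff, and assumes each candidate is a dict carrying a "title" key.

```python
import re
from typing import Dict, List, Optional

SIM_THRESHOLD = 0.8  # hypothetical cutoff


def title_tokens(title: str) -> set:
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = title_tokens(a), title_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pick_title_match(cited_title: str, candidates: List[Dict],
                     threshold: float = SIM_THRESHOLD) -> Optional[Dict]:
    scored = [(jaccard(cited_title, c["title"]), c) for c in candidates]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda t: t[0])
    # Below the guard the caller reports UNVERIFIED; a miss is never treated as FABRICATED.
    return best if best_score >= threshold else None
```

A fabricated title sharing one topical word with a real paper scores far below the cutoff, so it returns no match instead of a spurious OK.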

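The /author-strategy archetype score uses a "computable-signal-weight denominator": an unavailable signal drops out of both numerator and denominator instead of counting as zero, and is surfaced as [VERIFY]. A sketch under that reading; archetype_score and the signal names and weights in the usage are invented for illustration and are not the contents of trajectory_archetypes.yaml.

```python
from typing import Dict, List, Optional, Tuple


def archetype_score(signals: Dict[str, Optional[float]],
                    weights: Dict[str, float]) -> Tuple[Optional[float], List[str]]:
    """signals map to a 0-1 value, or None when the signal cannot be computed."""
    available = {k: w for k, w in weights.items() if signals.get(k) is not None}
    verify = sorted(k for k in weights if signals.get(k) is None)  # -> [VERIFY]
    denom = sum(available.values())
    if denom == 0:
        return None, verify  # nothing computable: no score rather than a fabricated one
    score = sum(w * signals[k] for k, w in available.items()) / denom
    return score, verify
```

For example, with weights 2 and 1 on two computable signals scoring 0.5 and 1.0 and a weight-1 h-index that is unavailable, the score is 2/3; treating the missing h-index as zero would instead understate it as 0.5.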
MedSci Skills v4.3.0

15 Jun 20:36

Added

  • Observational / cohort probe + gate hardening (sourced from two cross-sectional health-screening cohort self-review→revise loops). Expands observational_confounding.md O1–O6 → O1–O9 (vendored byte-identical into /self-review): O7 — over-adjustment (conditioning on a mediator or consequence of the outcome — the opposite-direction failure to O1, e.g. a renally-excreted lab in an eGFR model; "adjust for everything that differs in Table 1" is not a confounder-selection rule), O8 — analysis unit & clustering (records vs unique subjects → anti-conservative CIs), O9 — outcome construct validity for report-/registry-derived outcomes (composite homogeneity, ascertainment/κ, dictionary-first label provenance, misclassification direction). O1 also gains an exposure-defining-covariate exemption for guideline-defined exposures and a reference-arm-contamination-vs-selection-bias note (O3); check_confounding_completeness.py now computes SMD from per-stratum mean ± SD when the wide Table 1 carries no p / SMD column (interop with /analyze-stats).
  • New domain-probe module clinical_prediction_model.md (CP1–CP4) for cross-sectional / observational prediction models (TRIPOD / TRIPOD+AI nested predictor-set comparisons): apparent-vs-optimism-corrected calibration/DCA, the incremental-value-vs-marginal-effect two-null distinction, EPV per nested model, and net benefit as a model comparison (not a policy endorsement). Vendored byte-identical into /self-review; MODULES 9 → 10; routed from peer-review (new Phase 2E-2) and self-review. Plus two /self-review exemplar_findings/ (over_adjustment_collider.md, prediction_two_null_conflation.md).
  • Cohort-analysis probes (G39–G41). survival_prognostic.md gains S9 — panel-data / multistate variance (occupancy/intensity CIs must be person-clustered or person-bootstrapped, not naive model-based on within-person-correlated visit transitions; S1–S8 → S1–S9). observational_confounding.md gains O10 — overlapping-subset gradient (an effect-size gradient across nested/overlapping cohorts is attributable by construction; inferential "attenuated/accounted-for" language needs a difference/interaction test; O1–O9 → O1–O10). Both vendored byte-identical into /self-review. Plus an extended-adjustment missingness-frame discipline (compare adjusted vs unadjusted on the same reduced complete-case frame, not the full-frame anchor) in /self-review Phase 2.5e + /analyze-stats over-adjustment guidance.
  • Cross-sectional survey-epidemiology probes (G45–G46, paper-driven from CC-BY NHANES cohorts). observational_confounding.md gains O11 — complex-survey design & weighting (NHANES/KNHANES/CHNS: design-based estimation with the correct/scaled weight + stratification + PSU, subpopulation-domain-not-row-deletion, weighted total is a population estimate not a sample n, design-effect/effective-n) and O12 — data-driven threshold / non-linearity mining (a recursive-search 'inflection point' / 'saturation effect' needs a breakpoint CI + pre-specified non-linearity test + stability check, not a quoted cutoff). O1–O10 → O1–O12, vendored byte-identical into /self-review. /analyze-stats survey_weighted.md gains a subpopulation-domain (never row-delete) + survey-reporting-errors block.
  • Cross-sectional mediation probe (G47, paper-driven from CC-BY mediation papers). observational_confounding.md gains O13 — cross-sectional mediation (temporal order & sequential ignorability): a Baron–Kenny / Sobel / PROCESS / bootstrapped indirect-effect chain estimated on single-timepoint data cannot establish the X→M→Y sequence (the bootstrap CI addresses sampling variability, not identification); needs an unmeasured-mediator–outcome-confounding sensitivity analysis (e.g. an E-value for the indirect effect) + a temporal-order caveat, and proportion-mediated is unstable when the total effect is small. O1–O12 → O1–O13, vendored byte-identical into /self-review; adds exemplar_findings/cross_sectional_mediation.md.
  • Cleanup batch (G48/G42/G43). /analyze-stats gains a mediation analysis guide (analysis_guides/mediation.md + SKILL entry): bootstrapped a×b indirect effect, proportion-mediated only with uncertainty, AGReMA reporting, and the discipline that identification (no unmeasured mediator–outcome confounding → E-value for the indirect effect) — not the bootstrap — is the issue (pairs O13). /sync-submission gains scripts/assemble_supplement.py (NOT an integrity detector): validates an S{N}_*.md + index supplement (index↔file 1:1, duplicate/skipped sub-section numbers), rebuilds _combined.md in index order, and reports main-text callout coverage. /render-pdf-doc gains scripts/scan_glyph_coverage.py + a Step 3.5 pre-render scan for the xelatex silent-glyph-drop failure (arrows / − ≤ ≥ ± √ / Greek / ★ ✓ / CJK; optional fonttools cmap check). Both ship fixtures + CI-wired tests (A12/A13). Integrity-detector count unchanged (27).
  • Interaction-scale probe (G49, paper-driven from CC-BY joint-effect papers). observational_confounding.md gains O14 — interaction scale (additive vs multiplicative): a synergy / joint-effect / effect-modification claim is an additive-scale statement and needs RERI / AP / synergy index with CIs, not a multiplicative-only OR product term, joint-category ORs, or stratified-only estimates (the difference-in-significance fallacy). O1–O13 → O1–O14, vendored byte-identical into /self-review; /analyze-stats gains an Interaction & Effect-Modification entry (RERI/AP/S, Knol & VanderWeele). The cross-sectional-cohort review lane (O1–O14 + CP1–CP4 + S9 + gates) is now comprehensive.
  • check_cohort_arithmetic.py — new ANALYSIS_UNIT_UNDISCLOSED check (--id-col, auto-detect with a cardinality guard): when records > unique subjects and the manuscript discloses neither the analysis unit nor a one-record-per-subject sensitivity, emits a Major with a records / unique_subjects / repeat_subjects / max_visits reconciliation (probe O8).
  • check_scope_coherence.py — new CROSS_SECTIONAL_YIELD_LANGUAGE lexicon (Minor): a cross-sectional / prevalence design using incidence-flavored vocabulary ("yield", "detection rate", "number-needed-to-screen/image", "rescreen interval") without defining "yield" once as cross-sectional report-positive prevalence.
  • New detector check_paren_spans.py (/self-review, integrity detectors 26 → 27, family Style & review-process): a safety scan run after em-dash→parenthesis conversion (cohort-cycle follow-up). A bulk "— X —" → "(X)" edit can pair two unrelated dashes across a sentence boundary and wrap a whole sentence, or an ordinal limitation ("Sixth, …"), inside one parenthesis; the result stays paren-balanced, so a balance check misses it. Flags PAREN_SPAN_ORDINAL and PAREN_SPAN_SENTENCE (long spans only, so short legitimate parentheticals like "(Dr. Smith)", "(Fig. 2)", "(95% CI …)" are clean). Wired into /self-review --fix post-edit and /humanize pattern 13. Fixtures + regression test (CI-gated).
  • New detector check_wordcount_cap.py (/sync-submission, integrity detectors 25 → 26, family Reporting compliance) — the revision-inflation trap: a revise loop monotonically adds words and silently breaches the target journal's body cap. Counts the body (Introduction → Discussion, skipping abstract/refs/tables/declarations), compares to a cap from --limit or a parsed --journal-profile article-type line, and emits WORDCOUNT_OVER_CAP (Major) / WORDCOUNT_NEAR_CAP (Minor, >0.95×). The binding number is the rendered count (citeproc expands [@key]), so it prefers --rendered-words N and otherwise estimates from the markdown body + inline-citation expansion. Wired as /sync-submission Gate 13, a /revise exit gate (re-run after every pass), and a /self-review §F check. Ships fixtures + regression test.
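The WORDCOUNT_OVER_CAP / WORDCOUNT_NEAR_CAP thresholds can be sketched as below. Assumptions: the body is the markdown under Introduction/Methods/Results/Discussion section headings (deeper subheadings inside them still count), pipe-table rows are skipped, and words are whitespace tokens. Unlike the detector described above, this sketch does not estimate citeproc expansion of [@key] citations, so it undercounts relative to the rendered manuscript.

```python
import re

BODY_SECTIONS = {"introduction", "methods", "materials and methods", "results", "discussion"}
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def body_word_count(md: str) -> int:
    count, in_body, level = 0, False, None
    for line in md.splitlines():
        h = HEADING.match(line)
        if h:
            depth, title = len(h.group(1)), h.group(2).strip().lower()
            # A heading at or above the body section's level starts/ends a section;
            # deeper subheadings (e.g. "### Statistical analysis") stay inside it.
            if not in_body or depth <= level:
                in_body = title in BODY_SECTIONS
                level = depth if in_body else None
            continue
        if in_body and not line.lstrip().startswith("|"):
            count += len(line.split())
    return count


def classify_wordcount(words: int, cap: int) -> str:
    if words > cap:
        return "WORDCOUNT_OVER_CAP"  # Major
    if words > 0.95 * cap:
        return "WORDCOUNT_NEAR_CAP"  # Minor
    return "OK"
```

Re-running the classification after every revise pass is what catches the monotonic growth the entry describes; with a 3,000-word cap, anything above 2,850 already warns.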

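Probe O14's additive-scale measures have standard closed forms from three odds ratios against a common doubly-unexposed reference: RERI = OR11 − OR10 − OR01 + 1, AP = RERI / OR11, and S = (OR11 − 1) / ((OR10 − 1) + (OR01 − 1)). The sketch returns point estimates only; the CIs the probe asks for would need the delta method or a bootstrap, and odds ratios approximate risk ratios only when the outcome is rare. The function name and return keys are illustrative.

```python
from typing import Dict


def additive_interaction(or10: float, or01: float, or11: float) -> Dict[str, float]:
    """or10: A only vs neither; or01: B only vs neither; or11: both vs neither."""
    reri = or11 - or10 - or01 + 1                  # relative excess risk due to interaction
    ap = reri / or11                               # attributable proportion
    s = (or11 - 1) / ((or10 - 1) + (or01 - 1))     # synergy index
    multiplicative = or11 / (or10 * or01)          # ratio of ORs (1 = no multiplicative interaction)
    return {"RERI": reri, "AP": ap, "S": s, "multiplicative_ratio": multiplicative}
```

With OR10 = 2, OR01 = 3 and OR11 = 6 the multiplicative ratio is exactly 1 (no product-term interaction), yet RERI = 2 and S = 5/3: the joint effect exceeds additivity, which is the situation where a multiplicative-only analysis misses a synergy claim.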
Fixed

  • verify_refs.py — corporate/collective-author render-abort fix (cohort-cycle follow-up). A guideline body double-braced in BibTeX ({{EASL} and {EASD}}, {{KDIGO CKD Work Group}}) or returned by PubMed as <CollectiveName> tripped the first-author cross-check as MISMATCH, which aborted render_pandoc.sh on every guideline-citing cohort manuscript. Corporate authors are now detected (surviving brace / <CollectiveName> / organization keyword) and exempted from the personal-name family cross-check (annotated corporate/collective author, never MISMATCH). Personal-author entries are unaffected.
  • check_classical_style.py — em-dash counter counts prose only (cohort-cycle follow-up). It excludes structural dashes — markdown table cells (incl. "—" N/A placeholders and (A) — panel-label captions), ORCID separators, and author/affiliation lines — and reports prose-vs-structural separately, so a cohort manuscript with large baseline tables is not pushed into destructive edits on correct table dashes.
  • check_confounding_completeness.py — DB-column-code ↔ prose alias map. A DB-exported Table 1 carrying column codes (he_sbp, b_uric, b_chol_hdl) was false-flagged as imbalanced-and-unadjusted when the adjustment set was written in prose ("systolic blood pressure"). An alias map now resolves both to a shared concept; it only ever adds matches (no new false ✓). Genuinely unadjusted covariates still flag.
  • check_confounding_completeness.py — exposure-defining-covariate exemption (O1 false-positive on guideline-defined exposures). For a guideline-defined exposure (MASLD / metabolic syndrome / CKM / sarcopenia / frailty), the components of its own diagnostic criteria (BMI, glycaemia, lipids, BP) are imbalanced by construction and correctly unadjusted — the gate flagged each as a Major. New --exposure-defining-list/-file marks these EXPOSURE_DEFINING_EXEMPT (adjusting for them is over-adjustment, probe O7), so the Major remains only for genuine non-d...
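The corporate/collective-author exemption in verify_refs.py rests on the three signals named in the first Fixed entry: a surviving brace-protected BibTeX literal, a PubMed <CollectiveName>, or an organization keyword. A sketch of that classification follows; the keyword list and the is_corporate_author signature are hypothetical, and the sketch assumes the BibTeX author field has already had its outer delimiters removed and been split on " and ". An author classified this way would be annotated as corporate/collective rather than compared by family name.

```python
import re

# Hypothetical organisation keywords.
ORG_WORDS = re.compile(
    r"\b(?:group|consortium|committee|association|society|collaborat\w*|"
    r"organi[sz]ation|working party|task force)\b",
    re.I,
)


def is_corporate_author(author: str, pubmed_collective: bool = False) -> bool:
    if pubmed_collective:  # PubMed returned <CollectiveName>
        return True
    s = author.strip()
    if s.startswith("{") and s.endswith("}"):  # brace-protected BibTeX literal
        return True
    return bool(ORG_WORDS.search(s))
```

Personal names such as "Smith, John" fall through all three tests, so they still go through the normal family-name cross-check.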