Releases: Aperivue/medsci-skills
MedSci Skills v5.0.0
Changed
- v5.0.0 — storefront repositioning for the medical-AI model-engineering lane. A material
distribution change, not a label bump: the model-engineering lane (built additively across
v4.x Phases 1–4 plus the Phase 5 breadth below) now has its own storefront home, and the repo's
identity is widened to cover it.
  - New model_engineering storefront category ("Model Engineering & Validation") and
    medsci-modeling marketplace plugin, carved out of "Data & Study Design" (medsci-data).
    The 6 lane skills — architecture-zoo, model-scaffold, model-validation, model-card,
    model-evaluation, mllm-eval — now group under their own catalog filter and installable
    plugin (/plugin now lists nine category plugins). Both catalog generators
    (gen_skills_catalog_json.py category mapping/order, gen_marketplace_json.py plugin
    name/description) and their self-tests cover the new category.
  - README + ROADMAP repositioned to the end-to-end identity: MedSci Skills is an end-to-end
    research tool for physician and medical-engineering researchers to design → scaffold →
    validate → publish, for the clinical manuscript and the medical-AI model alike. "Clinical AI
    model research engineering is in scope" is now explicit, while "not a general AI-scientist
    platform" (and not a diagnostic tool or autonomous author) is kept; the lane integrates
    MONAI / nnU-Net and never reimplements them or runs anything autonomously.
  - Counts unchanged (51 skills / 41 detectors / 38 reporting guidelines); CI stays torch-free.
Added
- Medical-AI model-engineering lane — Phase 5 (build-lane breadth). Expands the existing
/model-scaffold and /architecture-zoo skills; no new skills/detectors/probes (counts
unchanged: 51 skills / 41 detectors / 38 guidelines), torch-free CI.
  - /model-scaffold now generates 5 task types (was segmentation-only) via --task:
    segmentation (U-Net), classification (small multi-label CNN; swap in a timm
    backbone), detection (torchvision Faster R-CNN + FPN), synthesis (Pix2Pix U-Net
    generator + PatchGAN), and ssl (SimCLR encoder + projection head, NT-Xent). Every task keeps
    the reproducibility guarantees by construction — the patient-level seed-locked split is
    task-independent, and each emitted train.py/evaluate.py passes check_training_hygiene
    (all RNGs seeded, cuDNN deterministic, train-only loader, eval() + no_grad()). The
    challenge + regression test now verify all 5 tasks (split + hygiene + valid Python, network-free).
  - /architecture-zoo adds the detection.md and synthesis.md family cards (R-CNN family /
    Faster R-CNN + FPN / Mask R-CNN / RetinaNet / YOLO / DETR; Pix2Pix / CycleGAN / SPADE / diffusion
    / VAE / fastMRI), each with the source paper, when-to-use, medical use, reference implementation,
    validation setup, and matching scaffold template; the decision-tree index now routes to them.
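Because the patient-level seed-locked split is task-independent, it can be illustrated apart from any model code. The sketch below uses only the standard library. The helper name assign_patient_split, the 70/15/15 integer-percent ratio, and the (patient_id, partition) row layout are all hypothetical; this is not the code that /model-scaffold emits.

```python
import random
from collections import Counter


def assign_patient_split(patient_ids, seed, percents=(70, 15, 15)):
    """Hypothetical helper: assign every patient to exactly one partition.

    The split is drawn over unique patient IDs (never images), so all images of
    a patient share one partition and the result does not depend on the task.
    """
    patients = sorted(set(patient_ids))   # canonical order, independent of input order
    rng = random.Random(seed)             # private RNG seeded once; global state untouched
    rng.shuffle(patients)
    n_train = len(patients) * percents[0] // 100   # integer arithmetic: exact sizes
    n_val = len(patients) * percents[1] // 100
    assignment = {}
    for rank, pid in enumerate(patients):
        if rank < n_train:
            assignment[pid] = "train"
        elif rank < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"  # the remainder goes to test
    return assignment


# Usage: image-level records collapse to one partition per patient.
records = [("P01", "ct_a"), ("P01", "ct_b"), ("P02", "ct_c"), ("P03", "ct_d")]
split = assign_patient_split([pid for pid, _ in records], seed=42)
rows = [(pid, split[pid]) for pid, _ in records]  # rows for a split_assignment-style CSV
print(Counter(split.values()))
```

Sorting before shuffling makes the split independent of input order. Seeding a private random.Random makes it reproducible from the seed alone, which is the property a recorded split seed is meant to preserve.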
MedSci Skills v4.11.0
Added
- find-journal: acceptance-feasibility axis. A Phase 2.5 pre-flight
(assess_acceptance_readiness.py, deterministic + reproducible challenge card)
scans a manuscript for design-ceiling / unfixable-defect / importance-risk /
claim-mismatch signals and a ceiling verdict (advisory risk band, never a
probability). Adds two-axis ranking (scope fit × acceptance feasibility) with
explicit mismatch surfacing, an Acceptance Signals profile schema
(references/acceptance_signals_schema.md, populated for European Radiology, AJR,
KJR, RYAI, Investigative Radiology), a reject-fallback cascade plan, and a
desk-reject vs post-review distinction in Post-Rejection Mode. The helper is named
assess_* (not a detector-catalog member); counts unchanged (additive). (#215)
- Medical-AI model-engineering lane — Phase 1 (validation MVP). First slice of the v5.0
"design → scaffold → validate → publish medical-AI model research" lane, led by the
validation/reporting half (the build/scaffold half lands in a later phase). Clinician-anchored,
torch-free, additive.
  - New skill /model-validation (Layer D, advisory + deterministic audit) — design or audit
    the clinical-validation study for an engineer-built medical-imaging model (segmentation /
    classification / detection): patient-level split disjointness + the data-leakage taxonomy,
    tuning-on-test, internal vs genuine external validation, comparator design, single-run vs
    multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing handoff
    to /calc-sample-size, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Integrates with
    MONAI / nnU-Net — does not replace them. Skills 45 → 46.
  - New reviewer domain-probe model_development.md (MD0–MD8) (/peer-review + /self-review,
    vendored byte-identical) — partition/leakage mechanics, tuning/threshold/model-selection on the
    test set, the internal-vs-external-validation conflation, seed/run variance, test-set event
    count, metric selection, reproducibility/provenance, and reference-standard/label quality.
    Domain-probe modules 15 → 16. Grounded in the leakage taxonomy (Kapoor & Narayanan, Patterns
    2023), Varoquaux & Cheplygina (npj Digit Med 2022), CLAIM 2024, and Metrics Reloaded
    (Maier-Hein & Reinke et al., Nat Methods 2024).
  - New deterministic detector check_split_leakage.py (/model-validation) — proves (by set
    arithmetic on the emitted split_assignment.csv, not heuristics) that no patient crosses
    train/val/test, and that the split records a reproducible seed. Verdicts PATIENT_OVERLAP
    (Major), MISSING_SEED (Major), SINGLE_PARTITION (Minor); train/validation/holdout synonyms
    collapse so a labelling variant never trips it. Stdlib-only, network-free, with a reproducible
    challenge card + CI-wired regression test. Integrity detectors 36 → 37.
- Medical-AI model-engineering lane — Phase 2 (build/scaffold). Completes the
build → validate chain in-repo, staged after Phase 1's verification contract. Clinician-anchored
(a reproducible research scaffold generator that integrates MONAI / nnU-Net, not a replacement);
default CI stays torch-free.
  - New skill /model-scaffold (Layer B) — scaffold.py stamps out a runnable PyTorch
    segmentation training repo (configurable U-Net, dataset.py, losses.py, train.py,
    evaluate.py, config.yaml, requirements.txt, REPRODUCIBILITY.md, methods_stub.md) with
    the reproducibility guarantees baked in by construction: a patient-level seed-locked split
    written as an auditable artifact (splits/split_assignment.csv + split_seed.txt, disjoint by
    construction so it clears /model-validation's check_split_leakage), all-RNG seeding + cuDNN
    determinism, a train-only loader, and eval() + no_grad() inference. No fabricated numbers
    ([VERIFY] placeholders). Skills 46 → 47.
  - New deterministic detector check_training_hygiene.py (/model-scaffold) — a conservative
    AST linter (flag-not-prove, the training-code analogue of check_generated_code): all RNGs
    seeded, cuDNN deterministic, eval() + no_grad() inference, no training on a non-train split.
    Verdicts SEED_INCOMPLETE / MISSING_EVAL_MODE / TRAIN_ON_NONTRAIN_SPLIT (Major),
    CUDNN_NONDETERMINISTIC / EVAL_SHUFFLE (Minor). Integrity detectors 37 → 38.
  - scaffold_challenge executes the build → validate chain network-free: scaffold a repo →
    deterministic split matches the frozen expected split and is patient-disjoint (proven inline) →
    passes check_training_hygiene → a self-skipping torch tier (forward shape + gradients +
    reproducible loss when torch is installed; SKIP when absent, never counted as CI coverage of
    runnability).
  - New skill /architecture-zoo (Layer D, advisory) — the "choose" front end of the lane: maps a
    research question (task + modality / dimensionality + labelled-data scale + class imbalance) to a
    paper-grounded architecture shortlist via a decision tree, then per-architecture cards with core
    idea, when-to-use, medical-imaging use, reference implementation, the typical validation/experiment
    setup, and the matching /model-scaffold template. Seeds the classification (ResNet / DenseNet /
    EfficientNet / Inception / ViT / Swin / DeiT), segmentation (U-Net / 3-D U-Net / V-Net / Attention
    & Residual U-Net / nnU-Net / SegResNet / Swin-UNETR / Mask R-CNN), and foundation/SSL (SAM / MedSAM /
    MedSAM2 / TotalSegmentator / SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo) families. Every
    recommendation names its source paper; it teaches archetypes, not a live SOTA leaderboard. Skills
    47 → 48.
- Medical-AI model-engineering lane — Phase 3 (reporting). The documentation seam of the lane,
after validation (Phase 1) and build (Phase 2). Clinician-anchored, additive.
  - New skill /model-card (Layer C) — generate the documentation an engineer-built model must
    carry: a Model Card (Mitchell et al., FAccT 2019), a dataset Datasheet (Gebru et al.,
    CACM 2021), and a METRIC-informed data-quality pass (Schwabe et al., npj Digit Med 2024),
    filled from user-supplied facts — never fabricated (intended use, out-of-scope use, training data,
    per-subgroup performance, caveats, provenance, consent, licence). Templates live in references/
    and are uncounted (documentation standards, not clinical reporting checklists — same treatment
    as appraisal_tools/METRICS.md), so reporting_guidelines is unchanged. Skills 48 → 49.
  - New deterministic detector check_model_card_complete.py (/model-card) — verifies every
    required Model Card / Datasheet section is present and non-empty (not missing, not an unfilled
    [NEEDS INPUT] placeholder). Verdicts MISSING_SECTION / EMPTY_REQUIRED_SECTION (Major); a
    presence check, not a truth check. reporting_compliance family. Integrity detectors 38 → 39.
  - Reproducible challenge (check_model_card_complete_challenge, synthetic complete + incomplete
    fixtures) + CI-wired regression test (8 cases).
- Medical-AI model-engineering lane — Phase 4 (evaluation + MLLM). The evaluation half, completing
the choose → build → validate → evaluate → report chain. Clinician-anchored, additive.
  - New skill /model-evaluation (Layer B) — compute task-correct held-out metrics for a trained
    imaging model (segmentation: Dice + a boundary metric HD95/NSD per structure; classification:
    AUROC + AUPRC + sensitivity/specificity with bootstrap CIs at the deployment prevalence;
    detection: FROC/mAP with a stated IoU criterion) + calibration + subgroup slices, emitting a
    per-case table for /analyze-stats. check_metric_reporting.py gates the metric choice against
    Metrics Reloaded (Maier-Hein & Reinke et al., Nat Methods 2024) / CLAIM 2024
    (PIXEL_ACCURACY_SEG / NO_BOUNDARY_METRIC / ACCURACY_ONLY / DETECTION_METRIC_MISSING /
    CI_MISSING). data_preparation family. Skills 49 → 50.
  - New skill /mllm-eval (Layer B) — a model-agnostic (closed API or open weights) evaluation
    harness for an LLM/MLLM on a clinical task (report generation, VQA, extraction): adjudicated
    reference standard, clinical-efficacy metrics (RadGraph-F1 / CheXbert-F1 beyond BLEU/ROUGE),
    faithfulness/hallucination, pretraining contamination, prompt sensitivity, and a reader study.
    check_mllm_eval_completeness.py gates the plan (NGRAM_ONLY / FAITHFULNESS_MISSING /
    REFERENCE_STANDARD_MISSING / CONTAMINATION_UNADDRESSED / READER_STUDY_MISSING / …).
    reporting_compliance family. Skills 50 → 51.
  - New reviewer domain-probe mllm_evaluation.md (ME0–ME8) (/peer-review + /self-review,
    vendored byte-identical) — the reviewer-side audit of an LLM/MLLM clinical evaluation. Grounded in
    RadCliQ (Yu et al., Patterns 2023), RadGraph (Jain et al., NeurIPS 2021), CheXbert (Smit et al.
    2020), MedVH / Med-HALT, MI-CLEAR-LLM. Domain-probe modules 16 → 17. Integrity detectors 39 → 41
    (check_metric_reporting + check_mllm_eval_completeness).
  - Uncounted appraisal reference appraisal_tools/METRICS_RELOADED.md (metric-selection guidance; not a
    counted reporting checklist). Reproducible challenges + CI-wired regression tests for both detectors.
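The set-arithmetic idea behind check_split_leakage can be sketched with the standard library. This is not the detector itself. The patient_id/split column names, the synonym table, and treating an empty seed string as MISSING_SEED are assumptions. Only the three verdict names and their severities come from the notes above.

```python
import csv
import io

# Assumed synonym table: labelling variants collapse to one canonical partition.
SYNONYMS = {
    "train": "train", "training": "train",
    "val": "val", "valid": "val", "validation": "val",
    "test": "test", "holdout": "test", "hold-out": "test",
}


def audit_split(csv_text, seed_text):
    """Return sorted (verdict, severity) findings for a patient-level split file."""
    members = {}  # canonical partition -> set of patient IDs
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row["split"].strip().lower()
        partition = SYNONYMS.get(raw, raw)
        members.setdefault(partition, set()).add(row["patient_id"].strip())

    findings = set()
    partitions = sorted(members)
    # Set arithmetic: a non-empty pairwise intersection means a patient crosses partitions.
    for i, a in enumerate(partitions):
        for b in partitions[i + 1:]:
            if members[a] & members[b]:
                findings.add(("PATIENT_OVERLAP", "Major"))
    if not (seed_text or "").strip():
        findings.add(("MISSING_SEED", "Major"))
    if len(partitions) < 2:
        findings.add(("SINGLE_PARTITION", "Minor"))
    return sorted(findings)


# Usage
clean = "patient_id,split\nP1,train\nP2,validation\nP3,holdout\n"
leaky = "patient_id,split\nP1,train\nP1,test\n"
print(audit_split(clean, "42"))  # → []
print(audit_split(leaky, ""))    # → [('MISSING_SEED', 'Major'), ('PATIENT_OVERLAP', 'Major')]
```

Labels are mapped through the synonym table before the sets are built. As a result, "validation" and "val" land in the same partition and can never be mistaken for two disjoint ones.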
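A conservative, flag-not-prove AST pass in the spirit of check_training_hygiene can be approximated with Python's ast module. The hypothetical lint_training_source covers only two of the listed checks, SEED_INCOMPLETE and CUDNN_NONDETERMINISTIC. It also assumes NumPy is imported as np. The real detector's rules are not reproduced here.

```python
import ast

# Assumed "all RNGs seeded" contract; NumPy is assumed to be imported as np.
REQUIRED_SEED_CALLS = {"random.seed", "np.random.seed", "torch.manual_seed"}
CUDNN_FLAG = "torch.backends.cudnn.deterministic"


def dotted_name(node):
    """Rebuild 'a.b.c' from a Name/Attribute chain; return None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def lint_training_source(source):
    """Flag (not prove) missing seeding and cuDNN determinism in training code."""
    called, deterministic = set(), False
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name:
                called.add(name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if (dotted_name(target) == CUDNN_FLAG
                        and isinstance(node.value, ast.Constant)
                        and node.value.value is True):
                    deterministic = True
    findings = []
    if not REQUIRED_SEED_CALLS <= called:
        findings.append("SEED_INCOMPLETE")
    if not deterministic:
        findings.append("CUDNN_NONDETERMINISTIC")
    return findings


# Usage: only one of three RNGs is seeded and cuDNN determinism is never set.
print(lint_training_source("import torch\ntorch.manual_seed(0)\n"))
# → ['SEED_INCOMPLETE', 'CUDNN_NONDETERMINISTIC']
```

The check only reads the syntax tree and never imports the code, so it runs without torch installed. That fits a torch-free CI.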
MedSci Skills v4.10.0
Added
- Three new reviewer domain-probe modules (/peer-review + /self-review, vendored
byte-identical), reverse-engineered from high-IF CC-BY papers under the reverse_engineer/
license firewall: mendelian_randomization.md (MR1–MR8: the three IV assumptions, a
pleiotropy-robust sensitivity suite rather than IVW alone, Steiger/direction, sample overlap,
non-linear-MR caution, drug-target colocalization); polygenic_risk_score.md (PG1–PG8:
ancestry transferability/portability, base/target leakage, incremental value over the clinical
model, screening detection-rate-vs-discrimination, target-population calibration);
network_meta_analysis.md (NM1–NM8: transitivity, global + local incoherence, SUCRA/P-score
over-interpretation, CINeMA/GRADE-NMA certainty, component-NMA additivity). Domain-probe modules
12 → 15.
- Observational probe O17 (observational_confounding.md) — agnostic many-exposure-scan
multiplicity (ExWAS / EWAS / MWAS): correction matched to claim against the honest test-count
denominator, independent replication as the real safeguard, correlated-exposure conservatism,
selective top-hit reporting.
- Two reporting-guideline checklists (/check-reporting): STROBE-MR (Mendelian
randomization) and PGS-RS / PRS-RS (polygenic-score risk prediction), with study-type
routing + aliases. Reporting guidelines 36 → 38.
- Four /analyze-stats analysis guides: multiple-testing/high-dimensional screening,
Mendelian randomization, polygenic risk score, and network meta-analysis.
- /clean-data implausible-value & cross-field validity rules reference — organ-system
compatible-with-life bounds + cross-field logical-consistency rules (temporal ordering,
derived-vs-source, sex-/state-specific), flag-not-auto-fix.
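The flag-not-auto-fix idea behind the /clean-data validity rules can be sketched as a function that returns flags and never edits the record. The field names, the numeric bounds, and the BMI tolerance below are placeholders chosen for illustration. They are not values from the reference and not clinical thresholds.

```python
from datetime import date

# Placeholder bounds for illustration only; NOT the values in the /clean-data reference.
BOUNDS = {"age_years": (0, 120), "sbp_mmhg": (40, 300)}
BMI_TOLERANCE = 0.5  # assumed tolerance for the derived-vs-source check


def validate_record(rec):
    """Return a list of flags; the record is never modified (flag, don't auto-fix)."""
    flags = []
    # Compatible-with-life style range checks.
    for field, (lo, hi) in BOUNDS.items():
        value = rec.get(field)
        if value is not None and not lo <= value <= hi:
            flags.append(f"IMPLAUSIBLE:{field}={value}")
    # Temporal ordering: an outcome event should not precede admission.
    admitted, event = rec.get("admission_date"), rec.get("event_date")
    if admitted and event and event < admitted:
        flags.append("TEMPORAL_ORDER:event_date<admission_date")
    # Derived-vs-source: a stored BMI should match weight / height^2.
    weight, height, bmi = rec.get("weight_kg"), rec.get("height_m"), rec.get("bmi")
    if weight and height and bmi is not None and abs(bmi - weight / height ** 2) > BMI_TOLERANCE:
        flags.append("DERIVED_MISMATCH:bmi")
    # Sex-specific consistency.
    if rec.get("sex") == "M" and rec.get("pregnant") is True:
        flags.append("SEX_SPECIFIC:pregnant_male")
    return flags


# Usage
record = {"age_years": 150, "admission_date": date(2024, 3, 2), "event_date": date(2024, 3, 1)}
print(validate_record(record))
# → ['IMPLAUSIBLE:age_years=150', 'TEMPORAL_ORDER:event_date<admission_date']
```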
Changed
- Clinician-friendly update reminders. The classroom installers
(install-macos.command / install-windows.cmd / install-windows.ps1) now enable the in-app
"update available" notice and the one-click Desktop updater by default (turnkey path; disable
with --disable-update-notify or MEDSCI_NO_UPDATE_CHECK=1). For the npx/manual paths the
installer prints a one-time nudge showing how to turn reminders on (--enable-update-notify),
and the README Quick Start recommends it. A new read-only update.session_hook_enabled() gates the
nudge; the npx/manual paths stay opt-in (no silent SessionStart hook).
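A gate that decides whether a network update check is due can be sketched as follows. The function name should_check_for_update is hypothetical. So is the reading of "clock-sane": here, a cached timestamp that lies in the future counts as stale. Only the MEDSCI_NO_UPDATE_CHECK=1 opt-out and the 24h cache window come from these notes.

```python
import os
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24h cache window


def should_check_for_update(last_checked, now=None, environ=None):
    """Hypothetical gate: return True only when a network update check is due.

    - MEDSCI_NO_UPDATE_CHECK=1 disables checking entirely.
    - A cached timestamp younger than 24 h suppresses the check.
    - A timestamp in the future (clock moved backwards) is treated as stale,
      so a skewed clock cannot silence update checks indefinitely.
    """
    environ = os.environ if environ is None else environ
    if environ.get("MEDSCI_NO_UPDATE_CHECK") == "1":
        return False
    now = time.time() if now is None else now
    if last_checked is None or last_checked > now:
        return True
    return now - last_checked >= CACHE_TTL_SECONDS


# Usage: a check made one hour ago is still fresh.
print(should_check_for_update(1000.0, now=1000.0 + 3600, environ={}))  # → False
```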
MedSci Skills v4.9.0
Added
- Duplicate-bibliography gate — new check_reference_duplication.py
(/manage-refs, also usable from /sync-submission) reads the BUILT artifact
(.docx via stdlib zipfile, or a rendered .md/.txt) and fires
DUP_REF_HEADING / REF_NUMBER_RESTART / REF_SIGNATURE_DUP (Major) when the
reference list is duplicated. Catches the hybrid failure where a manuscript
carries both inline [@key] citations and a hand-typed ## References list and
is built with pandoc --citeproc: the build renders the hand list and a
citeproc bibliography (often after the legends), so the same reference appears
twice; check_xref does not see it. Author-anchored (first-author, year)
signature detection works on Word auto-numbered lists. Validated against a real
built docx with the duplicate (caught) and its single-list fix (clean).
Stdlib-only; PII-free fixtures + test_reference_duplication.sh.
- Cross-script binning-consistency gate — new check_binning_consistency.py
(/self-review, Phase 2.5b) parses analysis source (R/Python) and emits
BINNING_DRIFT (Major) when one derived categorical (age band, BMI category,
eGFR stage, risk tier) is binned with ≥2 different (breaks, right-closure)
signatures across files. The same cohort then splits differently per script:
per-stratum Ns drift between a primary table and a sensitivity table while the
grand total still reconciles, so a row-sum check passes but a stratum can
spuriously cross a threshold. Motivated by a screening cohort that binned age
with right=FALSE in the primary script vs right=TRUE in a threshold sensitivity
script — fractional ages shifted hundreds of participants and produced a
spurious "reached" stratum. Stdlib-only; PII-free fixtures +
test_binning_consistency.sh. Together these two gates take the analysis-integrity detector suite 34 → 36
(citation family 6 → 7, data-preparation 5 → 6); skills and reporting guidelines
unchanged. Additive and backward-compatible.
- Float citation-order gate — new check_citation_order.py (/self-review)
flags numbered floats not cited in ascending order of first appearance, per series
independently (main Tables, main Figures, Supplementary Tables, Supplementary
Figures). It scans only the narrative body (auto-excluding the Figure Legends /
back-matter so an in-order legends block cannot mask an out-of-order body) and
tolerates plural lists ("Tables S4, S5"), ranges, and non-float sensitivity-spec
labels ("S1–S6"). CITATION_ORDER (Major) is a pre-peer-review desk/technical-check
item for which editorial offices "unsubmit" a manuscript; CITATION_GAP (Minor) flags non-contiguous
numbering. Motivated by a journal technical-check unsubmit where main Table 3 was
cited before Tables 1–2 and the supplementary tables were cited wildly out of order
(S4, S9, S16, S12, …). Wired into /self-review's technical-check pass; synthetic
positive/negative fixtures + regression test. Analysis-integrity detectors
33 → 34 (Reporting compliance family 8 → 9); skills 45 and reporting guidelines
36 unchanged. Additive and backward-compatible.
- Percentage-decimal style check + KJR technical-check conventions —
/self-review's check_classical_style.py gains a PERCENT_DECIMALS verdict (Minor, report-only)
flagging percentages reported to >1 decimal place ("35.14%"), which several journals
(e.g. KJR) require at one decimal at technical check; regression fixture + test added.
The KJR journal profile (write-paper detail + find-journal compact) gains a
Technical-Check Conventions section enumerating the deterministic pre-review desk
items that "unsubmit" a manuscript: ascending float citation order, demographics in
Materials and Methods, one-decimal percentages, double spacing, Acknowledgments/Funding/
Author-Contributions on the Title Page only, reporting checklist cited as "Supplementary
Material 1", IRB number in Methods even when blinded, and ICMJE forms only after
acceptance. No detector-count change (existing detector extended; profiles updated, not
added). Motivated by a 2026-06 KJR technical-check unsubmit.
- Audit-dump leak gate — new check_checklist_dump_leak.py (/sync-submission)
scans every .md/.docx/.pdf in a submission directory for the residue of a
/check-reporting or /self-review internal audit report (compliance_pct,
fixable_by_ai, check_reporting_version, Auto-fix:, [PARTIAL→auto-fixed],
suggested_fix, Action Items, _pipeline_log, NON-AUTHORITATIVE). Any hit is
a P0 leak: these tooling tokens must never reach a reviewer. Motivated by a
near-miss where a prior project's STROBE_checklist_v4.pdf was actually the
check-reporting dump, reused by filename and compiled into the reviewer-visible
proof (exposing auto-fix notes, raw JSON, and a stale old title). Wired into
preflight_gate.py as a P0 check over the journal asset directory; writes
qc/checklist_dump_leak.json. /check-reporting reports now also open with a
NOT-FOR-SUBMISSION banner so the working audit is self-identifying.
Analysis-integrity detectors 32 → 33; skills 45 and reporting guidelines 36
unchanged. Additive and backward-compatible.
- Frontmatter schema gate (Agent Skills cross-platform portability) — new
scripts/check_frontmatter_schema.py + CI step strictly yaml.safe_loads every
skills/*/SKILL.md frontmatter and enforces the published Agent Skills spec: valid
YAML; name ≤64 chars / lowercase-hyphen / no reserved claude/anthropic token;
description present / ≤1024 chars / no XML angle brackets. The repo's own generators
use a tolerant line-based reader, so a frontmatter block that is not valid YAML could
pass every prior gate yet be rejected by a strict-YAML consumer (the agentskills.io
directory validator or another agent platform). A self-test (tests/test_frontmatter_schema.sh)
covers each violation class. This is a repo-CI validator, not a counted detector.
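The per-series first-appearance rule behind check_citation_order can be illustrated with a short sketch. The regular expression is a simplification: it handles no ranges and does not exclude the legends. The finding tuples are a hypothetical format. Only the verdict names CITATION_ORDER and CITATION_GAP come from the notes above.

```python
import re

# Simplified pattern: "Table 3", "Tables 1, 2", "Figure S4", "Tables S4, S5", "Tables 1 and 2".
CITE = re.compile(r"\b(Table|Figure)s?\s+((?:S?\d+)(?:\s*(?:,|and)\s*S?\d+)*)")


def first_appearance(text):
    """Map each series (e.g. 'Main Table') to float numbers in order of first citation."""
    order = {}
    for kind, numbers in CITE.findall(text):
        for token in re.findall(r"S?\d+", numbers):
            series = ("Supplementary " if token.startswith("S") else "Main ") + kind
            seen = order.setdefault(series, [])
            number = int(token.lstrip("S"))
            if number not in seen:
                seen.append(number)
    return order


def check_citation_order(text):
    """Hypothetical finding format: (verdict, series, first-appearance sequence)."""
    findings = []
    for series, seq in sorted(first_appearance(text).items()):
        if seq != sorted(seq):                               # not ascending
            findings.append(("CITATION_ORDER", series, seq))
        if sorted(seq) != list(range(1, max(seq) + 1)):      # non-contiguous numbering
            findings.append(("CITATION_GAP", series, seq))
    return findings


# Usage
body = "Table 3 summarizes outcomes; Tables 1, 2 describe the cohort (Tables S4, S9, S2)."
for finding in check_citation_order(body):
    print(finding)
```

Each series is tracked separately, so a supplementary table never affects the ordering of the main tables.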
Changed
- Skill-boundary documentation — a diagnostic pass confirmed the 45 skills are
deliberately specialized (no consolidation warranted), but several boundaries were
easy to confuse. README's "Skills Work Together" now carries a Skill boundaries
block spelling out the reference pipeline (search-lit → lit-sync → manage-refs →
verify-refs), the language pass order (humanize → polish-language → academic-aio),
manuscript-type selection (write-paper / review-paper / revise), author-vs-reviewer
(self-review / peer-review), project entry (intake-project / orchestrate), study
design (design-study perceptual ceiling gate / design-ai-benchmarking), and content
vs template (write-protocol / fill-protocol). /revise now documents the manual
fallback when /analyze-stats or /make-figures is unavailable (emit a checklist, hold
responses as BLOCKED — pending analysis/figure, never invent numbers). Docs only.
- /analyze-stats observational-design precondition — Phase 2 (Analysis Plan) now opens
with a WARN-level precondition: before planning an observational analysis (cohort,
case-control, cross-sectional, registry, survey), confirm a literature-grounded
variable_operationalization.md (from /define-variables) or equivalent codebook-backed
definition table exists; if not, warn and recommend /define-variables first so
exposure/outcome/covariate definitions and cutoffs are citation-backed rather than invented
ad hoc from the data dictionary. WARN, not a hard block (proceed on explicit confirmation;
stricter projects can treat it as a hard stop). Mirrors the precondition /write-protocol
already enforces before drafting Methods, closing the gap in the one observational-pipeline
skill that lacked it. Guidance only — non-breaking, no new code gate.
- /meta-analysis progressive disclosure (token hygiene) — the two inline "Empirical
Lessons" sections (16 dated SR-MA peer-review lessons, ~45 lines) moved verbatim to
load-on-demand references/empirical_lessons.md, with an explicit "load before Phase 4
extraction-form design and before Phase 8 submission" pointer and a Reference Files
entry — matching the skill's own established pattern (15 existing reference files). The
largest SKILL.md in the bundle drops 804 → 775 lines (less context loaded on every
activation); the lessons stay discoverable via the reference list. Content byte-preserved
(no rewrite, no renumber — a pre-existing duplicate "9." label is carried over and noted in
the reference file). No skill/detector count change.
- De-drift the sync-submission YAML front-matter splitter — check_wordcount_cap.py
and cover_letter_drift_check.py each carried their own _strip_yaml_front_matter, marked
"keep in sync" but already drifted (list vs tuple return; subtly different unclosed-fence
handling). Extracted one canonical split_yaml_front_matter() into a private
scripts/_yaml_frontmatter.py (leading underscore → not counted as a detector) imported by
both — the helper ships in the same skill's scripts/ dir, so it stays self-contained when
vendored/installed. Behavior-preserving (verified normal / no-front-matter / unclosed cases +
the wired test_wordcount_cap and test_preflight_gate subprocess-import path). No
skill/detector count change.
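The sketch below shows one way a single shared splitter could cover the three verified cases: normal, no front matter, and unclosed fence. It is not a copy of scripts/_yaml_frontmatter.py. The tuple return type and the unclosed-fence policy (leave the text untouched) are assumptions.

```python
def split_yaml_front_matter(text):
    """Split a Markdown document into (front_matter, body); always returns a tuple.

    Assumed policy:
    - normal: '---' on the first line, closed by a later '---' line -> (yaml, rest)
    - no front matter: first line is not '---' -> ("", text)
    - unclosed fence: no closing '---' -> ("", text), i.e. nothing is stripped
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text


# Usage
print(split_yaml_front_matter("---\ntitle: X\n---\nBody\n"))  # → ('title: X\n', 'Body\n')
```

A single function with a single return type removes the list-vs-tuple drift described above. Both callers then inherit the same unclosed-fence behavior.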
Fixed
- Public-doc count reconciliation — README.md (MedSci-Audit suite line) and
CITATION.cff (abstract) cited stale catalog totals from before the detectors above
merged (28 detectors / 32 EQUATOR guidelines). Reconciled to the disk SSOT
(metadata/catalog_counts.json): 36 analysis-integrity detectors / 36 reporting
guidelines. Added a What's New "Unreleased" block to README.md so the public
progression no longer implies v4.8 is current. No code or count change — the SSOT was
already correct; o...
MedSci Skills v4.8.0
The review-harvest batch: deterministic detector hardening promoted from real-manuscript review
cycles — four false-positive fixes, two new gates, nine reviewer-side domain probes, and a
design-stage gate. Additive and backward-compatible — no skill, CLI, or output-path change;
skills 45 and reporting guidelines 36 unchanged; analysis-integrity detectors 30 → 32.
Added
- Reader-facing supplement / multi-file hygiene gate — new check_supplement_hygiene.py
(/self-review) lints the rendered supplement, a separately-built tables file, and caption files
(not just manuscript.md) for the technical-check-fatal residue that hides there: §/§L internal
labels, unfilled placeholders (Table SX, [Authors], figure-path globs, build-dir paths), build
markers ([VERIFY] / TODO), response-to-reviewers framing, planning residue, and body↔supplement
cross-reference numbers that don't resolve. check_artifact_coverage.py gains
PROMISED_STAT_NO_VALUE + a --supplement corpus (a bound/ceiling/de-confounded statistic promised
but never given a number anywhere). (#187)
- Power-aware null-interpretation gate — new check_null_calibration.py (/self-review)
flags a headline negative/equivalence claim ("no synergy", "not associated") that carries no
minimum-detectable-effect, power, equivalence-margin/TOST, or CI-compatibility statement. Plus a
reusable rating_monotonicity.py template (/analyze-stats) that catches a folded
confidence-weighted (call × confidence) → AUC encoding, and a /design-study design-stage ceiling
gate for perceptual/reader-AI studies (6 ceiling-breakers set before data lock). (#188)
- Nine reviewer-side domain probes across the shared peer-review/self-review modules: SR/MA
small-k enrollment-overlap, mixed-denominator pooling, prospective-registration chronology, and
boundary-degenerate proportions (P14–P17); observational selection-on-availability and
serial-imaging lesion-tracking (O15/O16); diagnostic exclusion-flow ↔ prose + modality-safety (D8);
AI arm-task-vs-deployment-workflow (AO6); and a survival apparent-vs-optimism deterministic tell
(S7). (#186)
- Integrity detector count: 30 → 32.
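The flag-not-prove logic of a power-aware null check can be sketched in two steps. First, find sentences that make a null or equivalence claim. Then flag those that carry no calibration cue. The phrase lists and the same-sentence scope are assumptions, not check_null_calibration.py's actual vocabulary.

```python
import re

# Assumed phrase lists; the shipped detector's vocabulary is not documented here.
NULL_CLAIM = re.compile(r"\b(no (?:significant )?(?:difference|association|synergy|effect)"
                        r"|not associated|did not differ|equivalent)\b", re.I)
CALIBRATION = re.compile(r"\b(power(?:ed)?|minimum[- ]detectable|MDE|equivalence margin|TOST"
                         r"|confidence interval|95% CI)\b", re.I)


def flag_uncalibrated_nulls(text):
    """Return sentences that make a null/equivalence claim with no calibration cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if NULL_CLAIM.search(s) and not CALIBRATION.search(s)]


# Usage
print(flag_uncalibrated_nulls("There was no synergy between the two drugs. Mean age was 61 years."))
# → ['There was no synergy between the two drugs.']
```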
Fixed
- Four detector false positives that fired Major on legitimate (often recommended) patterns:
check_generated_code no longer flags a hex-color palette (the colorblind-safe WONG palette
make-figures recommends) as hand-typed tabular data; check_classical_style fires the § AI-tell
only on a section cross-reference, not on author-footnote daggers; check_scope_coherence clears
CROSS_SECTIONAL_PROGNOSTIC when the prognostic token sits inside a negation/deferral frame; and
check_cohort_arithmetic no longer mis-binds the RATE_BACKCALC numerator to a tier label's digit
or a decimal's fraction. Each ships a regression fixture; three previously-unwired test suites are
now CI-wired. (#185)
Changed
- Release pipeline now also publishes to npm (idempotent, with npm provenance via OIDC), so the
npx medsci-skills@latest install channel no longer drifts behind the GitHub release. The step runs
only when the NPM_TOKEN repo secret is set, skips if that version is already on npm (re-running a
tag is safe), and runs after the GitHub Release so an npm hiccup never blocks it. No product change.
MedSci Skills v4.7.0
The self-update foundation: physician-researchers stay current without GitHub, git, or a
terminal — via a transactional crash-safe installer, a verified one-click updater, a hardened
release pipeline, and an opt-in update notice. Additive and backward-compatible — no skill, CLI,
or output-path change; skills 45 and reporting guidelines 36 unchanged. All four pieces are
network-mocked-tested and run on Ubuntu + macOS + Windows CI.
Added
- Transactional, crash-recoverable installer + per-target state. install.py now installs each
target through a durable journal state machine (installers/medsci_txn.py,
prepared → old_moved → new_installed → committed, atomic-write + fsync): an interrupted install
is recovered on the next run (roll back an incomplete transaction, forward-clean a committed one,
fail closed on a corrupt journal). It keeps a per-target installed manifest at
~/.medsci-skills/targets/<target>/ with a per-skill SHA-256 inventory — a skill you modified
is snapshotted to ~/.medsci-skills/backups/<ts>/ before an update, legacy collisions are backed up
there (never inside the skills dirs, never auto-deleted), and only MedSci-owned skills are pruned
(your/third-party skills are untouched). Adds canonical-home containment path-safety, a
disk-space preflight, two deterministic tracked manifests
(metadata/distribution_manifest.json ownership/version + metadata/distribution_files.json
payload inventory) with a CI --check gate, and a Windows/macOS CI matrix. (#177)
- One-click self-updater (installers/update.py). Fetches the latest classroom release and
re-installs through the transactional installer — no GitHub UI, git, or terminal. Resolves the
release via api.github.com only and fails closed if the API has no sha256 digest; verifies
the download's sha256 == the API digest, the asset name, and the tag; and never calls extractall() —
it extracts per entry, rejecting path traversal (POSIX + Windows), symlink/hardlink/junction,
case-insensitive duplicates, and zip-bombs, and enforcing the distribution_files.json allowlist +
per-file hash. Installs the updater to ~/.medsci-skills/updater/ (survives deleting the download
folder); install.py --check-update reports availability via semver with a clock-sane 24h cache;
optional consented --desktop-launcher. Thin .command/.cmd launchers wrap it; a privacy notice
(docs/update_privacy.md) states the honest scope. (#178)
- Release-pipeline supply-chain hardening. release.yml now gates on a version-consistency check
(the pushed tag must equal CITATION.cff == package.json == metadata/distribution_manifest.json
and the tracked inventory must match the tree); injects a verified provenance.json
{schema_version, tag, version, git_sha, built_at} into each classroom ZIP via
build_classroom_release.py --tag/--git-sha/--built-at; attests the ZIPs' build provenance
(actions/attest-build-provenance); runs on a protected release environment (required-reviewer
approval); and — via the new scripts/check_release_zip.py — verifies each ZIP round-trips through
the updater's own safe-extract + provenance validation before publishing, so a release can never
ship a ZIP the self-updater would reject (locked by installers/tests/test_release_zip.sh).
provenance.json stays a control file (excluded from the safe-extract inventory). SECURITY.md
gains a "Release integrity & revocation" section; docs/maintainer_workflow.md documents the
protected-environment setup. (#179)
- Opt-in update notice for Claude Code (off by default). install.py --enable-update-notify
merges a SessionStart hook (installers/session_update_check.py) into ~/.claude/settings.json
that prints a one-line "update available" systemMessage at session start; --disable-update-notify
removes only that hook (keying on the home-anchored script path, so it never touches a foreign hook).
The hook does not read the SessionStart stdin (no cwd/transcript/session id), has no
telemetry/analytics/unique-id, uses the shared clock-sane 24h cache + a 4 s timeout, stays silent on
any error (never blocks a session), honors MEDSCI_NO_UPDATE_CHECK=1, and installs nothing — it
only notifies. A version check resolves the latest tag without the OS-specific download asset
(resolve_latest_tag), so the notice works on Linux too; the settings merge is idempotent, preserves
foreign hooks/settings, and refuses to clobber an unparseable settings.json. Tested offline
(installers/tests/test_session_hook.py, 38 cases). (#180)
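The per-entry extraction policy (never extractall(); check each entry before writing it) can be sketched with zipfile. The function name safe_extract, the size ceiling, and the allowlist shape (a dict mapping each path to its SHA-256) are all assumptions. Hardlink and junction detection are omitted.

```python
import hashlib
import io
import os
import posixpath
import stat
import tempfile
import zipfile

MAX_TOTAL_BYTES = 200 * 1024 * 1024  # assumed zip-bomb ceiling, not the updater's real value


def safe_extract(zip_bytes, dest_dir, allowlist):
    """Extract allowlisted, hash-matching regular files one entry at a time.

    allowlist maps a relative POSIX path to its expected SHA-256 hex digest
    (a stand-in for the distribution_files.json inventory).
    """
    seen_lower = set()
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = info.filename
            norm = posixpath.normpath(name)
            # Reject absolute paths, backslashes, drive letters/':' and any '..' segment.
            if (name.startswith("/") or "\\" in name or ":" in name
                    or norm.startswith("..") or norm != name):
                raise ValueError(f"unsafe path: {name!r}")
            if stat.S_ISLNK(info.external_attr >> 16):
                raise ValueError(f"symlink entry: {name!r}")
            if norm.lower() in seen_lower:
                raise ValueError(f"case-insensitive duplicate: {name!r}")
            seen_lower.add(norm.lower())
            if norm not in allowlist:
                raise ValueError(f"not in allowlist: {name!r}")
            total += info.file_size  # declared size; reads are bounded by it
            if total > MAX_TOTAL_BYTES:
                raise ValueError("archive exceeds size ceiling")
            data = zf.read(info)
            if hashlib.sha256(data).hexdigest() != allowlist[norm]:
                raise ValueError(f"hash mismatch: {name!r}")
            target = os.path.join(dest_dir, *norm.split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(data)


# Usage: build a one-file archive in memory and extract it into a temp directory.
payload = b"# skill\n"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("skills/demo/SKILL.md", payload)
with tempfile.TemporaryDirectory() as out:
    safe_extract(buf.getvalue(), out, {"skills/demo/SKILL.md": hashlib.sha256(payload).hexdigest()})
    print(os.listdir(os.path.join(out, "skills", "demo")))  # → ['SKILL.md']
```

Entries are checked one at a time. A failure part-way through can therefore leave earlier files on disk, so a real updater would extract into a staging directory that it can discard.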
Trust boundary (honest scope)
- Running a release's bundled installer is remote code execution within the GitHub trust boundary.
The digest and the build-provenance attestation detect transport / asset tampering — they do
not defend against a compromised publisher account or a malicious official release. See
SECURITY.md and docs/update_privacy.md.
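The digest comparison that detects transport or asset tampering can be sketched in a few lines. The "sha256:<hex>" digest format and the function name verify_asset are assumptions. A match only shows that the bytes are the ones the publisher uploaded. It cannot show that the release itself is benign, which is exactly the limit stated above.

```python
import hashlib


def verify_asset(data, api_digest):
    """Fail closed unless data's SHA-256 equals the digest reported for the asset.

    api_digest is assumed to look like 'sha256:<64 hex chars>'; a missing or
    differently shaped digest is treated as a failure, never silently skipped.
    """
    if not api_digest or not api_digest.startswith("sha256:"):
        raise ValueError("no sha256 digest available; refusing to install")
    expected = api_digest.split(":", 1)[1].lower()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ValueError("sha256 mismatch; download altered or incomplete")
    return actual


# Usage
data = b"classroom zip bytes"
digest = "sha256:" + hashlib.sha256(data).hexdigest()
print(verify_asset(data, digest) == digest.split(":", 1)[1])  # → True
```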
MedSci Skills v4.6.0
A maintainability, governance, and review-depth release. Integrity detectors 28 → 30; domain probes 11 → 12; skills 45 and reporting guidelines 36 unchanged. No skill rename, CLI, or output-path change — additive and backward-compatible.
Added
- Fairness / equity / subgroup-performance domain probe (`equity_fairness.md`, EQ0–EQ6). Vendored byte-identical into `/peer-review` and `/self-review` (`MODULES` 11 → 12). Fires only when a manuscript claims cross-population performance or presents subgroup analyses as a fairness argument. It then checks for disaggregated subgroup metrics (not aggregate-only), error-rate-vs-discrimination parity and base-rate dependence, a named fairness estimand + between-group gap test, development-cohort representativeness, subgroup EPV/power, and equity-aware framing aligned to TRIPOD+AI / DECIDE-AI / CONSORT-AI. (#170)
- AI-disclosure + data/code-availability detector (`sync-submission/check_disclosure_availability.py`). An AI-use disclosure must carry four tokens: version + access channel + date/date-range + responsible party (the tool name itself only triggers the check). It must also include a Data/Code Availability statement, with a repository/DOI where the journal expects one, keyed by `journal_availability_policy.json`. (#171)
- Structured-summary-box conformance detector (`academic-aio/check_summary_box.py`). Checks Key Points bullet count + one-claim-per-bullet, Research-in-context's three sub-blocks, and the plain-language word band, journal-keyed via `summary_box_specs.json`. It catches the wrong-format box that a production technical check rejects. (#171)
- Skill `maturity` taxonomy (`official`/`experimental`/`community`). A required, additive `skill.yml` v2.2 field (`schema_version` stays 2), enforced by `validate_skill_contracts.py` and surfaced in `skills_catalog.json`; all 45 current skills are `official`. (#174)
- Governance & answer-engine docs: `ROADMAP.md` (priorities + explicit out-of-scope), `MAINTAINERS.md` (clinical authority stays with the founder), `SECURITY.md` (vulnerability reporting + medical-scope boundary), `docs/maintainer_workflow.md` (review + release checklist), `docs/faq.md` (AEO/GEO), and two new issue templates (installation problem, detector request). (#173)
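The four-token AI-disclosure rule amounts to a trigger plus a checklist. The sketch below shows that rule. It is not the code of `check_disclosure_availability.py`; every regex here is a hypothetical, deliberately simple stand-in, and real disclosures need broader patterns.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; the shipped detector's rules are not shown in these notes.
AI_TOOL = re.compile(r"\b(?:ChatGPT|GPT-\d\w*|Claude|Gemini|Copilot|large language model)\b", re.I)
REQUIRED_TOKENS = {
    "version": re.compile(r"\b(?:version|v)\s*\d+(?:\.\d+)*", re.I),
    "access_channel": re.compile(r"\b(?:web interface|chat interface|API|desktop app)\b", re.I),
    "date": re.compile(
        r"\b(?:19|20)\d{2}-\d{2}(?:-\d{2})?\b"
        r"|\b(?:January|February|March|April|May|June|July|August|September|October|November|December)"
        r"\s+(?:\d{1,2},\s*)?\d{4}\b",
        re.I,
    ),
    "responsible_party": re.compile(r"\bauthors?\b[^.]*\b(?:responsib\w*|reviewed|verified)\b", re.I),
}


def missing_disclosure_tokens(text: str) -> List[str]:
    """Return the required tokens absent from an AI-use disclosure.

    Naming a tool only triggers the check; it does not satisfy any token.
    """
    if not AI_TOOL.search(text):
        return []  # no AI tool mentioned: nothing to check
    return [name for name, pat in REQUIRED_TOKENS.items() if not pat.search(text)]
```

For example, "ChatGPT helped draft the introduction." triggers the check and reports all four tokens as missing, because naming the tool satisfies none of them.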
Changed
- Positioning leads with the compliance moat. The README hero subline and the marketplace source description (`MARKETPLACE_DESCRIPTION`) now lead with reporting-guideline + risk-of-bias compliance, reference verification, and deterministic integrity gates rather than skill count. The README gains a "What is MedSci Skills?" answer block, a "Start here: 3 workflows" section, and a "Validation status" section (available vs CI-gated vs E1-evaluated). A stale "32 EQUATOR" hero count was corrected to "36 reporting guidelines and risk-of-bias tools". (#173, #174)
- `write-paper` Phase 7 token diet (pilot). The three integrity-audit sub-steps (7.3a/7.3b/7.3c) moved to `references/phase7_integrity_audits.md` behind a control-flow-preserving pointer. The move saves a measured 10,238 chars (~2,559 tokens) per invocation; the file is loaded on demand only when Phase 7 runs. (#172)
Documentation
`CONTRIBUTING.md` and the PR template add a medical-claim → founder-review gate and an official/experimental/community classification line; `IMPACT.md` adds an "Interpretation of metrics" caveat block ("early community interest, not widespread adoption"). (#173)
Validation / Evidence
- New deterministic scripts each ship a network-free challenge/regression test wired into CI.
- `MEDSCI_AUDIT.md` detector-count claims corrected (they had drifted to 27/28), and a `DETECTOR_CLAIM_FILES` gate added to `validate_catalog_consistency.py` (anchored current-total patterns, never historical evaluation numbers) so the total cannot silently drift again. A regression test for the routing-asset gate (`tests/test_routing_assets.sh`) covers the `references/` pointer that guards the Phase 7 extraction. (#169, #171)
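The drift gate compares every stated detector total against one source of truth, such as `catalog_counts.json::integrity_detectors`. The sketch below shows that comparison. The regex and the function name are hypothetical. The real gate's anchoring is tighter, so historical evaluation numbers (for example, a count from an earlier E1 run) are not flagged; this simplified pattern would flag them.

```python
import re
from typing import List

# Hypothetical "current total" claim pattern: "<N> integrity detectors" or "<N> detectors".
CLAIM = re.compile(r"\b(\d+)\s+(?:integrity\s+)?detectors\b", re.I)


def stale_detector_claims(text: str, current_total: int) -> List[int]:
    """Return every detector total stated in `text` that disagrees with the catalog count."""
    return [int(m.group(1)) for m in CLAIM.finditer(text) if int(m.group(1)) != current_total]
```

A changelog-style transition such as "Integrity detectors 28 → 30" is not matched, because no number directly precedes the word "detectors".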
MedSci Skills v4.5.0
Added
- Self-review domain-probe batch (SR/MA + DTA + prediction-model) + submission asset-anon abs-path gate.
  Five new review probes promoted from field cycles, plus one deterministic submission check. `sr_ma.md`: P12 risk-of-bias table row-sum ↔ figure-matrix reconciliation (each NOS ★/JBI Y row must equal its printed total; the traffic-light figure's data matrix must match the supplementary table; SSOT = the primary appraisal form, not a plotting-script constant) and P13 included-study ↔ reference-list completeness (every characteristics-table study must be a numbered reference; source citations from PubMed `efetch`, not hand-kept notes; disambiguate same author/year by technique + sample size). `diagnostic_accuracy.md`: D7 index-test-as-enrollment-criterion circularity (escalate past Major when an inclusion threshold is the index test under study). `clinical_prediction_model.md`: CP5 intended-use horizon leakage (claim-timepoint adjectives vs each predictor's availability timepoint) and CP6 validation-nomenclature conflation (development/CV vs held-out/external test). Probes are vendored byte-identical to `peer-review`. `sync-submission/scripts/check_asset_anonymization.py`: new scan class 4. A `word/*.xml` attribute (e.g. a pandoc-embedded image's `<pic:cNvPr descr="…">`) carrying an absolute home-dir path (`/Users/…`, `/home/…`) is a username leak that a rendered-text scan cannot see; it is flagged as `docx_embedded_abs_path` (leak severity), with a regression test fixture. No version bump: probe/reference + detector additions.
- `/clean-data` + `/analyze-stats`: reverse-coded-item / negative-alpha detector (integrity detectors 27 → 28).
  A multi-item Likert scale with a negatively-worded item must recode it as `(min+max) - x` before the scale total or Cronbach's alpha is computed. Left un-recoded, the item correlates negatively with the rest of the scale and alpha collapses (often negative). A negative alpha is a coding bug, not a "multidimensional construct"; defending it as such loses a review round. The new stdlib-only `skills/clean-data/scripts/check_reverse_coding.py` computes per-item corrected item-total (item-rest) correlations + the raw Cronbach's alpha. It returns `REVERSE_CODING_LIKELY` (alpha < 0) / `REVERSE_CODING_SUSPECT` (negative item-rest, alpha ≥ 0) / `OK`, and exits 1 under `--strict`. `skills/analyze-stats/references/templates/likert_summary.py` is hardened to print item-rest correlations, flag negative ones as reverse-code suspects, warn loudly on a negative alpha, and apply the recode via a new `--reverse-items` flag before scoring/alpha. Ships a synthetic fixture (a 3-item scale with one reverse item → raw α = −1.71, plus a clean aligned scale) + a CI-wired regression test (`skills/clean-data/tests/test_reverse_coding.sh`). The detector is mapped to the `data_preparation` family; `metadata/detectors_catalog.json` is regenerated; `catalog_counts.json::integrity_detectors` 27 → 28. Motivation: a medical-education pilot whose Trust scale shipped at α = −0.57 (one reverse item un-recoded) and consumed a major-revision round before `6 - x` restored α = 0.58.
- Test backfill (cont.): `fill-protocol` + `fulltext-retrieval` regression tests (Tier 1 complete).
  `skills/fill-protocol/tests/test_fill_form.sh` builds a synthetic Word template at runtime (python-docx: 2-column key/value table + numbered section headers + title paragraph) and runs `fill_form.py` with a content YAML exercising `table_kv`/`section_replace`/`paragraph_replace`. It then asserts that the values landed in the reopened docx, that the title placeholder was replaced, and that an absent label is reported `[MISS]`; no committed binary fixture is needed. `skills/fulltext-retrieval/tests/test_pdf_to_md.py` stubs `pymupdf4llm` before import (the module exits on a missing dep) and pins the dependency-free helpers `parse_page_range` (ranges/lists/whitespace) and `clean_markdown` (collapse 4+ newlines, rstrip lines, single trailing newline, idempotent), so no heavy PyMuPDF dependency is added to CI. Both use deps already present (python-docx/pyyaml; stdlib). No skill/version change: test infrastructure only.
- Test backfill (cont.): `fill-icmje-coi` + `academic-aio` regression tests.
  Three more deterministic, network-free tests wired into CI. `skills/fill-icmje-coi/tests/test_fill_icmje_coi.sh` clones the shipped synthetic seed for two authors and asserts the documented contract per output docx (14 checked boxes, 13 "None" disclosures, new title/date substituted, author name present, zero placeholder leakage; stdlib `zipfile` path). `skills/academic-aio/tests/test_validate_schema.sh` checks the JSON-LD validator (a valid ScholarlyArticle passes; a wrong `@context`, unknown `@type`, missing required field, or malformed DOI each fails). `skills/academic-aio/tests/test_batch_metadata_audit.sh` checks the repo/HF-card auditor (a clean repo passes `--fail-on-issue`; missing README/CITATION/LICENSE fails; report-only mode stays exit 0; a PHI-shaped string in an HF card is flagged). All fixtures are synthetic. No skill/version change: test infrastructure only.
- Test backfill: Tier 0 CI-wiring + `deidentify` PHI-scan regression test.
  Ten skill regression tests that existed on disk but were never gated are now wired into `.github/workflows/validate.yml`, so a silent break fails CI: `make-figures` (legend reconcile), `clean-data` (structural-zero), `lit-sync` (poll logic), `meta-analysis` (pool consistency), `generate-codebook`, `present-paper` (speaker-notes markdown), `version-dataset` (manifest/verify), `manage-refs` (vN-docx cross-ref), and `polish-language` (consistency-linter challenge). The new `skills/deidentify/tests/test_deidentify_scan.sh` asserts the exact PHI-classification contract (PHI/REVIEW_NEEDED/SAFE counts + `rrn` `phi_type`) on the three committed fixtures. The CSV scan path is stdlib-only and network-free, and the test file is Hangul-free (column-specific asserts read the fixture header at runtime). CI now installs pandas/numpy/python-pptx/python-docx up front (previously pandas was installed after the gates, which would silently skip the dep-guarded tests); `version-dataset` gains a pandas skip-guard for local robustness. No skill/version change: test infrastructure only.
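The reverse-coding detector classifies a scale from two quantities: Cronbach's alpha, α = k/(k−1) · (1 − Σσᵢ² / σ²_total), and each item's correlation with the sum of the remaining items. The stdlib-only sketch below shows that classification. It is not the shipped `check_reverse_coding.py`; the function names and toy data are hypothetical. Population variances are used, and that choice does not affect α, because any consistent variance denominator cancels in the ratio.

```python
from math import sqrt
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

Column = Sequence[float]  # one item's scores, one entry per respondent


def pearson(x: Column, y: Column) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # raises ZeroDivisionError for a constant column


def cronbach_alpha(items: List[Column]) -> float:
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # per-respondent scale total
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def item_rest(items: List[Column]) -> List[float]:
    # Corrected item-total: correlate each item with the sum of the *other* items.
    rows = list(zip(*items))
    return [pearson(col, [sum(r) - r[i] for r in rows]) for i, col in enumerate(items)]


def reverse_code(col: Column, lo: float, hi: float) -> List[float]:
    return [(lo + hi) - x for x in col]  # the (min+max) - x recode


def classify(items: List[Column]) -> Tuple[str, float]:
    alpha = cronbach_alpha(items)
    if alpha < 0:
        return "REVERSE_CODING_LIKELY", alpha
    if any(r < 0 for r in item_rest(items)):
        return "REVERSE_CODING_SUSPECT", alpha
    return "OK", alpha
```

On a toy 1–5 scale with `a = [1, 2, 3, 4, 5]`, `b = [2, 2, 3, 4, 5]`, and an un-recoded reverse item `c = [5, 4, 3, 2, 1]`, `classify([a, b, c])` returns `REVERSE_CODING_LIKELY` with α ≈ −4.41. After `reverse_code(c, 1, 5)`, the same scale classifies as `OK` with α ≈ 0.99.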
MedSci Skills v4.4.0
Added
- `/peer-review` + `/self-review`: Image-Synthesis / Cross-Modality Generation probe module (IS1–IS4) + reviewer-side reference-integrity spot-check. New domain-probe module `image_synthesis.md` (vendored byte-identical into `/self-review`; `MODULES` 10 → 11, sync gate updated). It covers studies that synthesize one imaging modality from another (MRI→PET / MRI→CT / non-contrast→contrast / low-dose→full-dose) and claim that the output carries functional/molecular information or substitutes for the unavailable target. IS1, determinism/information ceiling: the synthetic image is a deterministic function of the source, so without a direct source→label baseline, a same-reader "source + synthetic > source alone" gain is a presentation/interpretability effect. IS2, target-derived preprocessing / undescribed slice-selection leakage: a lesion mask drawn on the target modality that guides slice selection or training makes "function inferred from structure" circular, and undescribed provenance is itself a Major #1 candidate. IS3, global-vs-lesion-level quantitative agreement: whole-organ SUVR agreement does not establish lesion-level fidelity. IS4, mechanistic/proxy-signal plausibility: name what the source physically measures vs the target's biology; high image similarity is not evidence that an unmeasured signal was recovered. Routed from a new peer-review Phase 2K + Phase 3 QC item 15 + Phase 5 routing line, and a `/self-review` routing-table row. Per Phase 2F, IS2/IS4 are typically unfixable in the current form and push the recommendation toward Reject-leaning. A companion reviewer-side reference-integrity spot-check is added to the Phase 2 issue checklist + Phase 3 QC item 16 (all original-research reviews). It spot-checks the load-bearing Introduction/Discussion citations offered as evidence that the method/premise works, looking for a paper cited for a different task, a duplicate reference, or a wrong year/author. Unconfirmed suspicions are phrased "please verify" (the reviewer-side mirror of the authoring citation-safety discipline). Motivation: a decision audit of a cross-modality MRI→synthetic-PET reader-study review. The three structurally distinct synthesis failure modes were split across reviewers, and no reviewer caught the reference-list errors.
- `/author-strategy`: trajectory-archetype classification (optional, explainable multi-label heuristic). Adds an opt-in capability that classifies a queried author's PubMed trajectory into abstract career archetypes: A1 infrastructure builder, A2 methodology rule-maker, A3 clinical→AI hybrid, A4 SR/MA volume engine, A5 large-consortium participation pattern, and A6 clinical-subspecialty device/technique depth, plus a computed A3+A6 composite. The rubric is a single canonical data file (`references/trajectory_archetypes.yaml`); the narrative `references/trajectory_archetypes.md` is generated from it by `render_archetype_doc.py` (`--check` gate). Each label carries three things. The first is a 0–1 score whose denominator is the summed weight of the computable signals; `unavailable` signals (h-index/citation/venue-tier) are excluded and surfaced as `[VERIFY]`, never fabricated. The second is a confidence band capped per archetype. The third is evidence drawn from the author's own PMIDs (`evidence_pmids` for per-paper signals, `evidence_summary` for corpus-level ones). A negative rule suppresses a label to `insufficient evidence`. A disambiguation gate precedes classification: `fetch_pubmed.py` writes a `corpus_manifest.json` cryptographically bound to the CSV (`csv_sha256` + `pmid_set_hash`), and `classify_archetypes.py` refuses to run unless `review_status: approved` and the hashes match. A surname alone never resolves an author, and `--approve` is a human gate. Target-author attribution (ORCID/affiliation/initials/position) is split into a stdlib-only `pubmed_parse.py` and never borrows a co-author's metadata on a same-surname collision. Author position is reported as a `first`/`middle`/`last`/`unknown` positional heuristic (not leadership metadata), and `analyze_patterns.py`'s "Leadership rate" is renamed "First/last positional rate". The output header states that the labels are explainable heuristics, not objective classifications. Ships name-free synthetic fixtures + a CI-gated regression test (A14). Skill count unchanged: an enhancement, not a new skill.
- `/verify-refs`: OpenAlex tertiary index (conference-proceedings / non-DOI recovery). PubMed covers only biomedical literature and CrossRef's proceedings coverage is uneven. As a result, NeurIPS / ICLR / ACL-style citations, which are common in medical-AI manuscripts, fell through both and were marked `UNVERIFIED`. After the PubMed and CrossRef tiers, `verify_refs.py` now consults OpenAlex (https://api.openalex.org, free, no API key), but only when no authoritative author list has been obtained yet. A reference already resolved by PubMed/CrossRef incurs no extra call. The tier resolves by DOI when present and otherwise by a token-similarity-guarded title search, so a fabricated title cannot earn a spurious `OK`. This is the free analogue of the second index (e.g. Scopus) that journal portals run alongside CrossRef. OpenAlex display names carry no structured family/given field and mix `First Last` with `Last, First` forms. OpenAlex-sourced authors therefore support an existence check plus a tolerant first-author membership check, but never drive the strict positional or author-count MISMATCH (reserved for PubMed efetch / CrossRef). An OpenAlex miss is `UNVERIFIED`, never `FABRICATED`. The new `--no-openalex` flag restricts verification to PubMed + CrossRef. Ships a network-free regression test (`tests/test_openalex_tier.sh`, monkeypatched `http_json`, CI gate A8b). Motivation: a medical-AI reference list in which two NeurIPS citations validated against Scopus but not against CrossRef in a journal portal's reference check.
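The disambiguation gate binds a human approval to one exact corpus. The sketch below shows the binding. The manifest keys (`review_status`, `csv_sha256`, `pmid_set_hash`) come from the notes above. How `pmid_set_hash` is canonicalized is not stated there, so the sorted, de-duplicated, newline-joined form is an assumption, and the helper names are hypothetical.

```python
import hashlib
from typing import Iterable, Mapping


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def pmid_set_hash(pmids: Iterable[str]) -> str:
    # Assumed canonical form: de-duplicated, sorted PMIDs, one per line (order-independent).
    canon = "\n".join(sorted({str(p).strip() for p in pmids}))
    return sha256_hex(canon.encode("utf-8"))


def may_classify(manifest: Mapping[str, str], csv_bytes: bytes, pmids: Iterable[str]) -> bool:
    """Refuse unless a human approved the corpus and it is exactly the approved one."""
    return (
        manifest.get("review_status") == "approved"
        and manifest.get("csv_sha256") == sha256_hex(csv_bytes)
        and manifest.get("pmid_set_hash") == pmid_set_hash(pmids)
    )
```

Binding two hashes catches two different failure modes. A hand-edited CSV changes the file hash. Adding or dropping a paper (for example, a same-surname collision) changes the PMID set even if the CSV is regenerated.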
MedSci Skills v4.3.0
Added
- Observational / cohort probe + gate hardening (sourced from two cross-sectional health-screening cohort self-review→revise loops). Expands `observational_confounding.md` O1–O6 → O1–O9 (vendored byte-identical into `/self-review`). O7, over-adjustment: conditioning on a mediator or a consequence of the outcome, the opposite-direction failure to O1 (e.g. a renally-excreted lab in an eGFR model); "adjust for everything that differs in Table 1" is not a confounder-selection rule. O8, analysis unit & clustering: records vs unique subjects → anti-conservative CIs. O9, outcome construct validity for report-/registry-derived outcomes: composite homogeneity, ascertainment/κ, dictionary-first label provenance, misclassification direction. O1 also gains an exposure-defining-covariate exemption for guideline-defined exposures and a reference-arm-contamination-vs-selection-bias note (O3). `check_confounding_completeness.py` now computes SMD from per-stratum mean ± SD when the wide Table 1 carries no p / SMD column (interop with `/analyze-stats`).
- New domain-probe module `clinical_prediction_model.md` (CP1–CP4) for cross-sectional / observational prediction models (TRIPOD / TRIPOD+AI nested predictor-set comparisons). It covers apparent-vs-optimism-corrected calibration/DCA, the incremental-value-vs-marginal-effect two-null distinction, EPV per nested model, and net benefit as a model comparison (not a policy endorsement). Vendored byte-identical into `/self-review`; `MODULES` 9 → 10; routed from peer-review (new Phase 2E-2) and self-review. Plus two `/self-review` `exemplar_findings/` (`over_adjustment_collider.md`, `prediction_two_null_conflation.md`).
- Cohort-analysis probes (G39–G41). `survival_prognostic.md` gains S9, panel-data / multistate variance: occupancy/intensity CIs must be person-clustered or person-bootstrapped, not naive model-based on within-person-correlated visit transitions (S1–S8 → S1–S9). `observational_confounding.md` gains O10, overlapping-subset gradient: an effect-size gradient across nested/overlapping cohorts is attributable by construction, so inferential "attenuated/accounted-for" language needs a difference/interaction test (O1–O9 → O1–O10). Both are vendored byte-identical into `/self-review`. Also adds an extended-adjustment missingness-frame discipline (compare adjusted vs unadjusted on the same reduced complete-case frame, not the full-frame anchor) in `/self-review` Phase 2.5e + `/analyze-stats` over-adjustment guidance.
- Cross-sectional survey-epidemiology probes (G45–G46, paper-driven from CC-BY NHANES cohorts). `observational_confounding.md` gains O11 and O12. O11, complex-survey design & weighting (NHANES/KNHANES/CHNS), requires design-based estimation with the correct/scaled weight + stratification + PSU. It also requires subpopulation domains instead of row deletion, treats a weighted total as a population estimate rather than a sample n, and asks for the design effect/effective n. O12, data-driven threshold / non-linearity mining: a recursive-search "inflection point" / "saturation effect" needs a breakpoint CI + a pre-specified non-linearity test + a stability check, not a quoted cutoff. O1–O10 → O1–O12, vendored byte-identical into `/self-review`. `/analyze-stats` `survey_weighted.md` gains a subpopulation-domain (never row-delete) + survey-reporting-errors block.
- Cross-sectional mediation probe (G47, paper-driven from CC-BY mediation papers). `observational_confounding.md` gains O13, cross-sectional mediation (temporal order & sequential ignorability). A Baron–Kenny / Sobel / PROCESS / bootstrapped indirect-effect chain estimated on single-timepoint data cannot establish the X→M→Y sequence; the bootstrap CI addresses sampling variability, not identification. Such a chain needs a sensitivity analysis for unmeasured mediator–outcome confounding (e.g. an E-value for the indirect effect) + a temporal-order caveat, and the proportion mediated is unstable when the total effect is small. O1–O12 → O1–O13, vendored byte-identical into `/self-review`; adds `exemplar_findings/cross_sectional_mediation.md`.
- Cleanup batch (G48/G42/G43). `/analyze-stats` gains a mediation analysis guide (`analysis_guides/mediation.md` + SKILL entry, paired with O13). It covers the bootstrapped a×b indirect effect, reporting the proportion mediated only with its uncertainty, and AGReMA reporting. It also stresses that identification (no unmeasured mediator–outcome confounding → E-value for the indirect effect), not the bootstrap, is the real issue. `/sync-submission` gains `scripts/assemble_supplement.py` (NOT an integrity detector). The script validates an `S{N}_*.md` + index supplement (index↔file 1:1, duplicate/skipped sub-section numbers), rebuilds `_combined.md` in index order, and reports main-text callout coverage. `/render-pdf-doc` gains `scripts/scan_glyph_coverage.py` + a Step 3.5 pre-render scan for the xelatex silent-glyph-drop failure (arrows / − ≤ ≥ ± √ / Greek / ★ ✓ / CJK; optional `fonttools` cmap check). Both ship fixtures + CI-wired tests (A12/A13). Integrity-detector count unchanged (27).
- Interaction-scale probe (G49, paper-driven from CC-BY joint-effect papers). `observational_confounding.md` gains O14, interaction scale (additive vs multiplicative). A synergy / joint-effect / effect-modification claim is an additive-scale statement. It needs RERI / AP / synergy index with CIs, not a multiplicative-only OR product term, joint-category ORs, or stratified-only estimates (the difference-in-significance fallacy). O1–O13 → O1–O14, vendored byte-identical into `/self-review`; `/analyze-stats` gains an Interaction & Effect-Modification entry (RERI/AP/S, Knol & VanderWeele). The cross-sectional-cohort review lane (O1–O14 + CP1–CP4 + S9 + gates) is now comprehensive.
- `check_cohort_arithmetic.py`: new `ANALYSIS_UNIT_UNDISCLOSED` check (`--id-col`, auto-detect with a cardinality guard). When records > unique subjects and the manuscript discloses neither the analysis unit nor a one-record-per-subject sensitivity analysis, it emits a Major with a `records / unique_subjects / repeat_subjects / max_visits` reconciliation (probe O8).
- `check_scope_coherence.py`: new `CROSS_SECTIONAL_YIELD_LANGUAGE` lexicon (Minor). It flags a cross-sectional / prevalence design that uses incidence-flavored vocabulary ("yield", "detection rate", "number-needed-to-screen/image", "rescreen interval") without defining "yield" once as cross-sectional report-positive prevalence.
- New detector `check_paren_spans.py` (`/self-review`, integrity detectors 26 → 27, family Style & review-process): a safety scan run after em-dash→parenthesis conversion (cohort-cycle follow-up). A bulk `— X —` → `(X)` edit can pair two unrelated dashes across a sentence boundary. The result wraps a whole sentence, or an ordinal limitation ("Sixth, …"), inside one parenthesis that is still balanced, so a balance check misses it. Flags `PAREN_SPAN_ORDINAL` and `PAREN_SPAN_SENTENCE` (long spans only, so short legitimate parentheticals like "(Dr. Smith)", "(Fig. 2)", "(95% CI …)" stay clean). Wired into `/self-review` `--fix` post-edit and `/humanize` pattern 13. Fixtures + regression test (CI-gated).
- New detector `check_wordcount_cap.py` (`/sync-submission`, integrity detectors 25 → 26, family Reporting compliance) for the revision-inflation trap: a revise loop monotonically adds words and silently breaches the target journal's body cap. It counts the body (Introduction → Discussion, skipping abstract/refs/tables/declarations). It compares that count to a cap from `--limit` or a parsed `--journal-profile` article-type line, and emits `WORDCOUNT_OVER_CAP` (Major) / `WORDCOUNT_NEAR_CAP` (Minor, >0.95×). The binding number is the rendered count (citeproc expands `[@key]`). The detector therefore prefers `--rendered-words N` and otherwise estimates from the markdown body + inline-citation expansion. Wired as `/sync-submission` Gate 13, a `/revise` exit gate (re-run after every pass), and a `/self-review` §F check. Ships fixtures + regression test.
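Probe O14 asks for additive-scale measures instead of a multiplicative product term. Given odds ratios for the doubly exposed group (11) and each singly exposed group (10, 01) against the unexposed reference (00), the standard point estimates are:

- RERI = OR₁₁ − OR₁₀ − OR₀₁ + 1
- AP = RERI / OR₁₁
- S = (OR₁₁ − 1) / ((OR₁₀ − 1) + (OR₀₁ − 1))

The sketch below computes them. The function name is hypothetical, and the sketch omits the confidence intervals the probe requires; those would come from a delta-method or bootstrap step. Using ORs in place of risk ratios assumes a rare enough outcome.

```python
from typing import NamedTuple


class AdditiveInteraction(NamedTuple):
    reri: float  # relative excess risk due to interaction (0 = exactly additive)
    ap: float    # attributable proportion due to interaction
    s: float     # synergy index (1 = exactly additive)


def additive_interaction(or11: float, or10: float, or01: float) -> AdditiveInteraction:
    """Point estimates only; ORs are relative to the doubly unexposed (00) reference."""
    reri = or11 - or10 - or01 + 1
    return AdditiveInteraction(
        reri=reri,
        ap=reri / or11,
        s=(or11 - 1) / ((or10 - 1) + (or01 - 1)),
    )
```

For example, OR₁₁ = 4.0, OR₁₀ = 2.0, OR₀₁ = 1.5 gives RERI = 1.5, AP = 0.375, and S = 2.0. The multiplicative interaction ratio 4.0 / (2.0 × 1.5) ≈ 1.33 describes the same data on a different scale, which is why the two scales can disagree.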
Fixed
- `verify_refs.py`: corporate/collective-author render-abort fix (cohort-cycle follow-up). A guideline body double-braced in BibTeX (`{{EASL} and {EASD}}`, `{{KDIGO CKD Work Group}}`) or returned by PubMed as `<CollectiveName>` tripped the first-author cross-check as MISMATCH. That aborted `render_pandoc.sh` on every guideline-citing cohort manuscript. Corporate authors are now detected (surviving brace / `<CollectiveName>` / organization keyword) and exempted from the personal-name family cross-check (annotated `corporate/collective author`, never MISMATCH). Personal-author entries are unaffected.
- `check_classical_style.py`: the em-dash counter counts prose only (cohort-cycle follow-up). It excludes structural dashes: markdown table cells (incl. "—" N/A placeholders and `(A) —` panel-label captions), ORCID separators, and author/affiliation lines. It reports prose and structural counts separately, so a cohort manuscript with large baseline tables is not pushed into destructive edits on correct table dashes.
- `check_confounding_completeness.py`: DB-column-code ↔ prose alias map. A DB-exported Table 1 carrying column codes (`he_sbp`, `b_uric`, `b_chol_hdl`) was false-flagged as imbalanced-and-unadjusted when the adjustment set was written in prose ("systolic blood pressure"). An alias map now resolves both to a shared concept; it only ever adds matches (no new false ✓). Genuinely unadjusted covariates still flag.
- `check_confounding_completeness.py`: exposure-defining-covariate exemption (O1 false positive on guideline-defined exposures). For a guideline-defined exposure (MASLD / metabolic syndrome / CKM / sarcopenia / frailty), the components of its own diagnostic criteria (BMI, glycaemia, lipids, BP) are imbalanced by construction and correctly unadjusted, yet the gate flagged each as a Major. The new `--exposure-defining-list`/`-file` option marks these `EXPOSURE_DEFINING_EXEMPT` (adjusting for them is over-adjustment, probe O7), so the Major remains only for genuine non-d...