Releases · Aperivue/medsci-skills

28 Jun 13:54

github-actions

v5.0.0

f332681

MedSci Skills v5.0.0 Latest

Changed

v5.0.0 — storefront repositioning for the medical-AI model-engineering lane. A material
distribution change, not a label bump: the model-engineering lane (built additively across
v4.x Phases 1–4 plus the Phase 5 breadth below) now has its own storefront home and the repo's
identity is widened to cover it.
- New model_engineering storefront category ("Model Engineering & Validation") and
  medsci-modeling marketplace plugin, carved out of "Data & Study Design" (medsci-data).
  The 6 lane skills — architecture-zoo, model-scaffold, model-validation, model-card,
  model-evaluation, mllm-eval — now group under their own catalog filter and installable
  plugin (/plugin now lists nine category plugins). Both catalog generators
  (gen_skills_catalog_json.py category mapping/order, gen_marketplace_json.py plugin
  name/description) and their self-tests cover the new category.
- README + ROADMAP repositioned to the end-to-end identity: MedSci Skills is an end-to-end
  research tool for physician and medical-engineering researchers to design → scaffold →
  validate → publish — for the clinical manuscript and the medical-AI model alike. "Clinical AI
  model research engineering is in scope" is now explicit, while "not a general AI-scientist
  platform" (and not a diagnostic tool or autonomous author) is kept; the lane integrates
  MONAI / nnU-Net and never reimplements them or runs anything autonomously.
- Counts unchanged (51 skills / 41 detectors / 38 reporting guidelines); CI stays torch-free.

Added

Medical-AI model-engineering lane — Phase 5 (build-lane breadth). Expands the existing
/model-scaffold and /architecture-zoo skills; no new skills/detectors/probes (counts
unchanged: 51 skills / 41 detectors / 38 guidelines), torch-free CI.
- /model-scaffold now generates 5 task types (was segmentation-only): --task
  segmentation (U-Net), classification (small multi-label CNN; swap in a timm
  backbone), detection (torchvision Faster R-CNN + FPN), synthesis (Pix2Pix U-Net
  generator + PatchGAN), ssl (SimCLR encoder + projection head, NT-Xent). Every task keeps
  the reproducibility guarantees by construction — the patient-level seed-locked split is
  task-independent, and each emitted train.py / evaluate.py passes check_training_hygiene
  (all RNGs seeded, cuDNN deterministic, train-only loader, eval() + no_grad()). The
  challenge + regression test now verify all 5 tasks (split + hygiene + valid Python, network-free).
- /architecture-zoo adds the detection.md and synthesis.md family cards (R-CNN family /
  Faster R-CNN+FPN / Mask R-CNN / RetinaNet / YOLO / DETR; Pix2Pix / CycleGAN / SPADE / diffusion
  / VAE / fastMRI), each with the source paper, when-to-use, medical use, reference implementation,
  validation setup, and matching scaffold template; the decision-tree index now routes to them.
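
The task-independent, patient-level seed-locked split can be illustrated with a minimal stdlib sketch. The function name `patient_level_split`, the default fractions, and the sample-to-patient mapping are hypothetical illustrations, not the code that scaffold.py emits.

```python
import random


def patient_level_split(sample_to_patient, seed, fractions=(0.7, 0.15, 0.15)):
    """Assign whole patients (never individual samples) to train/val/test.

    Hypothetical helper: names and fractions are assumptions for illustration.
    """
    # Sort first so the shuffle input does not depend on dict ordering.
    patients = sorted(set(sample_to_patient.values()))
    random.Random(seed).shuffle(patients)  # a local RNG keeps the split seed-locked

    n_train = int(len(patients) * fractions[0])
    n_val = int(len(patients) * fractions[1])

    partition_of = {}
    for index, patient in enumerate(patients):
        if index < n_train:
            partition_of[patient] = "train"
        elif index < n_train + n_val:
            partition_of[patient] = "val"
        else:
            partition_of[patient] = "test"

    # Every sample inherits its patient's partition, so no patient can straddle two.
    return {s: partition_of[p] for s, p in sample_to_patient.items()}
```

Because the partition is decided per patient and the RNG is local and seeded, rerunning with the same seed reproduces the same assignment regardless of task type.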

28 Jun 12:58

github-actions

v4.11.0

fba1394

MedSci Skills v4.11.0

Added

find-journal: acceptance-feasibility axis. A Phase 2.5 pre-flight
(assess_acceptance_readiness.py, deterministic + reproducible challenge card)
scans a manuscript for design-ceiling / unfixable-defect / importance-risk /
claim-mismatch signals and a ceiling verdict (advisory risk band, never a
probability). Adds two-axis ranking (scope fit × acceptance feasibility) with
explicit mismatch surfacing, an Acceptance Signals profile schema
(references/acceptance_signals_schema.md, populated for European Radiology, AJR,
KJR, RYAI, Investigative Radiology), a reject-fallback cascade plan, and a
desk-reject vs post-review distinction in Post-Rejection Mode. Helper named
assess_* (not a detector-catalog member); counts unchanged (additive). (#215)
Medical-AI model-engineering lane — Phase 1 (validation MVP). First slice of the v5.0
"design → scaffold → validate → publish medical-AI model research" lane, led by the
validation/reporting half (the build/scaffold half lands in a later phase). Clinician-anchored,
torch-free, additive.
- New skill /model-validation (Layer D, advisory + deterministic audit) — design or audit
  the clinical-validation study for an engineer-built medical-imaging model (segmentation /
  classification / detection): patient-level split disjointness + the data-leakage taxonomy,
  tuning-on-test, internal vs genuine external validation, comparator design, single-run vs
  multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing handoff
  to /calc-sample-size, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Integrates with
  MONAI / nnU-Net — does not replace them. Skills 45 → 46.
- New reviewer domain-probe model_development.md (MD0–MD8) (/peer-review + /self-review,
  vendored byte-identical) — partition/leakage mechanics, tuning/threshold/model-selection on the
  test set, the internal-vs-external-validation conflation, seed/run variance, test-set event
  count, metric selection, reproducibility/provenance, and reference-standard/label quality.
  Domain-probe modules 15 → 16. Grounded in the leakage taxonomy (Kapoor & Narayanan, Patterns
  2023), Varoquaux & Cheplygina (npj Digit Med 2022), CLAIM 2024, and Metrics Reloaded
  (Maier-Hein & Reinke et al., Nat Methods 2024).
- New deterministic detector check_split_leakage.py (/model-validation) — proves (by set
  arithmetic on the emitted split_assignment.csv, not heuristics) that no patient crosses
  train/val/test, and that the split records a reproducible seed. Verdicts PATIENT_OVERLAP
  (Major), MISSING_SEED (Major), SINGLE_PARTITION (Minor); train/validation/holdout synonyms
  collapse so a labelling variant never trips it. Stdlib-only, network-free, with a reproducible
  challenge card + CI-wired regression test. Integrity detectors 36 → 37.
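
The set-arithmetic idea behind check_split_leakage.py can be sketched as follows. This is a minimal illustration under stated assumptions: the column names `patient_id` and `partition`, the synonym table, and the function name are hypothetical, and the real detector's CSV schema may differ.

```python
import csv
import io
from itertools import combinations

# Hypothetical synonym table: labelling variants collapse to one canonical name.
SYNONYMS = {
    "training": "train",
    "validation": "val",
    "valid": "val",
    "holdout": "test",
    "testing": "test",
}


def find_patient_overlap(csv_text):
    """Return {(partition_a, partition_b): shared_patient_ids} for every overlap."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["partition"].strip().lower()
        label = SYNONYMS.get(label, label)
        groups.setdefault(label, set()).add(row["patient_id"])

    overlaps = {}
    for a, b in combinations(sorted(groups), 2):
        shared = groups[a] & groups[b]  # plain set intersection, no heuristics
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```

An empty result means the partitions are pairwise disjoint; any non-empty entry would correspond to a PATIENT_OVERLAP finding.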
Medical-AI model-engineering lane — Phase 2 (build/scaffold). Completes the
build → validate chain in-repo, staged after Phase 1's verification contract. Clinician-anchored
(a reproducible research scaffold generator that integrates MONAI / nnU-Net, not a replacement);
default CI stays torch-free.
- New skill /model-scaffold (Layer B) — scaffold.py stamps out a runnable PyTorch
  segmentation training repo (configurable U-Net, dataset.py, losses.py, train.py,
  evaluate.py, config.yaml, requirements.txt, REPRODUCIBILITY.md, methods_stub.md) with
  the reproducibility guarantees baked in by construction: a patient-level seed-locked split
  written as an auditable artifact (splits/split_assignment.csv + split_seed.txt, disjoint by
  construction so it clears /model-validation's check_split_leakage), all-RNG seeding + cuDNN
  determinism, a train-only loader, and eval() + no_grad() inference. No fabricated numbers
  ([VERIFY] placeholders). Skills 46 → 47.
- New deterministic detector check_training_hygiene.py (/model-scaffold) — conservative
  AST linter (flag-not-prove, the training-code analogue of check_generated_code): all RNGs
  seeded, cuDNN deterministic, eval() + no_grad() inference, no training on a non-train split.
  Verdicts SEED_INCOMPLETE / MISSING_EVAL_MODE / TRAIN_ON_NONTRAIN_SPLIT (Major),
  CUDNN_NONDETERMINISTIC / EVAL_SHUFFLE (Minor). Integrity detectors 37 → 38.
- scaffold_challenge executes the build → validate chain network-free: scaffold a repo →
  deterministic split matches the frozen expected + is patient-disjoint (proven inline) → passes
  check_training_hygiene → a self-skipping torch tier (forward shape + gradients + reproducible
  loss when torch is installed; SKIP, never CI coverage of runnability, when absent).
- New skill /architecture-zoo (Layer D, advisory) — the choose front end of the lane: maps a
  research question (task + modality / dimensionality + labelled-data scale + class imbalance) to a
  paper-grounded architecture shortlist via a decision tree, then per-architecture cards with core
  idea, when-to-use, medical-imaging use, reference implementation, the typical validation/experiment
  setup, and the matching /model-scaffold template. Seeds the classification (ResNet / DenseNet /
  EfficientNet / Inception / ViT / Swin / DeiT), segmentation (U-Net / 3-D U-Net / V-Net / Attention
  & Residual U-Net / nnU-Net / SegResNet / Swin-UNETR / Mask R-CNN), and foundation/SSL (SAM / MedSAM /
  MedSAM2 / TotalSegmentator / SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo) families. Every
  recommendation names its source paper; it teaches archetypes, not a live SOTA leaderboard. Skills
  47 → 48.
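
The flag-not-prove AST approach used by check_training_hygiene.py in this phase can be illustrated with a deliberately reduced sketch. It only parses source text (no torch import needed) and checks two of the listed conditions. The function name and the exact matching rules are assumptions; the real linter is more thorough.

```python
import ast


def hygiene_flags(source):
    """Flag a training/evaluation script that never calls .eval() or .manual_seed().

    Simplified illustration: it collects attribute-call names anywhere in the file
    and does not prove that the calls happen in the right place.
    """
    tree = ast.parse(source)
    called = set()
    for node in ast.walk(tree):
        # Matches calls of the form <something>.<name>(...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            called.add(node.func.attr)

    flags = []
    if "eval" not in called:
        flags.append("MISSING_EVAL_MODE")
    if "manual_seed" not in called:
        flags.append("SEED_INCOMPLETE")
    return flags
```

A linter of this shape can only raise a flag for human review; a passing result does not prove that the script is reproducible.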
Medical-AI model-engineering lane — Phase 3 (reporting). The documentation seam of the lane,
after validation (Phase 1) and build (Phase 2). Clinician-anchored, additive.
- New skill /model-card (Layer C) — generate the documentation an engineer-built model must
  carry: a Model Card (Mitchell et al., FAccT 2019), a dataset Datasheet (Gebru et al.,
  CACM 2021), and a METRIC-informed data-quality pass (Schwabe et al., npj Digit Med 2024),
  filled from user-supplied facts — never fabricated (intended use, out-of-scope use, training data,
  per-subgroup performance, caveats, provenance, consent, licence). Templates live in references/
  and are uncounted (documentation standards, not clinical reporting checklists — same treatment
  as appraisal_tools/METRICS.md), so reporting_guidelines is unchanged. Skills 48 → 49.
- New deterministic detector check_model_card_complete.py (/model-card) — verifies every
  required Model Card / Datasheet section is present and non-empty (not missing, not an unfilled
  [NEEDS INPUT] placeholder). Verdicts MISSING_SECTION / EMPTY_REQUIRED_SECTION (Major); a
  presence check, not a truth check. reporting_compliance family. Integrity detectors 38 → 39.
- Reproducible challenge (check_model_card_complete_challenge, synthetic complete + incomplete
  fixtures) + CI-wired regression test (8 cases).
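
A presence check of the kind check_model_card_complete.py performs might look like the sketch below. The required-section list, the heading convention, and the function name are hypothetical; only the verdict names and the `[NEEDS INPUT]` placeholder come from the notes above.

```python
import re

# Hypothetical subset of required sections, for illustration only.
REQUIRED = ("Intended use", "Out-of-scope use", "Training data")


def card_verdicts(markdown, required=REQUIRED):
    """Return (verdict, section) pairs for missing or unfilled required sections."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"#+\s+(.*)", line)
        if heading:
            current = heading.group(1).strip().lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    verdicts = []
    for name in required:
        key = name.lower()
        if key not in sections:
            verdicts.append(("MISSING_SECTION", name))
            continue
        # A section holding only the placeholder counts as empty.
        body = "\n".join(sections[key]).replace("[NEEDS INPUT]", "").strip()
        if not body:
            verdicts.append(("EMPTY_REQUIRED_SECTION", name))
    return verdicts
```

As the notes say, this is a presence check: it cannot tell whether the text in a filled section is true.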
Medical-AI model-engineering lane — Phase 4 (evaluation + MLLM). The evaluation half, completing
the choose → build → validate → evaluate → report chain. Clinician-anchored, additive.
- New skill /model-evaluation (Layer B): compute task-correct held-out metrics for a trained
  imaging model (segmentation: Dice + a boundary metric HD95/NSD per structure; classification:
  AUROC + AUPRC + sensitivity/specificity with bootstrap CIs at the deployment prevalence;
  detection: FROC/mAP with a stated IoU criterion) + calibration + subgroup slices, emitting a
  per-case table for /analyze-stats. check_metric_reporting.py gates the metric choice against
  Metrics Reloaded (Maier-Hein & Reinke et al., Nat Methods 2024) / CLAIM 2024
  (PIXEL_ACCURACY_SEG / NO_BOUNDARY_METRIC / ACCURACY_ONLY / DETECTION_METRIC_MISSING /
  CI_MISSING). data_preparation family. Skills 49 → 50.
- New skill /mllm-eval (Layer B) — a model-agnostic (closed API or open weights) evaluation
  harness for an LLM/MLLM on a clinical task (report generation, VQA, extraction): adjudicated
  reference standard, clinical-efficacy metrics (RadGraph-F1 / CheXbert-F1 beyond BLEU/ROUGE),
  faithfulness/hallucination, pretraining-contamination, prompt sensitivity, reader study.
  check_mllm_eval_completeness.py gates the plan (NGRAM_ONLY / FAITHFULNESS_MISSING /
  REFERENCE_STANDARD_MISSING / CONTAMINATION_UNADDRESSED / READER_STUDY_MISSING / …).
  reporting_compliance family. Skills 50 → 51.
- New reviewer domain-probe mllm_evaluation.md (ME0–ME8) (/peer-review + /self-review,
  vendored byte-identical) — the reviewer-side audit of an LLM/MLLM clinical evaluation. Grounded in
  RadCliQ (Yu et al., Patterns 2023), RadGraph (Jain et al., NeurIPS 2021), CheXbert (Smit et al.
  2020), MedVH / Med-HALT, MI-CLEAR-LLM. Domain-probe modules 16 → 17. Integrity detectors 39 → 41.
- Uncounted appraisal ref appraisal_tools/METRICS_RELOADED.md (metric-selection guidance; not a
  counted reporting checklist). Reproducible challenges + CI-wired regression tests for both detectors.

28 Jun 09:24

github-actions

v4.10.0

32294e3

MedSci Skills v4.10.0

Added

Three new reviewer domain-probe modules (/peer-review + /self-review, vendored
byte-identical), reverse-engineered from high-IF CC-BY papers under the reverse_engineer/
license firewall: mendelian_randomization.md (MR1–MR8: the three IV assumptions, a
pleiotropy-robust sensitivity suite rather than IVW alone, Steiger/direction, sample overlap,
non-linear-MR caution, drug-target colocalization); polygenic_risk_score.md (PG1–PG8:
ancestry transferability/portability, base/target leakage, incremental value over the clinical
model, screening detection-rate-vs-discrimination, target-population calibration);
network_meta_analysis.md (NM1–NM8: transitivity, global+local incoherence, SUCRA/P-score
over-interpretation, CINeMA/GRADE-NMA certainty, component-NMA additivity). Domain-probe modules
12 → 15.
Observational probe O17 (observational_confounding.md) — agnostic many-exposure-scan
multiplicity (ExWAS / EWAS / MWAS): correction matched to claim against the honest test-count
denominator, independent replication as the real safeguard, correlated-exposure conservatism,
selective top-hit reporting.
Two reporting-guideline checklists (/check-reporting): STROBE-MR (Mendelian
randomization) and PGS-RS / PRS-RS (polygenic-score risk prediction), with study-type
routing + aliases. Reporting guidelines 36 → 38.
Four /analyze-stats analysis guides: multiple-testing/high-dimensional screening,
Mendelian randomization, polygenic risk score, and network meta-analysis.
/clean-data implausible-value & cross-field validity rules reference — organ-system
compatible-with-life bounds + cross-field logical-consistency rules (temporal ordering,
derived-vs-source, sex-/state-specific), flag-not-auto-fix.

Changed

Clinician-friendly update reminders. The classroom installers
(install-macos.command / install-windows.cmd / install-windows.ps1) now enable the in-app
"update available" notice and the one-click Desktop updater by default (turnkey path; disable
with --disable-update-notify or MEDSCI_NO_UPDATE_CHECK=1). For the npx/manual paths the
installer prints a one-time nudge showing how to turn reminders on (--enable-update-notify),
and the README Quick Start recommends it. New read-only update.session_hook_enabled() gates the
nudge; the npx/manual paths stay opt-in (no silent SessionStart hook).

25 Jun 22:19

github-actions

v4.9.0

4538d0c

MedSci Skills v4.9.0

Added

Duplicate-bibliography gate — new check_reference_duplication.py
(/manage-refs, also usable from /sync-submission) reads the BUILT artifact
(.docx via stdlib zipfile, or a rendered .md/.txt) and fires
DUP_REF_HEADING / REF_NUMBER_RESTART / REF_SIGNATURE_DUP (Major) when the
reference list is duplicated. Catches the hybrid failure where a manuscript
carries both inline [@key] citations and a hand-typed ## References list and
is built with pandoc --citeproc: the build renders the hand list and a
citeproc bibliography (often after the legends), so the same reference appears
twice; check_xref does not see it. Author-anchored (first-author, year)
signature detection works on Word auto-numbered lists. Validated against a real
built docx with the duplicate (caught) and its single-list fix (clean).
Stdlib-only; PII-free fixtures + test_reference_duplication.sh.
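
The author-anchored (first-author, year) signature idea can be sketched on already-extracted reference lines. The regular expression and the function name are assumptions for illustration; the real detector also reads the built .docx and applies the heading and numbering checks described above.

```python
import re
from collections import Counter

# Optional list number, then a capitalised surname, then the first 19xx/20xx year.
SIGNATURE = re.compile(
    r"\s*(?:\d+[.)]\s*)?([A-Z][A-Za-z'\-]+)\b.*?\b((?:19|20)\d{2})\b"
)


def duplicated_signatures(reference_lines):
    """Return sorted (first_author, year) signatures that occur more than once."""
    signatures = []
    for line in reference_lines:
        match = SIGNATURE.match(line)
        if match:
            signatures.append((match.group(1).lower(), match.group(2)))
    counts = Counter(signatures)
    return sorted(sig for sig, n in counts.items() if n > 1)
```

Because the signature ignores the list number, a hand-typed numbered entry and an unnumbered citeproc entry for the same work collapse to the same key. Two distinct papers by the same first author in the same year would also collide, so a hit is a prompt for inspection rather than proof.
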
Cross-script binning-consistency gate — new check_binning_consistency.py
(/self-review, Phase 2.5b) parses analysis source (R/Python) and emits
BINNING_DRIFT (Major) when one derived categorical (age band, BMI category,
eGFR stage, risk tier) is binned with ≥2 different (breaks, right-closure)
signatures across files. The same cohort then splits differently per script:
per-stratum Ns drift between a primary table and a sensitivity table while the
grand total still reconciles, so a row-sum check passes but a stratum can
spuriously cross a threshold. Motivated by a screening cohort that binned age
right=FALSE in the primary script vs right=TRUE in a threshold sensitivity
script — fractional ages shifted hundreds of participants and produced a
spurious "reached" stratum. Stdlib-only; PII-free fixtures +
test_binning_consistency.sh.
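
The effect of mixing right-closure conventions can be reproduced with a small stdlib sketch. The helper names and the example ages are hypothetical; the `right` flag mirrors the right=TRUE / right=FALSE convention mentioned above.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def band(value, breaks, right):
    """Index of the interval holding value.

    right=True  -> intervals are (a, b]
    right=False -> intervals are [a, b)
    Index 0 means "below the first break".
    """
    return bisect_left(breaks, value) if right else bisect_right(breaks, value)


def stratum_counts(values, breaks, right):
    return Counter(band(v, breaks, right) for v in values)


def binning_drift(signatures_by_file):
    """True when one variable is binned with two or more (breaks, right) signatures."""
    distinct = {(tuple(breaks), right) for breaks, right in signatures_by_file.values()}
    return len(distinct) >= 2
```

With ages `[49.5, 50.0, 50.0, 61.0]` and breaks `[40, 50, 60]`, the two participants aged exactly 50 move between bands when the closure flips, while the grand total stays at four. That is why a row-sum check alone cannot catch the drift.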

Together these two gates take the analysis-integrity detector suite 34 → 36
(citation family 6 → 7, data-preparation 5 → 6); skills and reporting guidelines
unchanged. Additive and backward-compatible.
Float citation-order gate — new check_citation_order.py (/self-review)
flags numbered floats not cited in ascending order of first appearance, per series
independently (main Tables, main Figures, Supplementary Tables, Supplementary
Figures). It scans only the narrative body (auto-excluding the Figure Legends /
back-matter so an in-order legends block cannot mask an out-of-order body) and
tolerates plural lists ("Tables S4, S5"), ranges, and non-float sensitivity-spec
labels ("S1–S6"). CITATION_ORDER (Major) is a pre-peer-review desk/technical-check
item editorial offices "unsubmit" for; CITATION_GAP (Minor) flags non-contiguous
numbering. Motivated by a journal technical-check unsubmit where main Table 3 was
cited before Tables 1–2 and the supplementary tables were cited wildly out of order
(S4, S9, S16, S12, …). Wired into /self-review's technical-check pass; synthetic
positive/negative fixtures + regression test. Analysis-integrity detectors
33 → 34 (Reporting compliance family 8 → 9); skills 45 and reporting guidelines
36 unchanged. Additive and backward-compatible.
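
The first-appearance ordering rule can be sketched for a single series. This simplified illustration handles only plain "Table N" / "Tables N" mentions: it does not exclude the legends block, expand ranges or plural lists, or separate supplementary series, all of which the real check_citation_order.py is described as handling. Function names are hypothetical.

```python
import re


def first_citation_order(body, series="Table"):
    """Float numbers in order of first mention, e.g. [3, 1, 2]."""
    seen = []
    for match in re.finditer(rf"\b{series}s?\s+(\d+)", body):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def citation_verdicts(body, series="Table"):
    order = first_citation_order(body, series)
    verdicts = []
    if order != sorted(order):
        verdicts.append("CITATION_ORDER")  # cited out of ascending order
    if order and sorted(order) != list(range(1, max(order) + 1)):
        verdicts.append("CITATION_GAP")  # numbering is not contiguous from 1
    return verdicts
```

Running the same function once per series (for example with `series="Figure"`) keeps the series independent, as the gate requires.
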
Percentage-decimal style check + KJR technical-check conventions — /self-review's
check_classical_style.py gains a PERCENT_DECIMALS verdict (Minor, report-only)
flagging percentages reported to >1 decimal place ("35.14%"), which several journals
(e.g. KJR) require at one decimal at technical check; regression fixture + test added.
The KJR journal profile (write-paper detail + find-journal compact) gains a
Technical-Check Conventions section enumerating the deterministic pre-review desk
items that "unsubmit" a manuscript: ascending float citation order, demographics in
Materials and Methods, one-decimal percentages, double spacing, Acknowledgments/Funding/
Author-Contributions on the Title Page only, reporting checklist cited as "Supplementary
Material 1", IRB number in Methods even when blinded, and ICMJE forms only after
acceptance. No detector-count change (existing detector extended; profiles updated, not
added). Motivated by a 2026-06 KJR technical-check unsubmit.
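
A PERCENT_DECIMALS-style scan can be approximated with one regular expression. The pattern and function name are assumptions, not the implementation in check_classical_style.py.

```python
import re

# A number with two or more decimal places, followed by an optional space and "%".
OVER_PRECISE_PERCENT = re.compile(r"\d+\.\d{2,}\s?%")


def percent_decimals(text):
    """Return every percentage reported to more than one decimal place."""
    return OVER_PRECISE_PERCENT.findall(text)
```

Since the verdict is report-only, returning the offending strings rather than rewriting them matches the described behaviour.
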
Audit-dump leak gate — new check_checklist_dump_leak.py (/sync-submission)
scans every .md/.docx/.pdf in a submission directory for the residue of a
/check-reporting or /self-review internal audit report (compliance_pct,
fixable_by_ai, check_reporting_version, Auto-fix:, [PARTIAL→auto-fixed],
suggested_fix, Action Items, _pipeline_log, NON-AUTHORITATIVE). Any hit is
a P0 leak: these tooling tokens must never reach a reviewer. Motivated by a
near-miss where a prior project's STROBE_checklist_v4.pdf was actually the
check-reporting dump, reused by filename and compiled into the reviewer-visible
proof (exposing auto-fix notes, raw JSON, and a stale old title). Wired into
preflight_gate.py as a P0 check over the journal asset directory; writes
qc/checklist_dump_leak.json. /check-reporting reports now also open with a
NOT-FOR-SUBMISSION banner so the working audit is self-identifying.
Analysis-integrity detectors 32 → 33; skills 45 and reporting guidelines 36
unchanged. Additive and backward-compatible.
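
The token scan at the heart of the leak gate can be sketched on plain text. The token list is taken from the notes above; the function name is hypothetical, and extracting text from .docx/.pdf files is out of scope for this sketch.

```python
# Residue tokens listed in the release notes for internal audit reports.
LEAK_TOKENS = (
    "compliance_pct",
    "fixable_by_ai",
    "check_reporting_version",
    "Auto-fix:",
    "[PARTIAL→auto-fixed]",
    "suggested_fix",
    "Action Items",
    "_pipeline_log",
    "NON-AUTHORITATIVE",
)


def leak_hits(text):
    """Return the audit-residue tokens found in text, in list order."""
    return [token for token in LEAK_TOKENS if token in text]
```

Any non-empty result would be treated as a P0 finding, since none of these tokens should appear in a reviewer-visible file.
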
Frontmatter schema gate (Agent Skills cross-platform portability) — new
scripts/check_frontmatter_schema.py + CI step strictly yaml.safe_loads every
skills/*/SKILL.md frontmatter and enforces the published Agent Skills spec: valid
YAML, name ≤64 chars / lowercase-hyphen / no reserved claude/anthropic token,
description present / ≤1024 chars / no XML angle brackets. The repo's own generators
use a tolerant line-based reader, so a frontmatter block that is not valid YAML could
pass every prior gate yet be rejected by a strict-YAML consumer (the agentskills.io
directory validator or another agent platform). Self-test (tests/test_frontmatter_schema.sh)
covers each violation class. This is a repo-CI validator, not a counted detector.
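
The field rules listed above can be sketched as a validator over an already-parsed frontmatter mapping. Strict YAML parsing is left out so the sketch stays stdlib-only. The function name, the messages, and the assumption that digits are permitted in `name` are illustrative, not taken from check_frontmatter_schema.py.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # lowercase-hyphen (digits assumed OK)
RESERVED = ("claude", "anthropic")


def frontmatter_violations(meta):
    """Return a list of human-readable violations for one parsed frontmatter dict."""
    violations = []

    name = meta.get("name", "")
    if not (isinstance(name, str) and 0 < len(name) <= 64 and NAME_PATTERN.fullmatch(name)):
        violations.append("name: must be lowercase-hyphen and 1-64 characters")
    elif any(token in name for token in RESERVED):
        violations.append("name: contains a reserved token")

    description = meta.get("description")
    if not isinstance(description, str) or not description.strip():
        violations.append("description: missing")
    else:
        if len(description) > 1024:
            violations.append("description: longer than 1024 characters")
        if "<" in description or ">" in description:
            violations.append("description: contains XML angle brackets")
    return violations
```

In a full gate this function would run on the result of a strict YAML load, so that a block which is not valid YAML fails before any field rule is evaluated.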

Changed

Skill-boundary documentation — a diagnostic pass confirmed the 45 skills are
deliberately specialized (no consolidation warranted), but several boundaries were
easy to confuse. README's "Skills Work Together" now carries a Skill boundaries
block spelling out the reference pipeline (search-lit → lit-sync → manage-refs →
verify-refs), the language pass order (humanize → polish-language → academic-aio),
manuscript-type selection (write-paper / review-paper / revise), author-vs-reviewer
(self-review / peer-review), project entry (intake-project / orchestrate), study
design (design-study perceptual ceiling gate / design-ai-benchmarking), and content
vs template (write-protocol / fill-protocol). /revise now documents the manual
fallback when /analyze-stats or /make-figures is unavailable (emit a checklist, hold
responses as BLOCKED — pending analysis/figure, never invent numbers). Docs only.
/analyze-stats observational-design precondition — Phase 2 (Analysis Plan) now opens
with a WARN-level precondition: before planning an observational analysis (cohort,
case-control, cross-sectional, registry, survey), confirm a literature-grounded
variable_operationalization.md (from /define-variables) or equivalent codebook-backed
definition table exists; if not, warn and recommend /define-variables first so
exposure/outcome/covariate definitions and cutoffs are citation-backed rather than invented
ad hoc from the data dictionary. WARN, not a hard block (proceed on explicit confirmation;
stricter projects can treat it as a hard stop). Mirrors the precondition /write-protocol
already enforces before drafting Methods, closing the one observational-pipeline skill that
lacked it. Guidance only — non-breaking, no new code gate.
/meta-analysis progressive disclosure (token hygiene) — the two inline "Empirical
Lessons" sections (16 dated SR-MA peer-review lessons, ~45 lines) moved verbatim to
load-on-demand references/empirical_lessons.md, with an explicit "load before Phase 4
extraction-form design and before Phase 8 submission" pointer and a Reference Files
entry — matching the skill's own established pattern (15 existing reference files). The
largest SKILL.md in the bundle drops 804 → 775 lines (less context loaded on every
activation); the lessons stay discoverable via the reference list. Content byte-preserved
(no rewrite, no renumber — a pre-existing duplicate "9." label is carried over and noted in
the reference file). No skill/detector count change.
De-drift the sync-submission YAML front-matter splitter — check_wordcount_cap.py
and cover_letter_drift_check.py each carried their own _strip_yaml_front_matter, marked
"keep in sync" but already drifted (list vs tuple return; subtly different unclosed-fence
handling). Extracted one canonical split_yaml_front_matter() into a private
scripts/_yaml_frontmatter.py (leading underscore → not counted as a detector) imported by
both — the helper ships in the same skill's scripts/ dir, so it stays self-contained when
vendored/installed. Behavior-preserving (verified normal / no-front-matter / unclosed cases +
the wired test_wordcount_cap and test_preflight_gate subprocess-import path). No
skill/detector count change.

Fixed

Public-doc count reconciliation — README.md (MedSci-Audit suite line) and
CITATION.cff (abstract) cited stale catalog totals from before the detectors above
merged (28 detectors / 32 EQUATOR guidelines). Reconciled to the disk SSOT
(metadata/catalog_counts.json): 36 analysis-integrity detectors / 36 reporting
guidelines. Added a What's New "Unreleased" block to README.md so the public
progression no longer implies v4.8 is current. No code or count change — the SSOT was
already correct; o...

24 Jun 12:01

github-actions

v4.8.0

2e7ad1a

MedSci Skills v4.8.0

The review-harvest batch: deterministic detector hardening promoted from real-manuscript review
cycles — four false-positive fixes, two new gates, nine reviewer-side domain probes, and a
design-stage gate. Additive and backward-compatible — no skill, CLI, or output-path change;
skills 45 and reporting guidelines 36 unchanged; analysis-integrity detectors 30 → 32.

Added

Reader-facing supplement / multi-file hygiene gate — new check_supplement_hygiene.py
(/self-review) lints the rendered supplement, a separately-built tables file, and caption files
(not just manuscript.md) for the technical-check-fatal residue that hides there: §/§L internal
labels, unfilled placeholders (Table SX, [Authors], figure-path globs, build-dir paths), build
markers ([VERIFY]/TODO), response-to-reviewers framing, planning residue, and body↔supplement
cross-reference numbers that don't resolve. check_artifact_coverage.py gains
PROMISED_STAT_NO_VALUE + a --supplement corpus (a bound/ceiling/de-confounded statistic promised
but never given a number anywhere). (#187)
Power-aware null-interpretation gate — new check_null_calibration.py (/self-review)
flags a headline negative/equivalence claim ("no synergy", "not associated") that carries no
minimum-detectable-effect, power, equivalence-margin/TOST, or CI-compatibility statement. Plus a
reusable rating_monotonicity.py template (/analyze-stats) that catches a folded
confidence-weighted (call × confidence) → AUC encoding, and a /design-study design-stage ceiling
gate for perceptual/reader-AI studies (6 ceiling-breakers set before data lock). (#188)
Nine reviewer-side domain probes across the shared peer-review/self-review modules: SR/MA
small-k enrollment-overlap, mixed-denominator pooling, prospective-registration chronology, and
boundary-degenerate proportions (P14–P17); observational selection-on-availability and
serial-imaging lesion-tracking (O15/O16); diagnostic exclusion-flow ↔ prose + modality-safety (D8);
AI arm-task-vs-deployment-workflow (AO6); and a survival apparent-vs-optimism deterministic tell
(S7). (#186)
Integrity detector count: 30 → 32.

Fixed

Four detector false positives that fired Major on legitimate (often recommended) patterns:
check_generated_code no longer flags a hex-color palette (the colorblind-safe WONG palette
make-figures recommends) as hand-typed tabular data; check_classical_style fires the § AI-tell
only on a section cross-reference, not on author-footnote daggers; check_scope_coherence clears
CROSS_SECTIONAL_PROGNOSTIC when the prognostic token sits inside a negation/deferral frame; and
check_cohort_arithmetic no longer mis-binds the RATE_BACKCALC numerator to a tier label's digit
or a decimal's fraction. Each ships a regression fixture; three previously-unwired test suites are
now CI-wired. (#185)

Changed

Release pipeline now also publishes to npm (idempotent, with npm provenance via OIDC), so the
npx medsci-skills@latest install channel no longer drifts behind the GitHub release. The step runs
only when the NPM_TOKEN repo secret is set, skips if that version is already on npm (re-running a
tag is safe), and runs after the GitHub Release so an npm hiccup never blocks it. No product change.

22 Jun 09:17

github-actions

v4.7.0

7ea0b91

MedSci Skills v4.7.0

The self-update foundation: physician-researchers stay current without GitHub, git, or a
terminal — via a transactional crash-safe installer, a verified one-click updater, a hardened
release pipeline, and an opt-in update notice. Additive and backward-compatible — no skill, CLI,
or output-path change; skills 45 and reporting guidelines 36 unchanged. All four pieces are
network-mocked-tested and run on Ubuntu + macOS + Windows CI.

Added

Transactional, crash-recoverable installer + per-target state. install.py now installs each
target through a durable journal state machine (installers/medsci_txn.py,
prepared → old_moved → new_installed → committed, atomic-write + fsync): an interrupted install
is recovered on the next run (roll back an incomplete transaction, forward-clean a committed one,
fail closed on a corrupt journal). It keeps a per-target installed manifest at
~/.medsci-skills/targets/<target>/ with a per-skill SHA-256 inventory — a skill you modified
is snapshotted to ~/.medsci-skills/backups/<ts>/ before an update, legacy collisions are backed up
there (never inside the skills dirs, never auto-deleted), and only MedSci-owned skills are pruned
(your/third-party skills are untouched). Adds canonical-home containment path-safety, a
disk-space preflight, two deterministic tracked manifests
(metadata/distribution_manifest.json ownership/version + metadata/distribution_files.json
payload inventory) with a CI --check gate, and a Windows/macOS CI matrix. (#177)
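
The journal mechanics (atomic write + fsync, then a recovery decision on the next run) can be sketched as below. The state names come from the notes above; the JSON layout, function names, and return values are hypothetical and are not the API of installers/medsci_txn.py.

```python
import json
import os
import tempfile

STATES = ("prepared", "old_moved", "new_installed", "committed")


def write_journal(path, state):
    """Durably record the transaction state: temp file, fsync, then atomic rename."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump({"state": state}, handle)
            handle.flush()
            os.fsync(handle.fileno())  # force the bytes to disk before the rename
        os.replace(tmp, path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def recovery_action(path):
    """Decide what the next run should do with a leftover journal."""
    if not os.path.exists(path):
        return "none"
    try:
        with open(path) as handle:
            state = json.load(handle)["state"]
    except (ValueError, KeyError, TypeError):
        return "fail_closed"  # corrupt journal: refuse to guess
    if state not in STATES:
        return "fail_closed"
    return "forward_clean" if state == "committed" else "roll_back"
```

Writing to a temp file in the same directory and renaming means a crash leaves either the old journal or the new one, never a half-written file.
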
One-click self-updater (installers/update.py). Fetches the latest classroom release and
re-installs through the transactional installer — no GitHub UI, git, or terminal. Resolves the
release via api.github.com only and fails closed if the API has no sha256 digest; verifies
the download's sha256 == the API digest, the asset name, and the tag; and never extractall()s —
it extracts per entry, rejecting path traversal (POSIX + Windows), symlink/hardlink/junction,
case-insensitive duplicates, and zip-bombs, and enforcing the distribution_files.json allowlist +
per-file hash. Installs the updater to ~/.medsci-skills/updater/ (survives deleting the download
folder); install.py --check-update reports availability via semver with a clock-sane 24h cache;
optional consented --desktop-launcher. Thin .command/.cmd launchers wrap it; a privacy notice
(docs/update_privacy.md) states the honest scope. (#178)
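
Per-entry extraction with an allowlist can be sketched with the stdlib `zipfile` module. The function name, the allowlist shape (member name → SHA-256 hex digest), and the size cap are assumptions; the sketch does not cover hardlinks, junctions, or directory entries, and it is not the code in installers/update.py.

```python
import hashlib
import os
import stat
import zipfile

MAX_MEMBER_BYTES = 50 * 1024 * 1024  # hypothetical per-file cap against zip bombs


def safe_extract(zip_path, dest, allowlist):
    """Extract members one by one, rejecting anything unexpected."""
    dest_real = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path) as archive:
        infos = archive.infolist()
        lowered = [info.filename.lower() for info in infos]
        if len(set(lowered)) != len(lowered):
            raise ValueError("case-insensitive duplicate entry")

        for info in infos:
            name = info.filename
            # Reject POSIX and Windows style traversal or absolute paths.
            if name.startswith("/") or "\\" in name or ":" in name or ".." in name.split("/"):
                raise ValueError(f"unsafe path: {name}")
            if stat.S_ISLNK(info.external_attr >> 16):
                raise ValueError(f"symlink entry: {name}")
            if name not in allowlist:
                raise ValueError(f"not in allowlist: {name}")
            if info.file_size > MAX_MEMBER_BYTES:
                raise ValueError(f"entry too large: {name}")

            target = os.path.realpath(os.path.join(dest_real, name))
            if os.path.commonpath([dest_real, target]) != dest_real:
                raise ValueError(f"escapes destination: {name}")

            data = archive.read(info)
            if hashlib.sha256(data).hexdigest() != allowlist[name]:
                raise ValueError(f"hash mismatch: {name}")

            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as handle:
                handle.write(data)
```

Checking each member before writing it, instead of calling `extractall()`, means a single bad entry stops the update before that entry touches the disk.
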
Release-pipeline supply-chain hardening. release.yml now gates on a version-consistency check
(the pushed tag must equal CITATION.cff == package.json == metadata/distribution_manifest.json
and the tracked inventory must match the tree); injects a verified provenance.json
{schema_version, tag, version, git_sha, built_at} into each classroom ZIP via
build_classroom_release.py --tag/--git-sha/--built-at; attests the ZIPs' build provenance
(actions/attest-build-provenance); runs on a protected release environment (required-reviewer
approval); and — via the new scripts/check_release_zip.py — verifies each ZIP round-trips through
the updater's own safe-extract + provenance validation before publishing, so a release can never
ship a ZIP the self-updater would reject (locked by installers/tests/test_release_zip.sh).
provenance.json stays a control file (excluded from the safe-extract inventory). SECURITY.md
gains a "Release integrity & revocation" section; docs/maintainer_workflow.md documents the
protected-environment setup. (#179)
Opt-in update notice for Claude Code (off by default). install.py --enable-update-notify
merges a SessionStart hook (installers/session_update_check.py) into ~/.claude/settings.json
that prints a one-line "update available" systemMessage at session start; --disable-update-notify
removes only that hook (keying on the home-anchored script path, so it never touches a foreign hook).
The hook does not read the SessionStart stdin (no cwd/transcript/session id), has no
telemetry/analytics/unique-id, uses the shared clock-sane 24h cache + a 4 s timeout, stays silent on
any error (never blocks a session), honors MEDSCI_NO_UPDATE_CHECK=1, and installs nothing — it
only notifies. A version check resolves the latest tag without the OS-specific download asset
(resolve_latest_tag), so the notice works on Linux too; the settings merge is idempotent, preserves
foreign hooks/settings, and refuses to clobber an unparseable settings.json. Tested offline
(installers/tests/test_session_hook.py, 38 cases). (#180)
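
An idempotent merge that preserves foreign hooks can be sketched on an in-memory settings dictionary. The nested `hooks` → `SessionStart` → `hooks` layout is an assumption about the settings.json shape, and the function names are hypothetical; reading the file and refusing to clobber unparseable JSON are omitted.

```python
import copy


def merge_hook(settings, command):
    """Return a copy of settings with the SessionStart command added exactly once."""
    merged = copy.deepcopy(settings)
    groups = merged.setdefault("hooks", {}).setdefault("SessionStart", [])
    for group in groups:
        for hook in group.get("hooks", []):
            if hook.get("command") == command:
                return merged  # already present: nothing to do
    groups.append({"hooks": [{"type": "command", "command": command}]})
    return merged


def remove_hook(settings, command):
    """Return a copy of settings without the given command; foreign hooks stay."""
    merged = copy.deepcopy(settings)
    groups = merged.get("hooks", {}).get("SessionStart")
    if groups is None:
        return merged
    kept = []
    for group in groups:
        hooks = group.get("hooks", [])
        remaining = [h for h in hooks if h.get("command") != command]
        if len(remaining) == len(hooks):
            kept.append(group)  # untouched foreign group
        elif remaining:
            kept.append({**group, "hooks": remaining})
    merged["hooks"]["SessionStart"] = kept
    return merged
```

Keying removal on the exact command string is the sketch's stand-in for the home-anchored script path described above: only the entry this tool added can match.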

Trust boundary (honest scope)

Running a release's bundled installer is remote code execution within the GitHub trust boundary.
The digest and the build-provenance attestation detect transport / asset tampering — they do
not defend against a compromised publisher account or a malicious official release. See
SECURITY.md and docs/update_privacy.md.

21 Jun 08:53

github-actions

v4.6.0

35c4198

MedSci Skills v4.6.0

A maintainability, governance, and review-depth release. Integrity detectors 28 → 30; domain probes 11 → 12; skills 45 and reporting guidelines 36 unchanged. No skill rename, CLI, or output-path change — additive and backward-compatible.

Added

Fairness / equity / subgroup-performance domain probe (equity_fairness.md, EQ0–EQ6). Vendored byte-identical into /peer-review and /self-review (MODULES 11 → 12). Fires only when a manuscript claims cross-population performance or presents subgroup analyses as a fairness argument: disaggregated subgroup metrics (not aggregate-only), error-rate-vs-discrimination parity and base-rate dependence, a named fairness estimand + between-group gap test, development-cohort representativeness, subgroup EPV/power, and equity-aware framing aligned to TRIPOD+AI / DECIDE-AI / CONSORT-AI. (#170)
AI-disclosure + data/code-availability detector (sync-submission/check_disclosure_availability.py). An AI-use disclosure must carry four tokens: version, access channel, date or date range, and responsible party (the tool name alone only triggers the check). The detector also requires a Data/Code Availability statement, with a repository/DOI where the journal expects one, keyed by journal_availability_policy.json. (#171)
Structured-summary-box conformance detector (academic-aio/check_summary_box.py). Key Points bullet count + one-claim-per-bullet, Research-in-context's three sub-blocks, and plain-language word band, journal-keyed via summary_box_specs.json — catches the wrong-format box a production technical check rejects. (#171)
Skill maturity taxonomy (official / experimental / community). A required, additive skill.yml v2.2 field (schema_version stays 2), enforced by validate_skill_contracts.py and surfaced in skills_catalog.json; all 45 current skills are official. (#174)
Governance & answer-engine docs: ROADMAP.md (priorities + explicit out-of-scope), MAINTAINERS.md (clinical authority stays with the founder), SECURITY.md (vulnerability reporting + medical-scope boundary), docs/maintainer_workflow.md (review + release checklist), docs/faq.md (AEO/GEO), and two new issue templates (installation problem, detector request). (#173)

Changed

Positioning leads with the compliance moat. README hero subline and the marketplace source description (MARKETPLACE_DESCRIPTION) now lead with reporting-guideline + risk-of-bias compliance, reference verification, and deterministic integrity gates rather than skill count. README gains a "What is MedSci Skills?" answer block, a "Start here: 3 workflows" section, and a "Validation status" section (available vs CI-gated vs E1-evaluated). A stale "32 EQUATOR" hero count was corrected to "36 reporting guidelines and risk-of-bias tools". (#173, #174)
write-paper Phase 7 token diet (pilot). The three integrity-audit sub-steps (7.3a/7.3b/7.3c) moved to references/phase7_integrity_audits.md behind a control-flow-preserving pointer; measured −10,238 chars (~2,559 tokens) per invocation, loaded on demand only when Phase 7 runs. (#172)

Documentation

CONTRIBUTING.md and the PR template add a medical-claim → founder-review gate and an official/experimental/community classification line; IMPACT.md adds an "Interpretation of metrics" caveat block ("early community interest, not widespread adoption"). (#173)

Validation / Evidence

New deterministic scripts each ship a network-free challenge/regression test wired into CI. MEDSCI_AUDIT.md detector-count claims corrected (it had drifted to 27/28) and a DETECTOR_CLAIM_FILES gate added to validate_catalog_consistency.py (anchored current-total patterns, never historical evaluation numbers) so the total cannot silently drift again. A regression test for the routing-asset gate (tests/test_routing_assets.sh) covers the references/ pointer that guards the Phase-7 extraction. (#169, #171)
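
A minimal sketch of a count-drift gate of the kind described above. The anchored phrasing ("N integrity detectors") and the names CURRENT_TOTAL and find_count_drift are assumptions made for illustration; the actual patterns in validate_catalog_consistency.py are not shown in these notes.

```python
import re

# Hypothetical anchored phrasing for a current-total claim,
# e.g. "30 integrity detectors". Other numbers in the text are ignored.
CURRENT_TOTAL = re.compile(r"\b(\d+) integrity detectors\b")


def find_count_drift(doc_text: str, catalog_total: int) -> list:
    """Return (line_number, claimed_total) for each claim that disagrees."""
    drift = []
    for lineno, line in enumerate(doc_text.splitlines(), start=1):
        for match in CURRENT_TOTAL.finditer(line):
            claimed = int(match.group(1))
            if claimed != catalog_total:
                drift.append((lineno, claimed))
    return drift
```

Because only the anchored phrase is matched, a historical sentence such as "27 of 28 checks fired" is never compared against the catalog total.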

20 Jun 12:10

github-actions

v4.5.0

4c4c966

MedSci Skills v4.5.0

Added

Self-review domain-probe batch (SR/MA + DTA + prediction-model) + submission asset-anon abs-path gate. Five new review probes promoted from field cycles, plus one deterministic submission check. sr_ma.md: P12 risk-of-bias table row-sum ↔ figure-matrix reconciliation (each NOS ★/JBI Y row must equal its printed total; the traffic-light figure's data matrix must match the supplementary table; SSOT = the primary appraisal form, not a plotting-script constant) and P13 included-study ↔ reference-list completeness (every characteristics-table study must be a numbered reference; source citations from PubMed efetch, not hand-kept notes; disambiguate same author/year by technique + sample size). diagnostic_accuracy.md: D7 index-test-as-enrollment-criterion circularity (escalate past Major when an inclusion threshold is the index test under study). clinical_prediction_model.md: CP5 intended-use horizon leakage (claim-timepoint adjectives vs each predictor's availability timepoint) and CP6 validation-nomenclature conflation (development/CV vs held-out/external test). Probes are vendored byte-identical to peer-review. sync-submission/scripts/check_asset_anonymization.py: new scan class 4 — a word/*.xml attribute (e.g. a pandoc-embedded image's <pic:cNvPr descr="…">) carrying an absolute home-dir path (/Users/…, /home/…) is a username leak invisible to a rendered-text scan; flagged as docx_embedded_abs_path (leak severity), with a regression test fixture. No version bump — probe/reference + detector additions.
/clean-data + /analyze-stats — reverse-coded-item / negative-alpha detector (integrity detectors 27 → 28). A multi-item Likert scale with a negatively-worded item must recode it (min+max) - x before the scale total or Cronbach's alpha is computed; left un-recoded, the item correlates negatively with the rest of the scale and alpha collapses (often negative). A negative alpha is a coding bug, not a "multidimensional construct" — defending it as such loses a review round. New stdlib-only skills/clean-data/scripts/check_reverse_coding.py computes per-item corrected item-total (item-rest) correlations + the raw Cronbach's alpha and returns REVERSE_CODING_LIKELY (alpha < 0) / REVERSE_CODING_SUSPECT (negative item-rest, alpha ≥ 0) / OK, exit 1 under --strict. skills/analyze-stats/references/templates/likert_summary.py is hardened to print item-rest correlations, flag negative ones as reverse-code suspects, warn loudly on a negative alpha, and apply the recode via a new --reverse-items flag before scoring/alpha. Ships a synthetic fixture (a 3-item scale with one reverse item → raw α = −1.71, plus a clean aligned scale) + CI-wired regression test (skills/clean-data/tests/test_reverse_coding.sh). Detector mapped to the data_preparation family; metadata/detectors_catalog.json regenerated; catalog_counts.json::integrity_detectors 27 → 28. Motivation: a medical-education pilot whose Trust scale shipped at α = −0.57 (one reverse item un-recoded) and consumed a major-revision round before 6 - x restored α = 0.58.
Test backfill (cont.) — fill-protocol + fulltext-retrieval regression tests (Tier 1 complete). skills/fill-protocol/tests/test_fill_form.sh builds a synthetic Word template at runtime (python-docx: 2-column key/value table + numbered section headers + title paragraph), runs fill_form.py with a content YAML exercising table_kv/section_replace/paragraph_replace, and asserts the values landed in the reopened docx, the title placeholder was replaced, and an absent label is reported [MISS] — no committed binary fixture. skills/fulltext-retrieval/tests/test_pdf_to_md.py stubs pymupdf4llm before import (the module exits on a missing dep) and pins the dependency-free helpers parse_page_range (ranges/lists/whitespace) and clean_markdown (collapse 4+ newlines, rstrip lines, single trailing newline, idempotent) — no heavy PyMuPDF dependency added to CI. Both use deps already present (python-docx/pyyaml; stdlib). No skill/version change — test infrastructure only.
Test backfill (cont.) — fill-icmje-coi + academic-aio regression tests. Three more deterministic, network-free tests wired into CI. skills/fill-icmje-coi/tests/test_fill_icmje_coi.sh clones the shipped synthetic seed for two authors and asserts the documented contract per output docx (14 checked boxes, 13 "None" disclosures, new title/date substituted, author name present, zero placeholder leakage; stdlib zipfile path). skills/academic-aio/tests/test_validate_schema.sh checks the JSON-LD validator (valid ScholarlyArticle passes; wrong @context, unknown @type, missing required field, malformed DOI each fail). skills/academic-aio/tests/test_batch_metadata_audit.sh checks the repo/HF-card auditor (clean repo passes --fail-on-issue; missing README/CITATION/LICENSE fails; report-only mode stays exit 0; a PHI-shaped string in an HF card is flagged). All fixtures synthetic. No skill/version change — test infrastructure only.
Test backfill — Tier 0 CI-wiring + deidentify PHI-scan regression test. Ten skill regression tests that existed on disk but were never gated are now wired into .github/workflows/validate.yml, so a silent break fails CI: make-figures (legend reconcile), clean-data (structural-zero), lit-sync (poll logic), meta-analysis (pool consistency), generate-codebook, present-paper (speaker-notes markdown), version-dataset (manifest/verify), manage-refs (vN-docx cross-ref), and polish-language (consistency-linter challenge). New skills/deidentify/tests/test_deidentify_scan.sh asserts the exact PHI-classification contract (PHI/REVIEW_NEEDED/SAFE counts + rrn phi_type) on the three committed fixtures — the CSV scan path is stdlib-only and network-free, and the test file is Hangul-free (column-specific asserts read the fixture header at runtime). CI now installs pandas/numpy/python-pptx/python-docx up front (was: pandas installed after the gates, which would silently skip the dep-guarded tests); version-dataset gains a pandas skip-guard for local robustness. No skill/version change — test infrastructure only.

19 Jun 15:57

github-actions

v4.4.0

e004240

MedSci Skills v4.4.0

Added

/peer-review + /self-review — Image-Synthesis / Cross-Modality Generation probe module (IS1–IS4) + reviewer-side reference-integrity spot-check. New domain-probe module image_synthesis.md (vendored byte-identical into /self-review; MODULES 10 → 11, sync gate updated) for studies that synthesize one imaging modality from another (MRI→PET / MRI→CT / non-contrast→contrast / low-dose→full-dose) and claim the output carries functional/molecular information or substitutes for the unavailable target. IS1 determinism/information-ceiling (the synthetic image is a deterministic function of the source, so a same-reader "source + synthetic > source alone" gain is a presentation/interpretability effect absent a direct source→label baseline); IS2 target-derived-preprocessing / undescribed slice-selection leakage (a lesion mask drawn on the target modality guiding slice selection or training makes "function inferred from structure" circular — undescribed provenance is itself a Major #1 candidate); IS3 global-vs-lesion-level quantitative agreement (whole-organ SUVR agreement does not establish lesion-level fidelity); IS4 mechanistic/proxy-signal plausibility (name what the source physically measures vs the target's biology — high image similarity is not evidence an unmeasured signal was recovered). Routed from a new peer-review Phase 2K + Phase 3 QC item 15 + Phase 5 routing line, and a /self-review routing-table row. Per Phase 2F, IS2/IS4 are typically unfixable-in-current-form and govern the recommendation toward Reject-leaning. Companion reviewer-side reference-integrity spot-check added to the Phase 2 issue checklist + Phase 3 QC item 16 (all original-research reviews): spot-check the load-bearing Introduction/Discussion citations used as evidence the method/premise works — a paper cited for a different task, a duplicate reference, a wrong year/author — phrasing unconfirmed suspicions "please verify" (the reviewer-side mirror of the authoring citation-safety discipline). 
Motivation: a decision-audit of a cross-modality MRI→synthetic-PET reader-study review where the three structurally distinct synthesis failure modes were split across reviewers and the reference-list errors went uncaught on the reviewer side.
/author-strategy — trajectory-archetype classification (optional, explainable multi-label heuristic). Adds an opt-in capability that classifies a queried author's PubMed trajectory into abstract career archetypes (A1 infrastructure builder, A2 methodology rule-maker, A3 clinical→AI hybrid, A4 SR/MA volume engine, A5 large-consortium participation pattern, A6 clinical-subspecialty device/technique depth, plus a computed A3+A6 composite). The rubric is a single canonical data file (references/trajectory_archetypes.yaml); the narrative references/trajectory_archetypes.md is generated from it by render_archetype_doc.py (--check gate). Each label carries a 0–1 score (computable-signal-weight denominator; unavailable signals — h-index/citation/venue-tier — are excluded and surfaced as [VERIFY], never fabricated), a confidence band capped per archetype, and evidence drawn from the author's own PMIDs (evidence_pmids for per-paper signals, evidence_summary for corpus-level); a negative rule suppresses a label to insufficient evidence. A disambiguation gate precedes classification: fetch_pubmed.py writes a corpus_manifest.json cryptographically bound to the CSV (csv_sha256 + pmid_set_hash) and classify_archetypes.py refuses to run unless review_status: approved and the hashes match — a surname alone never resolves an author, and --approve is a human gate. Target-author attribution (ORCID/affiliation/initials/position) is split into a stdlib-only pubmed_parse.py and never borrows a co-author's metadata on a same-surname collision; author position is reported as a first/middle/last/unknown positional heuristic (not leadership metadata), and analyze_patterns.py's "Leadership rate" is renamed "First/last positional rate". The output header states the labels are explainable heuristics, not objective classifications. Ships name-free synthetic fixtures + a CI-gated regression test (A14). Skill count unchanged — an enhancement, not a new skill.
/verify-refs — OpenAlex tertiary index (conference-proceedings / non-DOI recovery). PubMed covers only biomedical literature and CrossRef's proceedings coverage is uneven, so NeurIPS / ICLR / ACL-style citations — common in medical-AI manuscripts — fall through both and were marked UNVERIFIED. After the PubMed and CrossRef tiers, verify_refs.py now consults OpenAlex (https://api.openalex.org, free, no API key) only when no authoritative author list was obtained yet (a reference already resolved by PubMed/CrossRef incurs no extra call). It resolves by DOI when present, otherwise by a token-similarity-guarded title search so a fabricated title cannot earn a spurious OK. This is the free analogue of the second index (e.g. Scopus) that journal portals run alongside CrossRef. Because OpenAlex display names carry no structured family/given field and mix First Last with Last, First forms, OpenAlex-sourced authors support an existence check plus a tolerant first-author membership check but never drive the strict positional or author-count MISMATCH (reserved for PubMed efetch / CrossRef); an OpenAlex miss is UNVERIFIED, never FABRICATED. New --no-openalex flag restricts verification to PubMed + CrossRef. Ships a network-free regression test (tests/test_openalex_tier.sh, monkeypatched http_json, CI gate A8b). Motivation: a medical-AI reference list where two NeurIPS citations validated on Scopus but not CrossRef in a journal portal's reference check.
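
A minimal sketch of a token-similarity guard on a title search, as described in the /verify-refs entry above. The similarity measure (Jaccard over lowercase alphanumeric tokens), the 0.8 threshold, and the function names are assumptions for illustration; the notes do not state which measure or cutoff verify_refs.py uses. The OpenAlex request itself is left out, so candidates stands in for titles returned by a search.

```python
import re


def title_tokens(title: str) -> set:
    """Lowercase alphanumeric tokens, ignoring punctuation and case."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two titles."""
    ta, tb = title_tokens(a), title_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def resolve_by_title(cited_title: str, candidates: list, threshold: float = 0.8):
    """Return ("OK", best_title) or ("UNVERIFIED", None).

    A miss is reported as UNVERIFIED, never as fabricated: failing to find
    a title in one index is not evidence that the reference is invented.
    """
    best_title, best_score = None, 0.0
    for candidate in candidates:
        score = title_similarity(cited_title, candidate)
        if score > best_score:
            best_title, best_score = candidate, score
    if best_title is not None and best_score >= threshold:
        return ("OK", best_title)
    return ("UNVERIFIED", None)
```

The guard matters because a search endpoint usually returns its nearest hit even for a title that does not exist; accepting the top result without a similarity check would let a fabricated title pass.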

15 Jun 20:36

github-actions

v4.3.0

b5009d4

MedSci Skills v4.3.0

Added

Observational / cohort probe + gate hardening (sourced from two cross-sectional health-screening cohort self-review→revise loops). Expands observational_confounding.md O1–O6 → O1–O9 (vendored byte-identical into /self-review): O7 — over-adjustment (conditioning on a mediator or consequence of the outcome — the opposite-direction failure to O1, e.g. a renally-excreted lab in an eGFR model; "adjust for everything that differs in Table 1" is not a confounder-selection rule), O8 — analysis unit & clustering (records vs unique subjects → anti-conservative CIs), O9 — outcome construct validity for report-/registry-derived outcomes (composite homogeneity, ascertainment/κ, dictionary-first label provenance, misclassification direction). O1 also gains an exposure-defining-covariate exemption for guideline-defined exposures and a reference-arm-contamination-vs-selection-bias note (O3); check_confounding_completeness.py now computes SMD from per-stratum mean ± SD when the wide Table 1 carries no p / SMD column (interop with /analyze-stats).
New domain-probe module clinical_prediction_model.md (CP1–CP4) for cross-sectional / observational prediction models (TRIPOD / TRIPOD+AI nested predictor-set comparisons): apparent-vs-optimism-corrected calibration/DCA, the incremental-value-vs-marginal-effect two-null distinction, EPV per nested model, and net benefit as a model comparison (not a policy endorsement). Vendored byte-identical into /self-review; MODULES 9 → 10; routed from peer-review (new Phase 2E-2) and self-review. Plus two /self-review exemplar_findings/ (over_adjustment_collider.md, prediction_two_null_conflation.md).
Cohort-analysis probes (G39–G41). survival_prognostic.md gains S9 — panel-data / multistate variance (occupancy/intensity CIs must be person-clustered or person-bootstrapped, not naive model-based on within-person-correlated visit transitions; S1–S8 → S1–S9). observational_confounding.md gains O10 — overlapping-subset gradient (an effect-size gradient across nested/overlapping cohorts is attributable by construction; inferential "attenuated/accounted-for" language needs a difference/interaction test; O1–O9 → O1–O10). Both vendored byte-identical into /self-review. Plus an extended-adjustment missingness-frame discipline (compare adjusted vs unadjusted on the same reduced complete-case frame, not the full-frame anchor) in /self-review Phase 2.5e + /analyze-stats over-adjustment guidance.
Cross-sectional survey-epidemiology probes (G45–G46, paper-driven from CC-BY NHANES cohorts). observational_confounding.md gains O11 — complex-survey design & weighting (NHANES/KNHANES/CHNS: design-based estimation with the correct/scaled weight + stratification + PSU, subpopulation-domain-not-row-deletion, weighted total is a population estimate not a sample n, design-effect/effective-n) and O12 — data-driven threshold / non-linearity mining (a recursive-search 'inflection point' / 'saturation effect' needs a breakpoint CI + pre-specified non-linearity test + stability check, not a quoted cutoff). O1–O10 → O1–O12, vendored byte-identical into /self-review. /analyze-stats survey_weighted.md gains a subpopulation-domain (never row-delete) + survey-reporting-errors block.
Cross-sectional mediation probe (G47, paper-driven from CC-BY mediation papers). observational_confounding.md gains O13 — cross-sectional mediation (temporal order & sequential ignorability): a Baron–Kenny / Sobel / PROCESS / bootstrapped indirect-effect chain estimated on single-timepoint data cannot establish the X→M→Y sequence (the bootstrap CI addresses sampling variability, not identification); needs an unmeasured-mediator–outcome-confounding sensitivity analysis (e.g. an E-value for the indirect effect) + a temporal-order caveat, and proportion-mediated is unstable when the total effect is small. O1–O12 → O1–O13, vendored byte-identical into /self-review; adds exemplar_findings/cross_sectional_mediation.md.
Cleanup batch (G48/G42/G43). /analyze-stats gains a mediation analysis guide (analysis_guides/mediation.md + SKILL entry): bootstrapped a×b indirect effect, proportion-mediated only with uncertainty, AGReMA reporting, and the discipline that identification (no unmeasured mediator–outcome confounding → E-value for the indirect effect) — not the bootstrap — is the issue (pairs O13). /sync-submission gains scripts/assemble_supplement.py (NOT an integrity detector): validates an S{N}_*.md + index supplement (index↔file 1:1, duplicate/skipped sub-section numbers), rebuilds _combined.md in index order, and reports main-text callout coverage. /render-pdf-doc gains scripts/scan_glyph_coverage.py + a Step 3.5 pre-render scan for the xelatex silent-glyph-drop failure (arrows / − ≤ ≥ ± √ / Greek / ★ ✓ / CJK; optional fonttools cmap check). Both ship fixtures + CI-wired tests (A12/A13). Integrity-detector count unchanged (27).
Interaction-scale probe (G49, paper-driven from CC-BY joint-effect papers). observational_confounding.md gains O14 — interaction scale (additive vs multiplicative): a synergy / joint-effect / effect-modification claim is an additive-scale statement and needs RERI / AP / synergy index with CIs, not a multiplicative-only OR product term, joint-category ORs, or stratified-only estimates (the difference-in-significance fallacy). O1–O13 → O1–O14, vendored byte-identical into /self-review; /analyze-stats gains an Interaction & Effect-Modification entry (RERI/AP/S, Knol & VanderWeele). The cross-sectional-cohort review lane (O1–O14 + CP1–CP4 + S9 + gates) is now comprehensive.
check_cohort_arithmetic.py — new ANALYSIS_UNIT_UNDISCLOSED check (--id-col, auto-detect with a cardinality guard): when records > unique subjects and the manuscript discloses neither the analysis unit nor a one-record-per-subject sensitivity, emits a Major with a records / unique_subjects / repeat_subjects / max_visits reconciliation (probe O8).
check_scope_coherence.py — new CROSS_SECTIONAL_YIELD_LANGUAGE lexicon (Minor): a cross-sectional / prevalence design using incidence-flavored vocabulary ("yield", "detection rate", "number-needed-to-screen/image", "rescreen interval") without defining "yield" once as cross-sectional report-positive prevalence.
New detector check_paren_spans.py (/self-review, integrity detectors 26 → 27, family Style & review-process) — a post em-dash→paren-conversion safety scan (cohort-cycle follow-up): a bulk — X — → (X) edit can pair two unrelated dashes across a sentence boundary and wrap a whole sentence — or an ordinal limitation ("Sixth, …") — inside one parenthesis, paren-balanced so a balance check misses it. Flags PAREN_SPAN_ORDINAL and PAREN_SPAN_SENTENCE (long spans only, so short legitimate parentheticals like "(Dr. Smith)", "(Fig. 2)", "(95% CI …)" are clean). Wired into /self-review --fix post-edit and /humanize pattern 13. Fixtures + regression test (CI-gated).
New detector check_wordcount_cap.py (/sync-submission, integrity detectors 25 → 26, family Reporting compliance) — the revision-inflation trap: a revise loop monotonically adds words and silently breaches the target journal's body cap. Counts the body (Introduction → Discussion, skipping abstract/refs/tables/declarations), compares to a cap from --limit or a parsed --journal-profile article-type line, and emits WORDCOUNT_OVER_CAP (Major) / WORDCOUNT_NEAR_CAP (Minor, >0.95×). The binding number is the rendered count (citeproc expands [@key]), so it prefers --rendered-words N and otherwise estimates from the markdown body + inline-citation expansion. Wired as /sync-submission Gate 13, a /revise exit gate (re-run after every pass), and a /self-review §F check. Ships fixtures + regression test.
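
A minimal sketch of the body count and cap comparison described in the check_wordcount_cap.py entry above. It assumes a markdown manuscript with "Introduction" and "Discussion" headings, skips table rows by their leading pipe, and does not model inline-citation expansion. The function names are hypothetical; only the two finding codes, their severities, and the 0.95 ratio come from the release notes.

```python
import re
from typing import Optional, Tuple

HEADING = re.compile(r"^(#+)\s+(.*)$")


def body_word_count(markdown: str) -> int:
    """Count words from the Introduction heading through the Discussion section."""
    in_body = False
    discussion_level = None
    count = 0
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip().lower()
            if title == "introduction":
                in_body = True
            elif discussion_level is not None and level <= discussion_level:
                # First same-or-higher heading after Discussion ends the body.
                in_body = False
            if title == "discussion" and in_body:
                discussion_level = level
            continue  # heading lines themselves are not counted
        if in_body and not line.lstrip().startswith("|"):
            count += len(line.split())
    return count


def check_cap(markdown: str, cap: int,
              rendered_words: Optional[int] = None) -> Optional[Tuple[str, str]]:
    """Return (code, severity) when the body is over or near the cap, else None."""
    # A rendered count, when supplied, takes precedence over the estimate.
    words = rendered_words if rendered_words is not None else body_word_count(markdown)
    if words > cap:
        return ("WORDCOUNT_OVER_CAP", "Major")
    if words > 0.95 * cap:
        return ("WORDCOUNT_NEAR_CAP", "Minor")
    return None
```

Re-running a check like this after every revision pass is what catches the gradual inflation the entry describes, since each pass tends to add words and none removes them.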

Fixed

verify_refs.py — corporate/collective-author render-abort fix (cohort-cycle follow-up). A guideline body double-braced in BibTeX ({{EASL} and {EASD}}, {{KDIGO CKD Work Group}}) or returned by PubMed as <CollectiveName> tripped the first-author cross-check as MISMATCH, which aborted render_pandoc.sh on every guideline-citing cohort manuscript. Corporate authors are now detected (surviving brace / <CollectiveName> / organization keyword) and exempted from the personal-name family cross-check (annotated corporate/collective author, never MISMATCH). Personal-author entries are unaffected.
check_classical_style.py — em-dash counter counts prose only (cohort-cycle follow-up). It excludes structural dashes — markdown table cells (incl. "—" N/A placeholders and (A) — panel-label captions), ORCID separators, and author/affiliation lines — and reports prose-vs-structural separately, so a cohort manuscript with large baseline tables is not pushed into destructive edits on correct table dashes.
check_confounding_completeness.py — DB-column-code ↔ prose alias map. A DB-exported Table 1 carrying column codes (he_sbp, b_uric, b_chol_hdl) was false-flagged as imbalanced-and-unadjusted when the adjustment set was written in prose ("systolic blood pressure"). An alias map now resolves both to a shared concept; it only ever adds matches (no new false ✓). Genuinely unadjusted covariates still flag.
check_confounding_completeness.py — exposure-defining-covariate exemption (O1 false-positive on guideline-defined exposures). For a guideline-defined exposure (MASLD / metabolic syndrome / CKM / sarcopenia / frailty), the components of its own diagnostic criteria (BMI, glycaemia, lipids, BP) are imbalanced by construction and correctly unadjusted — the gate flagged each as a Major. New --exposure-defining-list/-file marks these EXPOSURE_DEFINING_EXEMPT (adjusting for them is over-adjustment, probe O7), so the Major remains only for genuine non-d...
