Adds the reporting/documentation seam of the model-engineering lane, following
validation (Phase 1) and build (Phase 2). Clinician-anchored and additive: skills
48→49, detectors 38→39; reporting_guidelines is UNCHANGED (Model Card/Datasheet are
documentation standards vendored as uncounted templates, per the appraisal_tools
precedent; Codex policy).
- /model-card (Layer C): generate a Model Card (Mitchell 2019) + dataset Datasheet
(Gebru 2021) + a METRIC-informed data-quality pass (Schwabe 2024), filled from
user-supplied facts — never fabricated (intended use, out-of-scope, training data,
per-subgroup performance, caveats, provenance, consent, licence; unknown stays
[NEEDS INPUT]). Mirrors version-dataset structurally (generate + verify).
- check_model_card_complete.py: presence gate. Every required Model Card/Datasheet
section must be present and non-empty (not missing, not an unfilled placeholder).
Verdicts: MISSING_SECTION / EMPTY_REQUIRED_SECTION (Major). It checks presence, not
truth. It flattens section bodies and strips whole placeholder spans and bold
field-labels, so a line-wrapped [NEEDS INPUT] still reads as unfilled; N/A and None
count as filled answers. Belongs to the reporting_compliance family.
- references/ templates (uncounted): model_card_template.md, datasheet_template.md,
metric_dimensions.md.
- Also adds a reproducible challenge (synthetic complete + incomplete fixtures) and a
CI-wired regression test (8 cases).
All CI-mirror gates green locally (validate_skills, all gen_* --check,
validate_catalog_consistency, frontmatter, routing-assets, locale, version, npm, both
new CI steps). Version left at 4.10.0 — release is a separate gated step.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
MEDSCI_AUDIT.md: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
# MedSci-Audit
-
**MedSci-Audit** is the named deterministic verification layer inside [MedSci Skills](README.md): a suite of **38 stdlib-only detectors** that catch fabricated, drifted, or non-compliant content in a medical manuscript *before* it reaches a reviewer. The detectors run inside the skills that own them (e.g. `/self-review`, `/check-reporting`, `/sync-submission`, `/verify-refs`); this document names and indexes that suite so it can be cited and reasoned about as one thing.
+
**MedSci-Audit** is the named deterministic verification layer inside [MedSci Skills](README.md): a suite of **39 stdlib-only detectors** that catch fabricated, drifted, or non-compliant content in a medical manuscript *before* it reaches a reviewer. The detectors run inside the skills that own them (e.g. `/self-review`, `/check-reporting`, `/sync-submission`, `/verify-refs`); this document names and indexes that suite so it can be cited and reasoned about as one thing.
The detectors are **deterministic** — same input, same verdict, no LLM in the decision path — so a flagged defect is reproducible and a clean run is meaningful.
@@ -15,24 +15,24 @@ MedSci-Audit detectors **find** integrity problems; they deliberately do **not**
The authoritative, machine-readable list is **[`metadata/detectors_catalog.json`](metadata/detectors_catalog.json)** — generated from the detectors under `skills/*/scripts/` by [`scripts/gen_detectors_catalog_json.py`](scripts/gen_detectors_catalog_json.py) and CI-gated with `--check` (it uses the same discovery glob as `validate_catalog_consistency.py`, so its `detector_count` always equals `catalog_counts.json::integrity_detectors`). Do not hand-maintain a parallel list; read the JSON.
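The count invariant described above can be sketched as a small comparison. This is an illustrative check only, not the repository's `--check` implementation: the function name `counts_agree` is hypothetical, and it assumes `detector_count` and `integrity_detectors` sit at the top level of their respective JSON files, which the text does not confirm.

```python
def counts_agree(detectors_catalog: dict, catalog_counts: dict) -> bool:
    """Return True when both files report the same number of detectors.

    Assumption: both keys are top-level. A missing key raises KeyError,
    which is deliberate, so a malformed file cannot pass silently.
    """
    registry_size = detectors_catalog["detector_count"]
    counted_size = catalog_counts["integrity_detectors"]
    return registry_size == counted_size


# In-memory stand-ins for the two parsed JSON files.
print(counts_agree({"detector_count": 39}, {"integrity_detectors": 39}))
```

In practice the two dictionaries would come from `json.load` on the files named above; a mismatch would be the drift that the CI gate is described as catching.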
The suite's evaluation evidence and its current size are **two separate facts** — they are reported at different versions, and should not be collapsed into a single "38 detectors, validated by E1/E7" claim.
+
The suite's evaluation evidence and its current size are **two separate facts** — they are reported at different versions, and should not be collapsed into a single "39 detectors, validated by E1/E7" claim.
-
-**Current detector catalog: 38** (the enumerated list in `metadata/detectors_catalog.json`).
+
-**Current detector catalog: 39** (the enumerated list in `metadata/detectors_catalog.json`).
-**Canonical evaluation runs are v3.8-era and validate the then-current subset.** The seeded-defect benchmark (**E1**) is built on **19 `DefectSpec` rows / 17 deterministic injectors** ([`evaluation/h1_seeded_defects/DEFECT_RATIONALE.md`](evaluation/h1_seeded_defects/DEFECT_RATIONALE.md)), and the coverage inventory (**E7**) is **n=21** ([`evaluation/runs/canonical/E7/limitations.md`](evaluation/runs/canonical/E7/limitations.md)). Both predate the A1–A4 detectors that brought the catalog to 24. The frozen canonical runs under [`evaluation/runs/canonical/`](evaluation/runs/canonical/) are pinned to the published methods artifacts and are intentionally left unchanged.
-
-**Detectors added since v3.8 are covered by their own per-skill CI tests** (e.g. `skills/sync-submission/tests/test_asset_anonymization.sh`, `skills/check-reporting/tests/test_checklist_version.sh`, `skills/write-paper/tests/test_placeholders.sh`), run on every push via [`.github/workflows/validate.yml`](.github/workflows/validate.yml) — not by a re-run of the frozen E1/E7. A refresh of E1/E7 to cover all 38 detectors is a separate evaluation effort and is **not** part of this registry.
+
-**Detectors added since v3.8 are covered by their own per-skill CI tests** (e.g. `skills/sync-submission/tests/test_asset_anonymization.sh`, `skills/check-reporting/tests/test_checklist_version.sh`, `skills/write-paper/tests/test_placeholders.sh`), run on every push via [`.github/workflows/validate.yml`](.github/workflows/validate.yml) — not by a re-run of the frozen E1/E7. A refresh of E1/E7 to cover all 39 detectors is a separate evaluation effort and is **not** part of this registry.
For the broader evaluation harness (E1–E9: seeded-defects, LLM baseline, cost/time, fresh-clone reproducibility, audit-trail completeness, portability, inventory, drift, self-review convergence), see [`evaluation/`](evaluation/).
README.md: 3 additions & 2 deletions
@@ -2,14 +2,14 @@
# MedSci Skills
-
**48 skills that actually work.** Built by a physician-researcher, tested on real publications.
+
**49 skills that actually work.** Built by a physician-researcher, tested on real publications.
*MedSci Skills is a submission-grade clinical manuscript workflow, not a generic biomedical skill catalog. Its moat is the compliance layer — 38 reporting guidelines and risk-of-bias tools, reference/citation verification, and deterministic integrity gates, before peer review sees the manuscript. It competes on clinical submission reliability, not skill count.*
[](https://youtu.be/MclQ_RIofpE)
[](https://github.com/Aperivue/medsci-skills/contribute)
|**model-validation**| Design or audit the clinical-validation study for an engineer-built medical-imaging model (segmentation / classification / detection): patient-level split disjointness and the data-leakage taxonomy, tuning-on-test, internal vs genuine external validation, comparator design, single-run vs multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Ships a deterministic split-leakage gate that proves patient disjointness by set arithmetic on the emitted split table. Integrates with MONAI / nnU-Net — does not replace them. |
|**model-scaffold**| Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model. Emits a patient-level seed-locked split as an auditable artifact, a configurable U-Net, train/evaluate scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a Methods stub with VERIFY placeholders (no fabricated numbers). Reproducibility holds by construction; ships a `check_training_hygiene` AST gate + a network-free build→validate challenge. Integrates with MONAI / nnU-Net / TorchIO — does not reimplement them. |
|**architecture-zoo**| "Which architecture for which research question" decision tool: maps task (classification / segmentation / detection / transfer), modality, data scale, and class imbalance to a paper-grounded architecture shortlist. Curates the foundational curriculum (ResNet / DenseNet / EfficientNet / ViT / Swin; U-Net / 3-D U-Net / Attention & Residual U-Net / nnU-Net / Mask R-CNN; SAM/MedSAM / TotalSegmentator / BiomedCLIP / DINO / MAE / SimCLR) — each with core idea, when-to-use, medical-imaging use, reference implementation, validation setup, and the matching model-scaffold template. Advisory; teaches archetypes, not a live SOTA leaderboard. |
+
|**model-card**| Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. 2019), a Datasheet for its dataset (Gebru et al. 2021), and a METRIC-informed data-quality pass — filled from user-supplied facts (never fabricated), then verify every required section is present and non-empty with a deterministic completeness gate (`check_model_card_complete`). Model Card / Datasheet are documentation standards vendored as templates, not counted reporting checklists. |
|**intake-project**| Classifies new research projects, summarizes current state, identifies missing inputs, and recommends next steps. |
|**grant-builder**| Structures grant proposals: significance, innovation, approach, milestones, and consortium roles. |
|**present-paper**| Academic presentation preparation: paper analysis, supporting research, speaker scripts, slide note injection, and Q&A prep. |
docs/skills/README.md: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ One reference page per skill, generated from each skill's `SKILL.md` and `skill.
-[manage-project](manage-project.md) — Research project management for medical manuscripts. _(evidence: manual_workflow)_
-[manage-refs](manage-refs.md) — Cross-cutting reference manager for medical manuscripts. _(evidence: bundled_script)_
-[meta-analysis](meta-analysis.md) — Systematic review and meta-analysis pipeline for medical research. _(evidence: demo)_
+
-[model-card](model-card.md) — Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. _(evidence: ci_validator)_
-[model-scaffold](model-scaffold.md) — Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model. _(evidence: ci_validator)_
-[model-validation](model-validation.md) — Design or audit the clinical-validation study for an engineer-built medical-imaging model (segmentation, classification, or detection) before the validation report or manuscript is written. _(evidence: ci_validator)_
-[orchestrate](orchestrate.md) — General-purpose research orchestrator. _(evidence: demo)_
<!-- AUTO-GENERATED from skills/model-card/SKILL.md by scripts/gen_skill_docs.py. Do not edit by hand. -->
+
+
# model-card
+
+
> Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. 2019), a Datasheet for its dataset (Gebru et al. 2021), and a METRIC-informed data-quality pass — filled from user-supplied facts, then verify every required section is present and non-empty before the card ships to a repo, Hugging Face card, or manuscript supplement. Never fabricates numbers, provenance, consent, or licence; unfilled fields stay flagged. Ships a deterministic completeness gate. Model Card and Datasheet are documentation standards vendored here as templates, not counted reporting checklists.
`model-card` activates on requests such as: model card, model cards, datasheet, datasheet for datasets, dataset documentation, model documentation, hugging face card, model metadata, intended use, out-of-scope, data quality, METRIC framework, model reporting, document a model.
+
+
## Quality Card
+
+
**Purpose** — Produce an auditable Model Card + Datasheet so an engineer-built model carries its intended-use, out-of-scope, training-data, per-subgroup-performance, and limitations record into clinical evaluation and publication — with a deterministic gate that no required section is missing or left as an unfilled placeholder.
+
+
**Safety boundaries**
+
+
- Templates are filled only from user-supplied facts; an empty required field stays [NEEDS INPUT] and is flagged, never auto-filled or guessed.
+
- Completeness is reproduced by a stdlib script; it checks presence, not the truth of a stated fact (that is model-validation / check-reporting).
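The presence check described in these boundaries can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the shipped `check_model_card_complete.py`: the section names in `REQUIRED_SECTIONS`, the heading-based parsing, and the helper names `split_sections`, `is_filled`, and `check_card` are all hypothetical. Only the verdict names and the `[NEEDS INPUT]` placeholder come from the source.

```python
from __future__ import annotations

import re

# Hypothetical required sections; the real list lives in the skill's templates.
REQUIRED_SECTIONS = ["Intended use", "Out-of-scope use", "Training data"]

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$")
PLACEHOLDER = re.compile(r"\[\s*NEEDS\s+INPUT[^\]]*\]", re.IGNORECASE)
# Simplification: treats any bold run as a field label, not as an answer.
BOLD_LABEL = re.compile(r"\*\*[^*]+\*\*:?")


def split_sections(markdown: str) -> dict[str, str]:
    """Map each heading title to the raw text beneath it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body) for title, body in sections.items()}


def is_filled(body: str) -> bool:
    """True when something other than placeholders and labels remains."""
    flat = " ".join(body.split())      # join wrapped lines into one span
    flat = PLACEHOLDER.sub("", flat)   # drop whole placeholder spans
    flat = BOLD_LABEL.sub("", flat)    # drop bold field labels
    return bool(flat.strip(" -*:_"))   # "N/A" or "None" survive as answers


def check_card(markdown: str) -> list[tuple[str, str]]:
    """Return (section, verdict) pairs; an empty list means complete."""
    sections = split_sections(markdown)
    findings = []
    for title in REQUIRED_SECTIONS:
        if title not in sections:
            findings.append((title, "MISSING_SECTION"))
        elif not is_filled(sections[title]):
            findings.append((title, "EMPTY_REQUIRED_SECTION"))
    return findings
```

Because the check only looks at what text remains, a section reading `N/A` passes while one holding only a wrapped `[NEEDS INPUT]` does not. That matches the stated boundary: the gate reports presence and says nothing about whether a supplied fact is true.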
+
+
**Known limitations**
+
+
- Documents what is supplied; it cannot verify that a stated performance number or provenance claim is real.
+
- Model Card / Datasheet are documentation standards, not clinical reporting guidelines — they are vendored as templates here, not counted reporting checklists.
*Part of [MedSci Skills](../../README.md) — Claude Code skills for the medical research lifecycle. This page is generated from the skill's `SKILL.md`; edit that file and re-run `scripts/gen_skill_docs.py`.*
"_comment": "Single source of truth for catalog counts cited in public docs (README, orchestrate, check-reporting). scripts/validate_catalog_consistency.py recomputes every value from disk, asserts this file matches, and asserts the doc claims match. Do not hand-edit a value without running that script \u2014 CI fails on drift.",
metadata/detectors_catalog.json: 9 additions & 1 deletion
@@ -1,6 +1,6 @@
{
"_comment": "AUTO-GENERATED by scripts/gen_detectors_catalog_json.py from the analysis-integrity detectors under skills/*/scripts/ (same glob as validate_catalog_consistency.py). Machine-readable registry of the MedSci-Audit detector suite (single source of truth). Do not hand-edit; CI gate: python3 scripts/gen_detectors_catalog_json.py --check.",