diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
Lines changed: 1 addition & 0 deletions
diff --git a/.github/workflows/validate.yml b/.github/workflows/validate.yml
Lines changed: 6 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 15 additions & 0 deletions
diff --git a/MEDSCI_AUDIT.md b/MEDSCI_AUDIT.md
Lines changed: 6 additions & 6 deletions
diff --git a/README.md b/README.md
Lines changed: 3 additions & 2 deletions
diff --git a/docs/skills/README.md b/docs/skills/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/skills/model-card.md b/docs/skills/model-card.md
Lines changed: 53 additions & 0 deletions
diff --git a/metadata/catalog_counts.json b/metadata/catalog_counts.json
Lines changed: 2 additions & 2 deletions
diff --git a/metadata/detectors_catalog.json b/metadata/detectors_catalog.json
Lines changed: 9 additions & 1 deletion
@@ -33,6 +33,7 @@
         "./skills/design-ai-benchmarking",
         "./skills/design-study",
         "./skills/generate-codebook",
+        "./skills/model-card",
         "./skills/model-scaffold",
         "./skills/model-validation",
         "./skills/version-dataset"
 
@@ -245,6 +245,12 @@ jobs:
       - name: Run model-scaffold training-hygiene gate test
         run: bash skills/model-scaffold/tests/test_training_hygiene.sh
 
+      - name: Run model-card completeness challenge
+        run: bash skills/model-card/scripts/check_model_card_complete_challenge/verify.sh
+
+      - name: Run model-card completeness gate test
+        run: bash skills/model-card/tests/test_model_card_complete.sh
+
       - name: Run analyze-stats generated-code gate test
         run: bash skills/analyze-stats/tests/test_generated_code.sh
 
 
@@ -69,6 +69,21 @@
     MedSAM2 / TotalSegmentator / SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo) families. Every
     recommendation names its source paper; it teaches archetypes, not a live SOTA leaderboard. Skills
     47 → 48.
+- **Medical-AI model-engineering lane — Phase 3 (reporting).** The documentation seam of the lane,
+  after validation (Phase 1) and build (Phase 2). Clinician-anchored, additive.
+  - **New skill `/model-card`** (Layer C) — generate the documentation an engineer-built model must
+    carry: a **Model Card** (Mitchell et al., *FAccT* 2019), a dataset **Datasheet** (Gebru et al.,
+    *CACM* 2021), and a **METRIC-informed data-quality pass** (Schwabe et al., *npj Digit Med* 2024),
+    filled from user-supplied facts — never fabricated (intended use, out-of-scope use, training data,
+    per-subgroup performance, caveats, provenance, consent, licence). Templates live in `references/`
+    and are **uncounted** (documentation standards, not clinical reporting checklists — same treatment
+    as `appraisal_tools/METRICS.md`), so `reporting_guidelines` is unchanged. Skills 48 → 49.
+  - **New deterministic detector `check_model_card_complete.py`** (`/model-card`) — verifies every
+    required Model Card / Datasheet section is **present and non-empty** (not missing, not an unfilled
+    `[NEEDS INPUT]` placeholder). Verdicts `MISSING_SECTION` / `EMPTY_REQUIRED_SECTION` (Major); a
+    presence check, not a truth check. `reporting_compliance` family. Integrity detectors 38 → 39.
+  - Reproducible challenge (`check_model_card_complete_challenge`, synthetic complete + incomplete
+    fixtures) + CI-wired regression test (8 cases).
 
 ## [4.10.0] - 2026-06-28
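
The CHANGELOG entry above describes `check_model_card_complete.py` as a presence check with two verdicts, `MISSING_SECTION` and `EMPTY_REQUIRED_SECTION`. The following is a minimal sketch of that idea, not the shipped detector: it assumes sections are level-2 markdown headings, and the function name `check_sections` and the section names in the example are hypothetical.

```python
import re

# Placeholder string named in the CHANGELOG entry.
PLACEHOLDER = "[NEEDS INPUT]"


def check_sections(markdown_text, required_sections):
    """Return (verdict, section) findings for one document.

    Assumption: each required section is a level-2 heading ("## Name").
    """
    # Collect the body lines that follow each level-2 heading.
    bodies = {}
    current = None
    for line in markdown_text.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            bodies[current] = []
        elif current is not None:
            bodies[current].append(line)

    findings = []
    for section in required_sections:
        if section not in bodies:
            findings.append(("MISSING_SECTION", section))
            continue
        # A section holding only whitespace or placeholders counts as empty.
        text = "\n".join(bodies[section]).replace(PLACEHOLDER, "").strip()
        if not text:
            findings.append(("EMPTY_REQUIRED_SECTION", section))
    return findings


# Hypothetical card: one filled section, one placeholder, one absent.
card = (
    "# Model Card\n\n"
    "## Intended use\nTriage of chest radiographs.\n\n"
    "## Out-of-scope use\n[NEEDS INPUT]\n"
)
findings = check_sections(
    card, ["Intended use", "Out-of-scope use", "Training data"]
)
print(findings)
```

As in the entry, this only establishes that text is present; it says nothing about whether a stated fact is true.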
 
 
@@ -1,6 +1,6 @@
 # MedSci-Audit
 
-**MedSci-Audit** is the named deterministic verification layer inside [MedSci Skills](README.md): a suite of **38 stdlib-only detectors** that catch fabricated, drifted, or non-compliant content in a medical manuscript *before* it reaches a reviewer. The detectors run inside the skills that own them (e.g. `/self-review`, `/check-reporting`, `/sync-submission`, `/verify-refs`); this document names and indexes that suite so it can be cited and reasoned about as one thing.
+**MedSci-Audit** is the named deterministic verification layer inside [MedSci Skills](README.md): a suite of **39 stdlib-only detectors** that catch fabricated, drifted, or non-compliant content in a medical manuscript *before* it reaches a reviewer. The detectors run inside the skills that own them (e.g. `/self-review`, `/check-reporting`, `/sync-submission`, `/verify-refs`); this document names and indexes that suite so it can be cited and reasoned about as one thing.
 
 The detectors are **deterministic** — same input, same verdict, no LLM in the decision path — so a flagged defect is reproducible and a clean run is meaningful.
 
@@ -15,24 +15,24 @@ MedSci-Audit detectors **find** integrity problems; they deliberately do **not**
 
 The authoritative, machine-readable list is **[`metadata/detectors_catalog.json`](metadata/detectors_catalog.json)** — generated from the detectors under `skills/*/scripts/` by [`scripts/gen_detectors_catalog_json.py`](scripts/gen_detectors_catalog_json.py) and CI-gated with `--check` (it uses the same discovery glob as `validate_catalog_consistency.py`, so its `detector_count` always equals `catalog_counts.json::integrity_detectors`). Do not hand-maintain a parallel list; read the JSON.
 
-The 38 detectors fall into six audit families:
+The 39 detectors fall into six audit families:
 
 | Family | Count | Examples |
 |--------|------:|----------|
 | Numerical, cohort & pool arithmetic | 5 | `check_cohort_arithmetic`, `check_pool_consistency`, `check_artifact_coverage`, `detect_copy_divergence` |
 | Citation & reference integrity | 7 | `verify_refs`, `check_citation_keys`, `check_xref`, `check_csl_render`, `check_reference_adequacy`, `check_placeholders`, `check_reference_duplication` |
 | Style & review-process integrity | 6 | `check_classical_style`, `check_generated_code`, `check_panel_diversity`, `check_reviewer_team_consistency`, `check_paren_spans`, `check_training_hygiene` |
 | Confounding, scope & estimand contracts | 4 | `check_scope_coherence`, `check_confounding_completeness`, `check_claim_artifact`, `check_null_calibration` |
-| Reporting compliance | 9 | `check_framework_naming`, `check_checklist_exists`, `check_checklist_version`, `check_prisma_figure`, `check_wordcount_cap`, `check_disclosure_availability`, `check_summary_box`, `check_supplement_hygiene`, `check_citation_order` |
+| Reporting compliance | 10 | `check_framework_naming`, `check_checklist_exists`, `check_checklist_version`, `check_prisma_figure`, `check_wordcount_cap`, `check_disclosure_availability`, `check_summary_box`, `check_supplement_hygiene`, `check_citation_order`, `check_model_card_complete` |
 | Data preparation & validation | 7 | `check_structural_zero`, `check_reverse_coding`, `check_asset_anonymization`, `check_cross_artifact_stale`, `check_checklist_dump_leak`, `check_binning_consistency`, `check_split_leakage` |
 
 ## Evidence
 
-The suite's evaluation evidence and its current size are **two separate facts** — they are reported at different versions, and should not be collapsed into a single "38 detectors, validated by E1/E7" claim.
+The suite's evaluation evidence and its current size are **two separate facts** — they are reported at different versions, and should not be collapsed into a single "39 detectors, validated by E1/E7" claim.
 
-- **Current detector catalog: 38** (the enumerated list in `metadata/detectors_catalog.json`).
+- **Current detector catalog: 39** (the enumerated list in `metadata/detectors_catalog.json`).
 - **Canonical evaluation runs are v3.8-era and validate the then-current subset.** The seeded-defect benchmark (**E1**) is built on **19 `DefectSpec` rows / 17 deterministic injectors** ([`evaluation/h1_seeded_defects/DEFECT_RATIONALE.md`](evaluation/h1_seeded_defects/DEFECT_RATIONALE.md)), and the coverage inventory (**E7**) is **n=21** ([`evaluation/runs/canonical/E7/limitations.md`](evaluation/runs/canonical/E7/limitations.md)). Both predate the A1–A4 detectors that brought the catalog to 24. The frozen canonical runs under [`evaluation/runs/canonical/`](evaluation/runs/canonical/) are pinned to the published methods artifacts and are intentionally left unchanged.
-- **Detectors added since v3.8 are covered by their own per-skill CI tests** (e.g. `skills/sync-submission/tests/test_asset_anonymization.sh`, `skills/check-reporting/tests/test_checklist_version.sh`, `skills/write-paper/tests/test_placeholders.sh`), run on every push via [`.github/workflows/validate.yml`](.github/workflows/validate.yml) — not by a re-run of the frozen E1/E7. A refresh of E1/E7 to cover all 38 detectors is a separate evaluation effort and is **not** part of this registry.
+- **Detectors added since v3.8 are covered by their own per-skill CI tests** (e.g. `skills/sync-submission/tests/test_asset_anonymization.sh`, `skills/check-reporting/tests/test_checklist_version.sh`, `skills/write-paper/tests/test_placeholders.sh`), run on every push via [`.github/workflows/validate.yml`](.github/workflows/validate.yml) — not by a re-run of the frozen E1/E7. A refresh of E1/E7 to cover all 39 detectors is a separate evaluation effort and is **not** part of this registry.
 
 For the broader evaluation harness (E1–E9: seeded-defects, LLM baseline, cost/time, fresh-clone reproducibility, audit-trail completeness, portability, inventory, drift, self-review convergence), see [`evaluation/`](evaluation/).
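
The family table in this hunk moves "Reporting compliance" from 9 to 10 while the headline count moves from 38 to 39. A quick arithmetic check, with the counts copied by hand from the table's Count column (the dictionary is illustrative, not read from `metadata/detectors_catalog.json`):

```python
# Per-family counts as shown in the updated MEDSCI_AUDIT.md table.
family_counts = {
    "Numerical, cohort & pool arithmetic": 5,
    "Citation & reference integrity": 7,
    "Style & review-process integrity": 6,
    "Confounding, scope & estimand contracts": 4,
    "Reporting compliance": 10,  # was 9 before this change
    "Data preparation & validation": 7,
}

total_after = sum(family_counts.values())
total_before = total_after - 1  # only Reporting compliance changed, by one

print(total_before, total_after)  # 38 39
```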
 
 
@@ -2,14 +2,14 @@
 
 # MedSci Skills
 
-**48 skills that actually work.** Built by a physician-researcher, tested on real publications.
+**49 skills that actually work.** Built by a physician-researcher, tested on real publications.
 
 *MedSci Skills is a submission-grade clinical manuscript workflow, not a generic biomedical skill catalog. Its moat is the compliance layer — 38 reporting guidelines and risk-of-bias tools, reference/citation verification, and deterministic integrity gates, before peer review sees the manuscript. It competes on clinical submission reliability, not skill count.*
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Release](https://img.shields.io/github/v/release/Aperivue/medsci-skills?style=flat-square&color=blue)](https://github.com/Aperivue/medsci-skills/releases/latest)
 [![CI](https://img.shields.io/github/actions/workflow/status/Aperivue/medsci-skills/validate.yml?branch=main&style=flat-square&label=CI)](https://github.com/Aperivue/medsci-skills/actions/workflows/validate.yml)
-![Skills](https://img.shields.io/badge/Skills-48-brightgreen?style=flat-square)
+![Skills](https://img.shields.io/badge/Skills-49-brightgreen?style=flat-square)
 [![npm](https://img.shields.io/npm/v/medsci-skills?style=flat-square&label=npm&color=cb3837)](https://www.npmjs.com/package/medsci-skills)
 [![Watch the 2-min intro](https://img.shields.io/badge/▶_Watch-2--min_intro-FF0000?style=flat-square&logo=youtube&logoColor=white)](https://youtu.be/MclQ_RIofpE)
 [![good first issues](https://img.shields.io/github/issues/Aperivue/medsci-skills/good%20first%20issue?style=flat-square&label=good%20first%20issues&color=7057ff)](https://github.com/Aperivue/medsci-skills/contribute)
@@ -454,6 +454,7 @@ ma-scout -> search-lit -> fulltext-retrieval -> design-study ──> write-proto
 | **model-validation** | Design or audit the clinical-validation study for an engineer-built medical-imaging model (segmentation / classification / detection): patient-level split disjointness and the data-leakage taxonomy, tuning-on-test, internal vs genuine external validation, comparator design, single-run vs multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Ships a deterministic split-leakage gate that proves patient disjointness by set arithmetic on the emitted split table. Integrates with MONAI / nnU-Net — does not replace them. |
 | **model-scaffold** | Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model. Emits a patient-level seed-locked split as an auditable artifact, a configurable U-Net, train/evaluate scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a Methods stub with VERIFY placeholders (no fabricated numbers). Reproducibility holds by construction; ships a `check_training_hygiene` AST gate + a network-free build→validate challenge. Integrates with MONAI / nnU-Net / TorchIO — does not reimplement them. |
 | **architecture-zoo** | "Which architecture for which research question" decision tool: maps task (classification / segmentation / detection / transfer), modality, data scale, and class imbalance to a paper-grounded architecture shortlist. Curates the foundational curriculum (ResNet / DenseNet / EfficientNet / ViT / Swin; U-Net / 3-D U-Net / Attention & Residual U-Net / nnU-Net / Mask R-CNN; SAM/MedSAM / TotalSegmentator / BiomedCLIP / DINO / MAE / SimCLR) — each with core idea, when-to-use, medical-imaging use, reference implementation, validation setup, and the matching model-scaffold template. Advisory; teaches archetypes, not a live SOTA leaderboard. |
+| **model-card** | Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. 2019), a Datasheet for its dataset (Gebru et al. 2021), and a METRIC-informed data-quality pass — filled from user-supplied facts (never fabricated), then verify every required section is present and non-empty with a deterministic completeness gate (`check_model_card_complete`). Model Card / Datasheet are documentation standards vendored as templates, not counted reporting checklists. |
 | **intake-project** | Classifies new research projects, summarizes current state, identifies missing inputs, and recommends next steps. |
 | **grant-builder** | Structures grant proposals: significance, innovation, approach, milestones, and consortium roles. |
 | **present-paper** | Academic presentation preparation: paper analysis, supporting research, speaker scripts, slide note injection, and Q&A prep. |
 
@@ -33,6 +33,7 @@ One reference page per skill, generated from each skill's `SKILL.md` and `skill.
 - [manage-project](manage-project.md) — Research project management for medical manuscripts. _(evidence: manual_workflow)_
 - [manage-refs](manage-refs.md) — Cross-cutting reference manager for medical manuscripts. _(evidence: bundled_script)_
 - [meta-analysis](meta-analysis.md) — Systematic review and meta-analysis pipeline for medical research. _(evidence: demo)_
+- [model-card](model-card.md) — Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. _(evidence: ci_validator)_
 - [model-scaffold](model-scaffold.md) — Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model. _(evidence: ci_validator)_
 - [model-validation](model-validation.md) — Design or audit the clinical-validation study for an engineer-built medical-imaging model (segmentation, classification, or detection) before the validation report or manuscript is written. _(evidence: ci_validator)_
 - [orchestrate](orchestrate.md) — General-purpose research orchestrator. _(evidence: demo)_
 
@@ -0,0 +1,53 @@
+<!-- AUTO-GENERATED from skills/model-card/SKILL.md by scripts/gen_skill_docs.py. Do not edit by hand. -->
+
+# model-card
+
+> Generate the documentation an engineer-built medical-imaging model must carry — a Model Card (Mitchell et al. 2019), a Datasheet for its dataset (Gebru et al. 2021), and a METRIC-informed data-quality pass — filled from user-supplied facts, then verify every required section is present and non-empty before the card ships to a repo, Hugging Face card, or manuscript supplement. Never fabricates numbers, provenance, consent, or licence; unfilled fields stay flagged. Ships a deterministic completeness gate. Model Card and Datasheet are documentation standards vendored here as templates, not counted reporting checklists.
+
+**Invoke:** `/model-card` · **Tools:** Read, Write, Edit, Bash, Grep, Glob · **Model:** inherit
+
+## When to use
+
+`model-card` activates on requests such as: model card, model cards, datasheet, datasheet for datasets, dataset documentation, model documentation, hugging face card, model metadata, intended use, out-of-scope, data quality, METRIC framework, model reporting, document a model.
+
+## Quality Card
+
+**Purpose** — Produce an auditable Model Card + Datasheet so an engineer-built model carries its intended-use, out-of-scope, training-data, per-subgroup-performance, and limitations record into clinical evaluation and publication — with a deterministic gate that no required section is missing or left as an unfilled placeholder.
+
+**Safety boundaries**
+
+- Templates are filled only from user-supplied facts; an empty required field stays [NEEDS INPUT] and is flagged, never auto-filled or guessed.
+- Completeness is reproduced by a stdlib script; it checks presence, not the truth of a stated fact (that is model-validation / check-reporting).
+
+**Known limitations**
+
+- Documents what is supplied; it cannot verify that a stated performance number or provenance claim is real.
+- Model Card / Datasheet are documentation standards, not clinical reporting guidelines — they are vendored as templates here, not counted reporting checklists.
+
+**Validation**
+
+- `python3 scripts/check_model_card_complete.py --card MODEL_CARD.md --datasheet DATASHEET.md --strict`
+- `bash scripts/check_model_card_complete_challenge/verify.sh  # deterministic, network-free`
+
+**Evidence** — `ci_validator`
+
+## Bundled resources
+
+**References** (`skills/model-card/references/`):
+
+- `datasheet_template.md`
+- `metric_dimensions.md`
+- `model_card_template.md`
+
+**Scripts** (`skills/model-card/scripts/`):
+
+- `check_model_card_complete.py`
+- `check_model_card_complete_challenge/` (5 files)
+
+## Source
+
+Canonical definition: [`skills/model-card/SKILL.md`](../../skills/model-card/SKILL.md)
+
+---
+
+*Part of [MedSci Skills](../../README.md) — Claude Code skills for the medical research lifecycle. This page is generated from the skill's `SKILL.md`; edit that file and re-run `scripts/gen_skill_docs.py`.*
@@ -1,8 +1,8 @@
 {
   "_comment": "Single source of truth for catalog counts cited in public docs (README, orchestrate, check-reporting). scripts/validate_catalog_consistency.py recomputes every value from disk, asserts this file matches, and asserts the doc claims match. Do not hand-edit a value without running that script \u2014 CI fails on drift.",
-  "skills": 48,
+  "skills": 49,
   "reporting_guidelines": 38,
   "journal_profiles_find": 73,
   "journal_profiles_write": 55,
-  "integrity_detectors": 38
+  "integrity_detectors": 39
 }
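
Both `catalog_counts.json::integrity_detectors` and `detectors_catalog.json::detector_count` move to 39 in this change, and the docs state the two must always agree. A minimal sketch of that agreement check follows. It is not `scripts/validate_catalog_consistency.py`, which recomputes values from disk; the helper name `counts_agree` is hypothetical and the JSON strings are trimmed to the fields shown in the diff.

```python
import json


def counts_agree(catalog_counts_text, detectors_catalog_text):
    """True when the two files report the same detector count."""
    counts = json.loads(catalog_counts_text)
    catalog = json.loads(detectors_catalog_text)
    return counts["integrity_detectors"] == catalog["detector_count"]


# Post-change values from the two hunks, reduced to the relevant keys.
catalog_counts = '{"skills": 49, "integrity_detectors": 39}'
detectors_catalog = '{"detector_count": 39, "families": []}'

print(counts_agree(catalog_counts, detectors_catalog))  # True
```

Updating only one of the two files (for example leaving `detector_count` at 38) would make the helper return `False`, which is the drift the CI gate is described as catching.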
@@ -1,6 +1,6 @@
 {
   "_comment": "AUTO-GENERATED by scripts/gen_detectors_catalog_json.py from the analysis-integrity detectors under skills/*/scripts/ (same glob as validate_catalog_consistency.py). Machine-readable registry of the MedSci-Audit detector suite (single source of truth). Do not hand-edit; CI gate: python3 scripts/gen_detectors_catalog_json.py --check.",
-  "detector_count": 38,
+  "detector_count": 39,
   "families": [
     {
       "key": "numerical_cohort",
@@ -57,6 +57,7 @@
         "check_citation_order",
         "check_disclosure_availability",
         "check_framework_naming",
+        "check_model_card_complete",
         "check_prisma_figure",
         "check_summary_box",
         "check_supplement_hygiene",
@@ -197,6 +198,13 @@
       "family_label": "Style & review-process integrity",
       "description": "Generated-code quality gate for analysis scripts (analyze-stats Phase 3.5)."
     },
+    {
+      "id": "check_model_card_complete",
+      "skill": "model-card",
+      "family": "reporting_compliance",
+      "family_label": "Reporting compliance",
+      "description": "Model Card / Datasheet completeness gate (model-card)."
+    },
     {
       "id": "check_null_calibration",
       "skill": "self-review",