feat(model-lane): v5.0 Phase 2 — architecture-zoo (research-question → architecture decision tool) (#219)

Yoojin-nam · claude · web-flow · commit a0d18ba239ba · 2026-06-28T20:57:10.000+09:00
The "choose" front end of the model-engineering lane, completing Phase 2
(architecture-zoo → model-scaffold → model-validation). Advisory Layer-D skill,
no detector/torch. Additive: skills 47→48; detectors/probes/guidelines unchanged.

- /architecture-zoo: maps a research question (task + modality/dimensionality +
  labelled-data scale + class imbalance) to a paper-grounded architecture shortlist
  via a decision tree (references/index.md), then per-architecture cards — core idea,
  when-to-use, medical-imaging use, reference implementation, the typical
  validation/experiment setup, and the matching /model-scaffold template. Seeds the
  classification (ResNet/DenseNet/EfficientNet/Inception/ViT/Swin/DeiT), segmentation
  (U-Net/3-D U-Net/V-Net/Attention & Residual U-Net/nnU-Net/SegResNet/Swin-UNETR/
  Mask R-CNN), and foundation/SSL (SAM/MedSAM/MedSAM2/TotalSegmentator/SegVol/
  BiomedCLIP/DINO/MAE/SimCLR/MoCo) families. Every recommendation names its source
  paper; archetypes, not a live SOTA leaderboard.
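
The task → constraints → family-card routing described above could be sketched roughly as follows. This is an illustrative sketch only, not the contents of references/index.md: the `Question` fields, the `route` function, the 500-sample threshold, and the flag wording are all assumptions made for the example. Only the three card filenames come from this commit.

```python
from dataclasses import dataclass

# Family cards added by this commit under skills/architecture-zoo/references/.
FAMILY_CARDS = {
    "classification": "classification.md",
    "segmentation": "segmentation.md",
    "transfer": "foundation_models.md",
}


@dataclass(frozen=True)
class Question:
    """Hypothetical framing of a research question (Phase 1 of the skill)."""
    task: str          # "classification" | "segmentation" | "detection" | "transfer"
    volumetric: bool   # True for a 3-D volume, False for 2-D images
    n_labelled: int    # labelled events / structures, not raw image count
    imbalanced: bool = False


def route(q: Question, few_label_threshold: int = 500) -> dict:
    """Return the family card to read plus caution flags (assumed rules)."""
    if q.task not in FAMILY_CARDS:
        # The skill notes that further families (e.g. detection) land later.
        return {"card": None, "flags": [f"no family card seeded for task '{q.task}'"]}

    flags = []
    if q.n_labelled < few_label_threshold:  # threshold is an assumption
        flags.append("few labels: avoid a from-scratch ViT; see foundation_models.md")
    if q.volumetric:
        flags.append("volumetric target: prefer a 3-D architecture over 2-D slices")
    if q.imbalanced:
        flags.append("class imbalance: record it in the decision note")
    return {"card": FAMILY_CARDS[q.task], "flags": flags}


if __name__ == "__main__":
    print(route(Question(task="segmentation", volumetric=True, n_labelled=120)))
```

The function returns a card plus flags rather than a single architecture, because the skill is advisory: the final pick and its source-paper citation still come from reading the card.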

All CI-mirror gates green locally (validate_skills, all gen_* --check,
validate_catalog_consistency, frontmatter, routing-assets, locale, version, npm).
Version left at 4.10.0 — release is a separate gated step.
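
For readers unfamiliar with the catalog gate, the idea of recomputing the skill count from disk and failing on drift might look like the sketch below. It is not the code of scripts/validate_catalog_consistency.py: `count_skills`, `check_skill_count`, and the rule "a skill is a directory under skills/ that contains SKILL.md" are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def count_skills(repo_root: Path) -> int:
    """Count directories under skills/ that contain a SKILL.md (assumed rule)."""
    return sum(1 for p in (repo_root / "skills").iterdir() if (p / "SKILL.md").is_file())


def check_skill_count(repo_root: Path) -> None:
    """Raise ValueError if metadata/catalog_counts.json disagrees with the disk."""
    counts_file = repo_root / "metadata" / "catalog_counts.json"
    declared = json.loads(counts_file.read_text())["skills"]
    actual = count_skills(repo_root)
    if declared != actual:
        raise ValueError(f"catalog drift: declared {declared} skills, found {actual}")


# Demonstrate on a throwaway layout so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for slug in ("architecture-zoo", "calc-sample-size"):
        (root / "skills" / slug).mkdir(parents=True)
        (root / "skills" / slug / "SKILL.md").write_text("---\nname: demo\n---\n")
    (root / "metadata").mkdir()
    (root / "metadata" / "catalog_counts.json").write_text(json.dumps({"skills": 2}))
    check_skill_count(root)  # no exception: declared and on-disk counts agree
```

Under this assumed rule, adding a skill directory without bumping the declared count (47 → 48 in this commit) would make the check raise.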

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -25,6 +25,7 @@
       "source": "./",
       "strict": false,
       "skills": [
+        "./skills/architecture-zoo",
         "./skills/calc-sample-size",
         "./skills/clean-data",
         "./skills/define-variables",
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -59,6 +59,16 @@
     deterministic split matches the frozen expected + is patient-disjoint (proven inline) → passes
     `check_training_hygiene` → a **self-skipping** torch tier (forward shape + gradients + reproducible
     loss when torch is installed; `SKIP`, never CI coverage of runnability, when absent).
+  - **New skill `/architecture-zoo`** (Layer D, advisory) — the *choose* front end of the lane: maps a
+    research question (task + modality / dimensionality + labelled-data scale + class imbalance) to a
+    **paper-grounded** architecture shortlist via a decision tree, then per-architecture cards with core
+    idea, when-to-use, medical-imaging use, reference implementation, the typical validation/experiment
+    setup, and the matching `/model-scaffold` template. Seeds the classification (ResNet / DenseNet /
+    EfficientNet / Inception / ViT / Swin / DeiT), segmentation (U-Net / 3-D U-Net / V-Net / Attention
+    & Residual U-Net / nnU-Net / SegResNet / Swin-UNETR / Mask R-CNN), and foundation/SSL (SAM / MedSAM /
+    MedSAM2 / TotalSegmentator / SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo) families. Every
+    recommendation names its source paper; it teaches archetypes, not a live SOTA leaderboard. Skills
+    47 → 48.
 
 ## [4.10.0] - 2026-06-28
 
diff --git a/README.md b/README.md
@@ -2,14 +2,14 @@
 
 # MedSci Skills
 
-**47 skills that actually work.** Built by a physician-researcher, tested on real publications.
+**48 skills that actually work.** Built by a physician-researcher, tested on real publications.
 
 *MedSci Skills is a submission-grade clinical manuscript workflow, not a generic biomedical skill catalog. Its moat is the compliance layer — 38 reporting guidelines and risk-of-bias tools, reference/citation verification, and deterministic integrity gates, before peer review sees the manuscript. It competes on clinical submission reliability, not skill count.*
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 [![Release](https://img.shields.io/github/v/release/Aperivue/medsci-skills?style=flat-square&color=blue)](https://github.com/Aperivue/medsci-skills/releases/latest)
 [![CI](https://img.shields.io/github/actions/workflow/status/Aperivue/medsci-skills/validate.yml?branch=main&style=flat-square&label=CI)](https://github.com/Aperivue/medsci-skills/actions/workflows/validate.yml)
-![Skills](https://img.shields.io/badge/Skills-47-brightgreen?style=flat-square)
+![Skills](https://img.shields.io/badge/Skills-48-brightgreen?style=flat-square)
 [![npm](https://img.shields.io/npm/v/medsci-skills?style=flat-square&label=npm&color=cb3837)](https://www.npmjs.com/package/medsci-skills)
 [![Watch the 2-min intro](https://img.shields.io/badge/▶_Watch-2--min_intro-FF0000?style=flat-square&logo=youtube&logoColor=white)](https://youtu.be/MclQ_RIofpE)
 [![good first issues](https://img.shields.io/github/issues/Aperivue/medsci-skills/good%20first%20issue?style=flat-square&label=good%20first%20issues&color=7057ff)](https://github.com/Aperivue/medsci-skills/contribute)
@@ -453,6 +453,7 @@ ma-scout -> search-lit -> fulltext-retrieval -> design-study ──> write-proto
 | **design-ai-benchmarking** | Design and validity review for benchmarking AI system(s) against a human-expert panel: evaluation-question and arm definition, decoupled multi-dimensional rubrics with anchors, planted calibration probes (positive-control / known-bad / instability / mechanism-contradiction), reviewer-panel construction with per-reviewer randomization, inter-rater reliability targets with separate control-item reliability, LLM-as-judge vs human-as-judge adjudication, construct-independence guards, and a structured JSON rating-export schema. Locks the rubric before data collection. |
 | **model-validation** | Design or audit the clinical-validation study for an engineer-built medical-imaging model (segmentation / classification / detection): patient-level split disjointness and the data-leakage taxonomy, tuning-on-test, internal vs genuine external validation, comparator design, single-run vs multi-seed variance, task-correct metric selection (Metrics Reloaded), test-set sizing, and CLAIM 2024 / TRIPOD+AI / STARD-AI reporting fit. Ships a deterministic split-leakage gate that proves patient disjointness by set arithmetic on the emitted split table. Integrates with MONAI / nnU-Net — does not replace them. |
 | **model-scaffold** | Generate a reproducible, runnable PyTorch training repo for a medical-imaging segmentation task — the missing middle link between choosing an architecture and validating a trained model. Emits a patient-level seed-locked split as an auditable artifact, a configurable U-Net, train/evaluate scripts that seed every RNG and infer under eval mode, a config, requirements, a reproducibility record, and a Methods stub with VERIFY placeholders (no fabricated numbers). Reproducibility holds by construction; ships a `check_training_hygiene` AST gate + a network-free build→validate challenge. Integrates with MONAI / nnU-Net / TorchIO — does not reimplement them. |
+| **architecture-zoo** | "Which architecture for which research question" decision tool: maps task (classification / segmentation / detection / transfer), modality, data scale, and class imbalance to a paper-grounded architecture shortlist. Curates the foundational curriculum (ResNet / DenseNet / EfficientNet / ViT / Swin; U-Net / 3-D U-Net / Attention & Residual U-Net / nnU-Net / Mask R-CNN; SAM/MedSAM / TotalSegmentator / BiomedCLIP / DINO / MAE / SimCLR) — each with core idea, when-to-use, medical-imaging use, reference implementation, validation setup, and the matching model-scaffold template. Advisory; teaches archetypes, not a live SOTA leaderboard. |
 | **intake-project** | Classifies new research projects, summarizes current state, identifies missing inputs, and recommends next steps. |
 | **grant-builder** | Structures grant proposals: significance, innovation, approach, milestones, and consortium roles. |
 | **present-paper** | Academic presentation preparation: paper analysis, supporting research, speaker scripts, slide note injection, and Q&A prep. |
diff --git a/docs/skills/README.md b/docs/skills/README.md
@@ -7,6 +7,7 @@ One reference page per skill, generated from each skill's `SKILL.md` and `skill.
 - [academic-aio](academic-aio.md) — Medical AI paper optimization for AI search engines (Perplexity, ChatGPT web, Elicit, Consensus, SciSpace) and RAG-based literature tools. _(evidence: bundled_script)_
 - [add-journal](add-journal.md) — Add a new journal to the MedSci Skills profile database. _(evidence: manual_workflow)_
 - [analyze-stats](analyze-stats.md) — Statistical analysis for medical research papers. _(evidence: demo)_
+- [architecture-zoo](architecture-zoo.md) — Choose a model architecture for a medical-imaging research question before scaffolding. _(evidence: manual_workflow)_
 - [author-strategy](author-strategy.md) — PubMed author profile analysis. _(evidence: manual_workflow)_
 - [batch-cohort](batch-cohort.md) — Generate N analysis scripts from a single methodology template × multiple exposure/outcome combinations. _(evidence: manual_workflow)_
 - [calc-sample-size](calc-sample-size.md) — Interactive sample size calculator for medical research. _(evidence: manual_workflow)_
diff --git a/docs/skills/architecture-zoo.md b/docs/skills/architecture-zoo.md
@@ -0,0 +1,48 @@
+<!-- AUTO-GENERATED from skills/architecture-zoo/SKILL.md by scripts/gen_skill_docs.py. Do not edit by hand. -->
+
+# architecture-zoo
+
+> Choose a model architecture for a medical-imaging research question before scaffolding. Maps the task (classification, segmentation, detection, transfer), modality and dimensionality, labelled-data scale, and class imbalance to a shortlist of architectures, each grounded in its source paper with a when-to-use, a medical-imaging use, a reference implementation, the typical validation setup, and the matching model-scaffold template. Covers the foundational curriculum (ResNet, DenseNet, EfficientNet, ViT, Swin; U-Net, 3-D U-Net, Attention/Residual U-Net, nnU-Net, Mask R-CNN; SAM/MedSAM, TotalSegmentator, BiomedCLIP, DINO/MAE/SimCLR). It teaches archetypes and the task-to-architecture logic, not a live SOTA leaderboard.
+
+**Invoke:** `/architecture-zoo` · **Tools:** Read, Write, Edit, Grep, Glob · **Model:** inherit
+
+## When to use
+
+`architecture-zoo` activates on requests such as: architecture zoo, which architecture, choose a model, model selection, ResNet vs ViT, U-Net vs nnU-Net, what backbone, foundation model for, transfer learning choice, MedSAM, TotalSegmentator, DINO, MAE, self-supervised, paper to architecture, reference implementation, when to use ViT, segmentation architecture, classification backbone.
+
+## Quality Card
+
+**Purpose** — Turn a medical-imaging research question into a paper-grounded architecture choice — so the build starts from the right archetype (and a known validation setup) rather than from what is fashionable, and the choice carries its source citation into the manuscript.
+
+**Safety boundaries**
+
+- Advisory only: it writes a decision note, never code or weights; the build is /model-scaffold.
+- Every recommendation names its source paper; benchmark numbers are cited, never invented; the zoo describes archetypes, not a live leaderboard.
+
+**Known limitations**
+
+- The literature moves fast; this is a curated archetype map (classification, segmentation, foundation/SSL families seeded), not an exhaustive or current SOTA ranking — additional families (detection, synthesis) land in later phases.
+- A sound architecture choice is necessary, not sufficient; validity still depends on the split, validation design, and metrics (/model-validation, /model-evaluation).
+
+**Validation**
+
+- `carry the decision note into /model-scaffold to instantiate the chosen template, then /model-validation`
+
+**Evidence** — `manual_workflow`
+
+## Bundled resources
+
+**References** (`skills/architecture-zoo/references/`):
+
+- `classification.md`
+- `foundation_models.md`
+- `index.md`
+- `segmentation.md`
+
+## Source
+
+Canonical definition: [`skills/architecture-zoo/SKILL.md`](../../skills/architecture-zoo/SKILL.md)
+
+---
+
+*Part of [MedSci Skills](../../README.md) — Claude Code skills for the medical research lifecycle. This page is generated from the skill's `SKILL.md`; edit that file and re-run `scripts/gen_skill_docs.py`.*
diff --git a/metadata/catalog_counts.json b/metadata/catalog_counts.json
@@ -1,6 +1,6 @@
 {
   "_comment": "Single source of truth for catalog counts cited in public docs (README, orchestrate, check-reporting). scripts/validate_catalog_consistency.py recomputes every value from disk, asserts this file matches, and asserts the doc claims match. Do not hand-edit a value without running that script \u2014 CI fails on drift.",
-  "skills": 47,
+  "skills": 48,
   "reporting_guidelines": 38,
   "journal_profiles_find": 73,
   "journal_profiles_write": 55,
diff --git a/metadata/distribution_files.json b/metadata/distribution_files.json
@@ -396,6 +396,36 @@
       "size": 1421,
       "sha256": "912c52e9289a7ccb014aa8a18105b6dfe04c2cc040e970c73b4bbc6b2d8a8a39"
     },
+    {
+      "path": "skills/architecture-zoo/SKILL.md",
+      "size": 5444,
+      "sha256": "6d8f81262a42ff24e36dca425511804b9f324d2c900f31f701c703e3d8326729"
+    },
+    {
+      "path": "skills/architecture-zoo/references/classification.md",
+      "size": 5986,
+      "sha256": "035e0fddaccb0e19e23ffd7756b075154f847ee20f9a512cd2a010c0db4210fa"
+    },
+    {
+      "path": "skills/architecture-zoo/references/foundation_models.md",
+      "size": 5267,
+      "sha256": "495453b025f1cb13d5ba0bac9be6d3b0d63dd958ea48d2b9e55bc222e9c26786"
+    },
+    {
+      "path": "skills/architecture-zoo/references/index.md",
+      "size": 4065,
+      "sha256": "a1ed80efcf9a56e0286972ae9b08bd38965bb20a569e3aac5314549dfa6ad5f4"
+    },
+    {
+      "path": "skills/architecture-zoo/references/segmentation.md",
+      "size": 6508,
+      "sha256": "17618fe3d6884cf89b034ededcd69e081a65d7bfd473495eb2ab1fb5d8b15d8b"
+    },
+    {
+      "path": "skills/architecture-zoo/skill.yml",
+      "size": 2889,
+      "sha256": "275cfb1d0779028d79d8596879c09b5b6c714859142cb72a12b7ec901acc69e1"
+    },
     {
       "path": "skills/author-strategy/SKILL.md",
       "size": 9209,
diff --git a/metadata/distribution_manifest.json b/metadata/distribution_manifest.json
@@ -5,6 +5,7 @@
     "academic-aio",
     "add-journal",
     "analyze-stats",
+    "architecture-zoo",
     "author-strategy",
     "batch-cohort",
     "calc-sample-size",
diff --git a/metadata/skills_catalog.json b/metadata/skills_catalog.json
@@ -1,6 +1,6 @@
 {
   "_comment": "AUTO-GENERATED by scripts/gen_skills_catalog_json.py from each skills/<slug>/SKILL.md + skill.yml. Machine-readable skill catalog (single source of truth) consumed by external surfaces such as the aperivue.com storefront to gate skill-list completeness. Do not hand-edit; CI gate: python3 scripts/gen_skills_catalog_json.py --check.",
-  "skill_count": 47,
+  "skill_count": 48,
   "categories": [
     {
       "key": "literature_references",
@@ -17,6 +17,7 @@
       "key": "data_study_design",
       "label": "Data & Study Design",
       "slugs": [
+        "architecture-zoo",
         "calc-sample-size",
         "clean-data",
         "define-variables",
@@ -126,6 +127,15 @@
       "maturity": "official",
       "description": "Statistical analysis for medical research papers."
     },
+    {
+      "slug": "architecture-zoo",
+      "category": "data_study_design",
+      "category_label": "Data & Study Design",
+      "layer": "D",
+      "owner_domain": "architecture_reference",
+      "maturity": "official",
+      "description": "Choose a model architecture for a medical-imaging research question before scaffolding."
+    },
     {
       "slug": "author-strategy",
       "category": "project_workflow",
diff --git a/scripts/gen_skills_catalog_json.py b/scripts/gen_skills_catalog_json.py
@@ -54,6 +54,7 @@
     # "Model Engineering & Validation" storefront category at the v5.0.0 major).
     "model_validation": ("data_study_design", "Data & Study Design"),
     "model_development": ("data_study_design", "Data & Study Design"),
+    "architecture_reference": ("data_study_design", "Data & Study Design"),
     # Analysis & figures
     "statistical_analysis": ("analysis_figures", "Analysis & Figures"),
     "figure_generation": ("analysis_figures", "Analysis & Figures"),
diff --git a/skills/architecture-zoo/SKILL.md b/skills/architecture-zoo/SKILL.md
@@ -0,0 +1,96 @@
+---
+name: architecture-zoo
+description: >
+  Choose a model architecture for a medical-imaging research question before scaffolding. Maps the task
+  (classification, segmentation, detection, transfer), modality and dimensionality, labelled-data scale,
+  and class imbalance to a shortlist of architectures, each grounded in its source paper with a
+  when-to-use, a medical-imaging use, a reference implementation, the typical validation setup, and the
+  matching model-scaffold template. Covers the foundational curriculum (ResNet, DenseNet, EfficientNet,
+  ViT, Swin; U-Net, 3-D U-Net, Attention/Residual U-Net, nnU-Net, Mask R-CNN; SAM/MedSAM,
+  TotalSegmentator, BiomedCLIP, DINO/MAE/SimCLR). It teaches archetypes and the task-to-architecture
+  logic, not a live SOTA leaderboard.
+triggers: architecture zoo, which architecture, choose a model, model selection, ResNet vs ViT, U-Net vs nnU-Net, what backbone, foundation model for, transfer learning choice, MedSAM, TotalSegmentator, DINO, MAE, self-supervised, paper to architecture, reference implementation, when to use ViT, segmentation architecture, classification backbone
+tools: Read, Write, Edit, Grep, Glob
+model: inherit
+---
+
+# Architecture-Zoo Skill
+
+## Purpose
+
+This skill turns a **medical-imaging research question into a paper-grounded architecture choice** —
+so the build starts from the right archetype (and a known validation setup) rather than from whatever is
+fashionable, and the choice carries its source citation into the Methods. It is the **front end** of the
+model-engineering lane: `architecture-zoo (choose)` → `/model-scaffold (build)` → `/model-validation
+(validate)`.
+
+It is **advisory** (Layer D): it writes a short decision note, never code or weights. The actual repo is
+`/model-scaffold`. It describes **archetypes and the task → family → constraint logic**, not a live SOTA
+leaderboard (SOTA churns; the logic does not).
+
+## When to use
+- You need to pick an architecture/backbone for a classification, segmentation, detection, or
+  transfer-learning question and want it grounded in the literature with a sensible default.
+
+## When NOT to use
+- Generating the runnable repo → `/model-scaffold`.
+- Auditing a trained model's validation design → `/model-validation`.
+- Metrics / calibration → `/model-evaluation` + `/analyze-stats`.
+- General study/validity design → `/design-study`; AI-vs-expert benchmark → `/design-ai-benchmarking`.
+- LLM / MLLM → `/mllm-eval`.
+
+## Workflow
+
+### Phase 1 — Frame the question
+State the **task** (classification / segmentation / detection / transfer), the **modality +
+dimensionality** (2-D vs 3-D volume), the **labelled-data scale** (events / structures, not just
+images), **label availability** (lots / few / unlabelled pool), and constraints (class imbalance,
+small structures, interpretability, deployment compute).
+
+### Phase 2 — Walk the decision tree
+Open `${CLAUDE_SKILL_DIR}/references/index.md` and follow task → constraints → default pick. It routes to
+a family card.
+
+### Phase 3 — Read the family card
+- `${CLAUDE_SKILL_DIR}/references/classification.md` — ResNet / DenseNet / EfficientNet / Inception /
+  ViT / Swin / DeiT.
+- `${CLAUDE_SKILL_DIR}/references/segmentation.md` — U-Net / 3-D U-Net / V-Net / Attention & Residual
+  U-Net / nnU-Net / SegResNet / Swin-UNETR / Mask R-CNN.
+- `${CLAUDE_SKILL_DIR}/references/foundation_models.md` — SAM / MedSAM / MedSAM2 / TotalSegmentator /
+  SegVol / BiomedCLIP / DINO / MAE / SimCLR / MoCo.
+Each card gives the paper, core idea, when-to-use, medical-imaging use, reference implementation, and the
+**typical validation/experiment setup** for that architecture class.
+
+### Phase 4 — Write the decision note
+Record `decisions/architecture_choice.md`: the **task**, the **chosen architecture**, its **source
+paper**, the **reason** against the constraints, the **runner-up + why not**, and the matching
+**`/model-scaffold` template**. Naming the source paper is mandatory; cite, never invent, any benchmark
+number.
+
+### Phase 5 — Hand off
+Carry the decision note to `/model-scaffold` (instantiate the template), then `/model-validation`
+(split / validation design), `/model-evaluation` + `/analyze-stats` (metrics), and `/write-paper`
+(the Methods cite the architecture's source paper).
+
+## Anti-Hallucination
+
+- **Never recommend an architecture without naming its source paper.** Every card cites the paper; the
+  decision note must carry that citation.
+- **Never invent benchmark numbers or paper claims.** If a number matters, cite it (verify via
+  `/search-lit`); if uncertain, write `[VERIFY]` and ask.
+- **Never recommend an architecture for a modality or data scale it does not suit** (e.g. a from-scratch
+  ViT on a few hundred images, or 2-D slices for a volumetric structure) — the constraints in the
+  decision tree exist to prevent exactly that.
+- The zoo is a curated **archetype** map, not a current SOTA ranking — say so rather than implying a
+  recommendation is the latest best.
+
+## Boundaries
+
+```
+architecture-zoo (this skill: choose, paper-grounded)
+  └─ model-scaffold (build the reproducible repo from the chosen template)
+       └─ model-validation -> model-evaluation -> write-paper (cite the source paper)
+```
+
+It does not build, train, evaluate, or rank live SOTA — it maps the research question to a defensible,
+paper-grounded archetype and hands the choice to `/model-scaffold`.
diff --git a/skills/architecture-zoo/references/classification.md b/skills/architecture-zoo/references/classification.md
diff --git a/skills/architecture-zoo/references/foundation_models.md b/skills/architecture-zoo/references/foundation_models.md
diff --git a/skills/architecture-zoo/references/index.md b/skills/architecture-zoo/references/index.md
diff --git a/skills/architecture-zoo/references/segmentation.md b/skills/architecture-zoo/references/segmentation.md
diff --git a/skills/architecture-zoo/skill.yml b/skills/architecture-zoo/skill.yml
