Add 118 microschemas from CDE HDBSCAN clustering #15
Open
justaddcoffee wants to merge 1 commit into main from
Conversation
Generate LinkML microschemas for all 118 CDE clusters from Madan et al. (arXiv:2506.02160) HDBSCAN clustering of NIH/NLM CDEs. Each cluster maps to a separate YAML file organized by record type:

- clinical_measurement/: 103 files
- condition_status/: 10 files
- drug_status/: 4 files
- procedure_status/: 1 file

Each file follows the rules pattern with preconditions (measurement_type or condition_type codes) and postconditions (units, data_type, value ranges, instruments).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
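The rules pattern described above can be sketched as a minimal LinkML fragment. Every code, slot name, and value below is an illustrative assumption, not copied from one of the generated files:

```yaml
# Hypothetical microschema fragment following the rules pattern:
# a precondition on the measurement_type code implies postconditions
# on the unit, data type, and permissible value range.
classes:
  ClinicalMeasurement:
    rules:
      - preconditions:
          slot_conditions:
            measurement_type:
              equals_string: "FEV1_PERCENT"   # illustrative code
        postconditions:
          slot_conditions:
            unit:
              equals_string: "percent"
            data_type:
              equals_string: "float"
            value:
              minimum_value: 0
              maximum_value: 100
```

The precondition/postcondition split keeps each cluster's constraints declarative: a validator only applies the unit and range checks to records whose measurement_type matches.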
Summary
- clinical_measurement/: 103 files (questionnaires, lab tests, physiological measurements)
- condition_status/: 10 files (diagnoses, symptom presence, medical history)
- drug_status/: 4 files (medication exposure, substance use)
- procedure_status/: 1 file (EEG, neuroimaging, neurosurgical procedures)

Agent swarm strategy
These 118 files were generated by a swarm of 12 Claude subagents running in parallel, each assigned a batch of clusters. The orchestration worked as follows:
1. Data preparation
Each cluster was written to /tmp/clusters/cluster_NNN.json:

```json
{
  "name": "Pulmonary Function Assessment Cluster",
  "cdes": [
    {
      "tinyId": "kNUZxW9ZPu",
      "designation": "Forced expiratory volume 1 second value",
      "definition": "Value of %FEV1 from within one month of the assessment",
      "permissible_values": "No permissible values",
      "stewardOrg": "NHLBI"
    }
  ]
}
```

2. Reference prompt
A shared reference document (/tmp/microschema_reference.md) was written for all agents, specifying the LinkML pattern, record type classification rules, directory layout, and file naming convention:

3. Agent dispatch
12 general-purpose subagents were launched in parallel via the Claude Code Task tool, each assigned a non-overlapping batch of cluster numbers:

Each agent received the same prompt template:
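The batch assignment itself is simple to reproduce. A minimal sketch (a hypothetical helper, not the actual dispatch code, assuming clusters are numbered 0–117 and split contiguously):

```python
def make_batches(n_clusters: int, n_agents: int) -> list[list[int]]:
    """Split cluster numbers 0..n_clusters-1 into n_agents contiguous,
    non-overlapping batches whose sizes differ by at most one."""
    base, extra = divmod(n_clusters, n_agents)
    batches, start = [], 0
    for i in range(n_agents):
        # The first `extra` agents take one additional cluster each.
        size = base + (1 if i < extra else 0)
        batches.append(list(range(start, start + size)))
        start += size
    return batches

batches = make_batches(118, 12)
# 12 batches: the first 10 hold 10 clusters each, the last 2 hold 9.
```

Contiguous batches make it easy to audit coverage: concatenating them in order must reproduce the full cluster range with no gaps or overlaps.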
4. Results
Note on failed first attempt
The initial strategy tried to dispatch 20 OpenAI Codex agents via MCP, but this failed because Claude Code subagents cannot nest Task tool calls (required by the codex MCP tool configuration). The pivot to direct generation by Claude subagents was faster and more reliable.
Test plan
- yaml.safe_load() validation
- gen-python or linkml-validate

Replaces #13 (moved from fork to upstream repo).
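The yaml.safe_load() check in the test plan can be sketched as follows (the directory layout argument is an assumption; PyYAML provides yaml.safe_load and yaml.YAMLError):

```python
from pathlib import Path

import yaml  # PyPI package: pyyaml


def validate_yaml_files(root: str) -> list[str]:
    """Return "path: error" strings for every *.yaml file under root
    that fails to parse with yaml.safe_load()."""
    failures = []
    for path in sorted(Path(root).rglob("*.yaml")):
        try:
            yaml.safe_load(path.read_text())
        except yaml.YAMLError as exc:
            failures.append(f"{path}: {exc}")
    return failures
```

This only checks syntactic well-formedness; schema-level correctness still requires gen-python or linkml-validate as listed in the test plan.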
🤖 Generated with Claude Code