This project makes Common Data Elements (CDEs) more interoperable across clinical research studies using LinkML schemas and AI-assisted semantic mapping.
Clinical research uses CDEs (standardized data fields with defined permissible values), but they are fragmented across repositories and lack semantic bindings to ontologies. This fragmentation limits data integration across studies and AI-readiness.
We're building tools to:
- Collect CDEs from major repositories (NIH, PhenX, caDSR, RADx, HEAL)
- Convert to LinkML schemas for computability
- Generate semantic mappings using AI and human curation
- Enable data harmonization across studies
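
As a rough illustration of the conversion step, the sketch below turns one CDE record into a LinkML-style schema dictionary with an enum for its permissible values. It is not the project's actual `cde2linkml` code: the input record shape, the `cde_to_linkml` function name, the `example.org` identifiers, and the `EX:` term CURIEs are all assumptions made for this example. Only the output keys (`enums`, `permissible_values`, `meaning`, `slots`, `range`) follow the LinkML metamodel.

```python
import json


def cde_to_linkml(cde: dict) -> dict:
    """Build a minimal LinkML-style schema dict from one CDE record.

    The input shape (name / definition / permissible_values) is a
    hypothetical simplification, not a real repository export format.
    """
    slot_name = cde["name"]
    # smoking_status -> SmokingStatusEnum
    enum_name = "".join(p.capitalize() for p in slot_name.split("_")) + "Enum"

    permissible_values = {}
    for pv in cde["permissible_values"]:
        entry = {"description": pv["label"]}
        # Only emit `meaning` when a semantic mapping is available.
        if pv.get("meaning"):
            entry["meaning"] = pv["meaning"]
        permissible_values[pv["value"]] = entry

    return {
        "id": f"https://example.org/cde/{slot_name}",  # placeholder URI
        "name": slot_name,
        "prefixes": {
            "linkml": "https://w3id.org/linkml/",
            "EX": "https://example.org/terms/",  # placeholder ontology prefix
        },
        "imports": ["linkml:types"],
        "enums": {enum_name: {"permissible_values": permissible_values}},
        "slots": {
            slot_name: {"description": cde["definition"], "range": enum_name}
        },
    }


# Hypothetical CDE; the EX: CURIEs stand in for real ontology terms.
example_cde = {
    "name": "smoking_status",
    "definition": "Participant's self-reported tobacco smoking status.",
    "permissible_values": [
        {"value": "never", "label": "Never smoked", "meaning": "EX:never_smoker"},
        {"value": "former", "label": "Former smoker", "meaning": "EX:former_smoker"},
        {"value": "unknown", "label": "Unknown", "meaning": None},
    ],
}

schema = cde_to_linkml(example_cde)
print(json.dumps(schema, indent=2))
```

Leaving `meaning` out when no mapping exists keeps unmapped values visible, so they can be picked up later by AI-assisted mapping or human curation.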
📖 Full documentation: https://monarch-initiative.github.io/cde-harmonization/
- data/ - Raw CDEs from multiple repositories
- linkml/ - Generated LinkML schemas
- cde2linkml/ - Conversion tools
- docs/ - Documentation source