Scope
Build a family of adapters that translate between existing canonical data dictionary formats and the canonical DD format defined in #191. The machinery is uniform across formats:
- A LinkML schema describing the source format.
- A linkml-map trans-spec mapping source ↔ canonical DD.
- A small serialization utility for the codes encoding bookend.
- Optional CLI invocation.
Sub-issue of the broader enrichment work in #190.
Why linkml-map
The structural conversion (field renaming, type-vocabulary translation, restructuring) is exactly what linkml-map is designed for. One trans-spec replaces what would otherwise be a Python adapter module per direction. New formats become "write a source schema + a trans-spec," not "write parser + writer code." Bidirectionality is largely automatic.
Out of scope
Code location
Adapters live in `schema_automator/adapters/` as a self-contained directory: source-format LinkML schemas, trans-specs, the codes serialization utility, and any per-format helpers. Import boundary is one-directional — adapters import from `metamodels/` and shared utilities; nothing in core schema-automator imports from adapters. This keeps later extraction to a dedicated repo cheap if/when adapter count or external contribution justifies it.
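The one-directional import boundary could be enforced by a small check in the test suite. The following is a minimal sketch, not an existing schema-automator utility: the function name `adapter_imports` is hypothetical, and it only inspects absolute imports in source text passed to it.

```python
import ast
from typing import List

# Package that core schema-automator modules must not import from.
ADAPTER_PACKAGE = "schema_automator.adapters"


def adapter_imports(source: str, package: str = ADAPTER_PACKAGE) -> List[str]:
    """Return the sorted, de-duplicated adapter modules imported by `source`.

    Hypothetical helper: relative imports (level > 0) are ignored, so it is
    meant to be run over core modules that live outside the adapters package.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Cover both `from pkg.adapters import x` and `from pkg import adapters`.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            if name == package or name.startswith(package + "."):
                found.add(name)
    return sorted(found)
```

A test could read each core module's source, call `adapter_imports` on it, and fail if the result is non-empty.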
linkml-map and per-format dependencies become regular schema-automator deps (no optional install extras at this point).
Sub-issues
Schema change carried by the first adapter
The current `codes` slot is a single string with a parseable grammar. The first adapter PR restructures it into a multivalued inlined slot of a new `PermissibleValueDefinition` class so trans-specs operate on the structured form. The TSV grammar is retained as the canonical TSV serialization. This is bundled with the first adapter rather than shipped separately because the schema change is only justified by working trans-spec code.
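To make the restructuring concrete, here is a minimal sketch of the round trip between the string form and the structured form. The grammar is not specified in this issue, so the sketch assumes `value=label` pairs separated by `|`; the field names `value` and `label` and the functions `parse_codes` and `serialize_codes` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PermissibleValueDefinition:
    """Illustrative stand-in for the new class; field names are assumptions."""
    value: str
    label: Optional[str] = None


def parse_codes(text: str) -> List[PermissibleValueDefinition]:
    """Parse an assumed 'value=label|value=label' string into structured codes."""
    codes = []
    for part in text.split("|"):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing '|'
        # Split on the first '=' only, so labels may themselves contain '='.
        value, sep, label = part.partition("=")
        codes.append(
            PermissibleValueDefinition(value.strip(), label.strip() if sep else None)
        )
    return codes


def serialize_codes(codes: List[PermissibleValueDefinition]) -> str:
    """Write structured codes back to the assumed string grammar."""
    return "|".join(
        c.value if c.label is None else f"{c.value}={c.label}" for c in codes
    )
```

Under these assumptions, trans-specs would see only the list of `PermissibleValueDefinition` objects, and the string grammar would be touched only at the TSV boundary. A label containing `|` would not survive this simplified grammar; the real one may define escaping.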