
Adapter ecosystem for translating between existing DD formats and the canonical DD format #202

@amc-corey-cox

Description


Scope

Build a family of adapters that translate between existing data dictionary (DD) formats and the canonical DD format defined in #191. The machinery is uniform across formats:

  • A LinkML schema describing the source format.
  • A linkml-map trans-spec mapping source ↔ canonical DD.
  • A small serialization utility that converts the `codes` string encoding to and from its structured form at either end of a conversion.
  • Optional CLI invocation.
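Because the machinery is identical for every format, an adapter can be described almost entirely as data. The following is a minimal sketch that assumes a registry-plus-CLI design, which the issue does not prescribe. `AdapterSpec`, `register`, `ADAPTERS`, the `dd-adapt` program name, the format names, and the file-naming convention are all hypothetical. The sketch only wires up paths and arguments and does not call linkml-map.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AdapterSpec:
    """Per-format machinery from the issue; field names are illustrative."""
    name: str
    source_schema: Path  # LinkML schema describing the source format
    trans_spec: Path     # linkml-map trans-spec, source <-> canonical DD


# Hypothetical registry; these paths are assumptions, not real repo files.
ADAPTERS: dict[str, AdapterSpec] = {}


def register(name: str) -> AdapterSpec:
    """Adding a format is a registration, not new parser/writer code."""
    base = Path("schema_automator/adapters") / name
    spec = AdapterSpec(
        name=name,
        source_schema=base / f"{name}_schema.yaml",
        trans_spec=base / f"{name}_to_dd.transform.yaml",
    )
    ADAPTERS[name] = spec
    return spec


def build_cli() -> argparse.ArgumentParser:
    """Optional CLI: choose a registered format and a direction."""
    parser = argparse.ArgumentParser(prog="dd-adapt")
    parser.add_argument("format", choices=sorted(ADAPTERS))
    parser.add_argument("direction", choices=["to-dd", "from-dd"])
    parser.add_argument("input", type=Path)
    return parser


# Hypothetical formats, registered before the CLI is built.
register("format_a")
register("format_b")
```

For example, `build_cli().parse_args(["format_a", "to-dd", "in.tsv"])` selects the `format_a` entry. Whatever executes the transformation can then read `ADAPTERS[args.format].trans_spec` without needing any format-specific code.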

Sub-issue of the broader enrichment work in #190.

Why linkml-map

The structural conversion (field renaming, type-vocabulary translation, restructuring) is exactly what linkml-map is designed for. One trans-spec replaces what would otherwise be a Python adapter module per direction. New formats become "write a source schema + a trans-spec," not "write parser + writer code." Bidirectionality is largely automatic.
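The claim that bidirectionality is largely automatic can be illustrated without using linkml-map itself. This sketch uses plain dictionaries as a stand-in for a trans-spec. All field names and vocabularies are hypothetical, and this is not how linkml-map represents or executes a trans-spec. The forward tables rename fields and translate a type vocabulary, and the reverse direction is derived by inverting those tables.

```python
from __future__ import annotations

# Hypothetical stand-in for a trans-spec; this is NOT the linkml-map API.
# Source field name -> canonical DD slot name (illustrative names).
FIELD_MAP = {"var_name": "name", "var_type": "data_type", "label": "description"}
# Source type vocabulary -> canonical DD type vocabulary (illustrative values).
TYPE_MAP = {"int": "integer", "str": "string", "bool": "boolean"}


def invert(mapping: dict[str, str]) -> dict[str, str]:
    """Reverse a one-to-one mapping; raise if two keys share a target."""
    reverse: dict[str, str] = {}
    for key, value in mapping.items():
        if value in reverse:
            raise ValueError(
                f"not invertible: {reverse[value]!r} and {key!r} both map to {value!r}"
            )
        reverse[value] = key
    return reverse


def apply(record: dict[str, str], fields: dict[str, str],
          types: dict[str, str], type_slot: str) -> dict[str, str]:
    """Rename known fields (unknown ones are dropped), then translate the type value."""
    out = {fields[k]: v for k, v in record.items() if k in fields}
    if type_slot in out:
        out[type_slot] = types[out[type_slot]]
    return out


def to_dd(record: dict[str, str]) -> dict[str, str]:
    return apply(record, FIELD_MAP, TYPE_MAP, type_slot="data_type")


def from_dd(record: dict[str, str]) -> dict[str, str]:
    # The reverse direction is derived from the forward tables, not hand-written.
    return apply(record, invert(FIELD_MAP), invert(TYPE_MAP), type_slot="var_type")
```

The "largely" qualifier shows up in two places. If the vocabularies are not one-to-one (for example, two source types that both collapse to `string`), `invert` raises an error. That marks the cases a trans-spec author would still have to handle explicitly. In addition, fields with no mapping are dropped, so a round trip can also lose information that way.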

Out of scope

Code location

Adapters live in `schema_automator/adapters/` as a self-contained directory containing the source-format LinkML schemas, the trans-specs, the codes serialization utility, and any per-format helpers. The import boundary is one-directional: adapters import from `metamodels/` and shared utilities, and nothing in core schema-automator imports from adapters. This keeps it cheap to extract the adapters into a dedicated repo later, if and when the number of adapters or the volume of external contributions justifies it.
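The one-directional import rule can be checked mechanically. The following is a minimal sketch that assumes the adapters live in the `schema_automator.adapters` package implied by the directory layout. `imported_modules` and `boundary_violations` are hypothetical helpers that work on source text through the standard-library `ast` module, so a unit test could run them over every core module. A dedicated tool such as import-linter could enforce the same contract.

```python
from __future__ import annotations

import ast

ADAPTERS_PKG = "schema_automator.adapters"  # assumed from the directory layout


def imported_modules(source: str, module_name: str) -> set[str]:
    """Return absolute module names imported by `source`.

    Simplification: relative imports are resolved assuming `module_name`
    is a plain module, not a package `__init__`.
    """
    found: set[str] = set()
    package = module_name.rsplit(".", 1)[0]
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level:  # relative import: climb (level - 1) packages
                parts = package.split(".")
                base = ".".join(parts[: len(parts) - (node.level - 1)])
                mod = f"{base}.{node.module}" if node.module else base
            else:
                mod = node.module or ""
            found.add(mod)
            # `from schema_automator import adapters` also crosses the boundary.
            found.update(f"{mod}.{alias.name}" for alias in node.names)
    return found


def boundary_violations(module_name: str, source: str) -> list[str]:
    """List imports in a core module that reach into the adapters package."""
    if module_name == ADAPTERS_PKG or module_name.startswith(ADAPTERS_PKG + "."):
        return []  # adapters may import from core freely
    return sorted(
        m for m in imported_modules(source, module_name)
        if m == ADAPTERS_PKG or m.startswith(ADAPTERS_PKG + ".")
    )
```

An empty result for every core module means the adapters directory could be moved out without changing any core code.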

linkml-map and per-format dependencies become regular schema-automator deps (no optional install extras at this point).

Sub-issues

Schema change carried by the first adapter

The current `codes` slot is a single string with a parseable grammar. The first adapter PR restructures it into a multivalued inlined slot of a new `PermissibleValueDefinition` class so trans-specs operate on the structured form. The TSV grammar is retained as the canonical TSV serialization. This is bundled with the first adapter rather than shipped separately because the schema change is only justified by working trans-spec code.
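The following is a minimal sketch of the codes serialization utility. For illustration only, it assumes that the existing grammar is `code=label` entries separated by `|`, as in `1=Male|2=Female`. The real grammar may differ. The `value` and `label` fields on `PermissibleValueDefinition` are assumptions, and `parse_codes` and `serialize_codes` are hypothetical names. Parsing yields the structured, multivalued form that a trans-spec would operate on, and serializing produces the canonical TSV string.

```python
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: grammar is "code=label" entries joined by "|"; the real one may differ.
ENTRY_SEP = "|"
PAIR_SEP = "="


@dataclass(frozen=True)
class PermissibleValueDefinition:
    """Structured form of one code; field names are illustrative."""
    value: str
    label: Optional[str] = None


def parse_codes(text: str) -> List[PermissibleValueDefinition]:
    """TSV string -> list of structured values (empty entries are skipped)."""
    result = []
    for entry in text.split(ENTRY_SEP):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first "=" only, so labels may themselves contain "=".
        value, sep, label = entry.partition(PAIR_SEP)
        result.append(PermissibleValueDefinition(value.strip(), label.strip() if sep else None))
    return result


def serialize_codes(values: List[PermissibleValueDefinition]) -> str:
    """Structured values -> canonical TSV string; rejects unencodable values."""
    parts = []
    for pv in values:
        if ENTRY_SEP in pv.value or PAIR_SEP in pv.value or (pv.label and ENTRY_SEP in pv.label):
            raise ValueError(f"cannot encode {pv!r} without an escaping rule")
        parts.append(pv.value if pv.label is None else f"{pv.value}{PAIR_SEP}{pv.label}")
    return ENTRY_SEP.join(parts)
```

With this layout, the string grammar appears only at the serialization boundary. A trans-spec therefore sees a list of objects instead of a string it would have to parse itself. Values that the grammar cannot represent are rejected rather than silently corrupted.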
