Scope
Build a utility that projects an arbitrary LinkML schema into the canonical data dictionary (DD) format defined in #191. Each class becomes a `DataDictionary`; each slot becomes a `DataDictionaryEntry`; ranges → DD types; enums → `PermissibleValueDefinition` lists; per-slot `pattern` / `minimum_value` / `maximum_value` / `unit` carry through where they map.
The utility bridges the importer family (XSD, JSON Schema, OWL, RDFS, SQL DDL, EML; anything producing a LinkML schema) into the DD-shaped consumers (schema enrichment in #190, ingestion in #192, reconciliation in #193). After this lands, anyone with a LinkML schema can get a canonical DD out of it without writing format-specific glue.
Motivation
The repo currently has two parallel patterns for converting foreign formats:
- Importers (`importers/` directory, e.g. `XsdImportEngine`): foreign format → LinkML schema. Used when the input describes dataset structure (XSD, JSON Schema, OWL, RDFS, SQL DDL).
- Adapters (e.g. `adapt-frictionless`, `adapt-dbgap`): foreign data dictionary format → canonical DD. Used when the input is already DD-shaped.
The split is appropriate, since schema-shaped inputs and DD-shaped inputs are different, but it leaves a gap: DD consumers can't reach importer outputs. A user with an EML document or an XSD-imported LinkML schema gets a LinkML schema, the DD enrichment workflow needs a DD, and nothing connects the two.
This utility is that bridge.
CLI
```
schemauto extract-dd <schema.yaml> [--class ] [-o ] [--tsv|--yaml]
```
- If `--class` is given, project just that class. Otherwise, project all top-level classes and emit one DD per class in batch mode (use `..dd.{yaml,tsv}` filenames when `-o` is a directory).
- `--tsv` / `--yaml` mirror the other adapter CLI conventions.
- Standard parent-dir creation for `-o` (matches `adapt-frictionless` and `adapt-dbgap`).
Python API
```python
from linkml_runtime.utils.schemaview import SchemaView
from schema_automator.utils.extract_dd import schema_to_dd
schemaview = SchemaView("schema.yaml")  # load the LinkML schema to project
dd = schema_to_dd(schemaview, class_name="MyClass")
```
Mapping rules (sketch)
- `unit`. LinkML's `unit` slot uses a UCUM-flavored object; the DD uses a freeform string. Best-effort: take `unit.symbol` if present, fall back to `unit.ucum_code` or `unit.descriptive_name`.
Things to handle thoughtfully
- Abstract classes and mixins. Probably skip in the default projection (a class you can't instantiate isn't a usable DD), but offer `--include-abstract` for anyone who wants them.
- Inlined references (slot with class range). A slot whose range is another class isn't a column descriptor in the DD sense. Project as `type: string` with a `description` noting the reference target, or skip entirely (configurable).
- Multivalued slots with class ranges. Same handling as inlined references, but multivalued. The DD has a `multivalued` Spec B field; apply it.
- Identifiers. Slots marked `identifier: true` are valid DD entries; no special handling needed beyond standard projection.
- Tree-root annotations. If the schema has a single `tree_root: true` class, default `--class` to that.
- Per-slot `examples`. LinkML examples are richer than DD's `example_values`; project the `value` field of each example into the multivalued `example_values` list.
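As a rough illustration of the per-slot rules in this issue (unit fallback, `examples` → `example_values`, class-range slots, `multivalued`), here is a minimal sketch over plain dicts shaped like LinkML YAML. The helper names `pick_unit` and `project_slot`, the `skip_class_ranges` flag, the `name` key, and the `RANGE_TO_DD_TYPE` table are all hypothetical; the real DD type names come from #191, and a real implementation would presumably work against `SchemaView` rather than raw dicts. Enum handling is left out.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical range -> DD type table; the actual DD type names are defined in #191.
RANGE_TO_DD_TYPE: Dict[str, str] = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
    "date": "date",
}


def pick_unit(unit: Optional[Dict[str, Any]]) -> Optional[str]:
    """Best-effort unit string: symbol, then ucum_code, then descriptive_name."""
    if not unit:
        return None
    for key in ("symbol", "ucum_code", "descriptive_name"):
        if unit.get(key):
            return str(unit[key])
    return None


def project_slot(
    name: str,
    slot: Dict[str, Any],
    class_names: Set[str],
    skip_class_ranges: bool = False,  # hypothetical switch for the "configurable" case
) -> Optional[Dict[str, Any]]:
    """Project one LinkML slot (as a dict) into a DD-entry-shaped dict."""
    rng = slot.get("range", "string")  # assumption: default range is string
    entry: Dict[str, Any] = {"name": name}
    if rng in class_names:
        # Class-range slot: either skip it or flatten it to a string reference.
        if skip_class_ranges:
            return None
        entry["type"] = "string"
        parts = [slot.get("description"), f"Reference to {rng}."]
        entry["description"] = " ".join(p for p in parts if p)
    else:
        # Unknown ranges (including enums, not handled here) fall back to string.
        entry["type"] = RANGE_TO_DD_TYPE.get(rng, "string")
        if slot.get("description"):
            entry["description"] = slot["description"]
    if slot.get("multivalued"):
        entry["multivalued"] = True
    for key in ("pattern", "minimum_value", "maximum_value"):
        if key in slot:
            entry[key] = slot[key]
    unit = pick_unit(slot.get("unit"))
    if unit is not None:
        entry["unit"] = unit
    # Keep only the `value` of each LinkML example; other example fields are dropped.
    examples = [ex["value"] for ex in slot.get("examples", []) if "value" in ex]
    if examples:
        entry["example_values"] = examples
    return entry


weight = project_slot(
    "weight",
    {
        "range": "float",
        "unit": {"ucum_code": "kg"},
        "minimum_value": 0,
        "examples": [{"value": "72.5"}, {"description": "no value given"}],
    },
    class_names={"Organization"},
)
employer = project_slot(
    "employer", {"range": "Organization", "multivalued": True}, {"Organization"}
)
```

Returning `None` for a skipped class-range slot keeps the "skip entirely" option visible to the caller, who can filter those out when building the entry list.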
Lossy direction
This is intentionally a one-way projection — class hierarchies, slot inheritance graphs, multi-class relationships, mixins, abstract definitions, structural constraints (rules, classification rules), domain/range cross-references — all flatten or drop. The DD format has none of those; that's the deliberate tradeoff of #191.
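To make "flatten or drop" concrete, here is a small sketch of what happens to a class hierarchy, again over plain dicts shaped like LinkML YAML. `flatten_slots` and `project_classes` are hypothetical names, the schema is assumed to have no inheritance cycles, and in practice something like `SchemaView.class_induced_slots` would likely do the flattening instead.

```python
from typing import Any, Dict, List


def flatten_slots(classes: Dict[str, Dict[str, Any]], class_name: str) -> List[str]:
    """Collect slot names for a class: inherited slots first, then its own."""
    cls = classes[class_name]
    parents = ([cls["is_a"]] if cls.get("is_a") else []) + list(cls.get("mixins", []))
    ordered: List[str] = []
    for parent in parents:
        for slot in flatten_slots(classes, parent):
            if slot not in ordered:
                ordered.append(slot)
    for slot in cls.get("slots", []):
        if slot not in ordered:
            ordered.append(slot)
    return ordered


def project_classes(
    classes: Dict[str, Dict[str, Any]], include_abstract: bool = False
) -> Dict[str, List[str]]:
    """Map each projected class to its flat slot list; the hierarchy itself is lost."""
    result: Dict[str, List[str]] = {}
    for name, cls in classes.items():
        if not include_abstract and (cls.get("abstract") or cls.get("mixin")):
            continue  # abstract classes and mixins are dropped by default
        result[name] = flatten_slots(classes, name)
    return result


classes = {
    "NamedThing": {"abstract": True, "slots": ["id", "name"]},
    "HasTimestamps": {"mixin": True, "slots": ["created_on"]},
    "Sample": {
        "is_a": "NamedThing",
        "mixins": ["HasTimestamps"],
        "slots": ["collection_date"],
    },
}
flat = project_classes(classes)
```

Here `flat` holds only `Sample`, with the four slots it owns or inherits. Nothing in the output records that `id` and `name` came from `NamedThing`, which is the information the DD format deliberately gives up.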
Why this is the right home for the bridge
Lives in `schema_automator/utils/extract_dd.py` (or similar) rather than the adapter or importer trees, because it doesn't belong to either pattern — it's a general utility over LinkML schemas. Any importer benefits; any future tool producing a LinkML schema benefits.
Related