Scope
Import Ecological Metadata Language (EML) XML documents into LinkML schemas. Lives in the importer family alongside `XsdImportEngine` (#153), `JsonSchemaImportEngine`, `OwlImportEngine`, etc. Output is a full LinkML schema describing the dataset(s) the EML document describes; the canonical-DD-shaped view is reachable via the `extract-dd` projection utility (#209).
Supersedes linkml/linkml#2168 (the original tracking issue, upstream rather than in this repo) and the never-finished draft PR #138.
Why importer rather than adapter
EML is not a data dictionary format — it's a metadata document describing a dataset. The DD layer is one facet alongside:
- Bibliographic and provenance information
- Geographic / temporal / taxonomic coverage
- Sampling methods and protocols
- Multi-table relationships
- Pointers to data files
- Unit definitions
Treating EML as a DD adapter (#202 family) would throw away the schema-shaped richness EML actually provides. Treating it as a schema importer preserves the structural information; the DD-shaped use case is served downstream by #209's projection utility.
The split that drops out:
Deliverables
LinkML representation of EML format
Use `XsdImportEngine` on EML's published XSD (https://eml.ecoinformatics.org/eml-schema) to bootstrap a LinkML representation of EML itself. Then hand-trim to the data-relevant subset — `DataTable`, `Attribute`, `MeasurementScale`, `EnumeratedDomain`, `StandardUnit`/`CustomUnit`, etc. — dropping the broader metadata wrapper (bibliographic, coverage, methods) unless we want to model them too.
This auto-generation step saves manual transcription and ensures our representation matches the upstream EML spec rather than just what's in our sample documents.
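Below is a minimal sketch of the hand-trim step. It assumes the XSD-derived schema has already been loaded into a plain dict (for example, from YAML), and that classes reference each other only through inline `attributes` ranges and `is_a`. Schema-level `slots`/`slot_usage` are not handled. The function name `trim_schema` and the keep-list are hypothetical and are not existing schema-automator API. The sketch keeps the named classes plus every class they transitively reference, so trimming does not leave dangling class ranges. Enums and types are left untouched.

```python
from __future__ import annotations

from typing import Any

# Hypothetical keep-list: the data-relevant EML classes named above.
DATA_RELEVANT = {
    "DataTable", "Attribute", "MeasurementScale",
    "EnumeratedDomain", "StandardUnit", "CustomUnit",
}


def trim_schema(schema: dict[str, Any], keep: set[str]) -> dict[str, Any]:
    """Restrict a LinkML schema dict to `keep` plus classes they reference.

    Follows attribute ranges and is_a parents transitively so that no kept
    class points at a class that was dropped.
    """
    classes = schema.get("classes") or {}
    wanted: set[str] = set()
    stack = [name for name in keep if name in classes]
    while stack:
        name = stack.pop()
        if name in wanted:
            continue
        wanted.add(name)
        cls = classes[name] or {}
        # Follow class-valued attribute ranges.
        for attr in (cls.get("attributes") or {}).values():
            rng = (attr or {}).get("range")
            if rng in classes and rng not in wanted:
                stack.append(rng)
        # Follow the inheritance parent, if it is a class in this schema.
        parent = cls.get("is_a")
        if parent in classes and parent not in wanted:
            stack.append(parent)
    trimmed = dict(schema)
    trimmed["classes"] = {n: c for n, c in classes.items() if n in wanted}
    return trimmed


# Toy stand-in for XSD importer output (hypothetical shape).
toy = {
    "id": "https://example.org/eml",
    "name": "eml",
    "classes": {
        "DataTable": {"attributes": {"attributeList": {"range": "Attribute", "multivalued": True}}},
        "Attribute": {"attributes": {"measurementScale": {"range": "MeasurementScale"}}},
        "MeasurementScale": {},
        "Coverage": {"attributes": {"geographicCoverage": {"range": "string"}}},
    },
}
print(sorted(trim_schema(toy, {"DataTable"})["classes"]))
# ['Attribute', 'DataTable', 'MeasurementScale']
```

Following references rather than filtering by name alone is a design choice. It means the keep-list only needs to name the roots of the subset you want.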
Import engine
`schema_automator/importers/eml_import_engine.py` — `EMLImportEngine` subclassing `ImportEngine`. Walks an EML document, producing a LinkML schema where:
- Each `` → a class.
- Each `` → a slot on that class.
- `` and storage type → slot range (one of LinkML's built-in types, or a LinkML enum for ``).
- `` → `PermissibleValue` entries on the enum.
- `` / `` → slot `unit`.
- Descriptions, units, and examples carry through.
- Multi-table linkage (when expressed in EML) preserved via slot ranges or inlined references.
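The following is a minimal sketch of that walk. It uses only the standard library's `xml.etree.ElementTree` and emits a plain dict in LinkML schema shape instead of going through `ImportEngine`. The sketch makes several assumptions:

- It relies on EML 2.x element names: `dataTable`, `entityName`, `attribute`, `attributeName`, `attributeDefinition`, `enumeratedDomain/codeDefinition`, `numericDomain/numberType`, and `unit/standardUnit` or `unit/customUnit`.
- The function `eml_to_schema`, the identifier cleaning, the `<table>_<attribute>_enum` naming, and the `NUMBER_TYPES` table are all hypothetical.
- Storage type is approximated by `numberType` alone.
- The unit name is stored in the `descriptive_name` field of LinkML's unit object.

Multi-table linkage is not attempted.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from typing import Any

# Assumed mapping from EML numberType values to LinkML built-in types.
NUMBER_TYPES = {"real": "float", "integer": "integer", "whole": "integer", "natural": "integer"}


def _ident(text: str | None, fallback: str) -> str:
    """Turn an EML name into a LinkML-safe identifier (hypothetical rule)."""
    cleaned = re.sub(r"\W+", "_", (text or "").strip()).strip("_")
    return cleaned or fallback


def eml_to_schema(xml_text: str, schema_name: str = "dataset") -> dict[str, Any]:
    root = ET.fromstring(xml_text)
    classes: dict[str, Any] = {}
    enums: dict[str, Any] = {}
    # EML child elements are unqualified, so plain tag names match.
    for i, table in enumerate(root.iter("dataTable")):
        cls_name = _ident(table.findtext("entityName"), f"Table{i}")  # one class per table
        slots: dict[str, Any] = {}
        for j, attr in enumerate(table.iter("attribute")):
            slot_name = _ident(attr.findtext("attributeName"), f"attr{j}")  # one slot per attribute
            slot: dict[str, Any] = {"range": "string"}
            desc = attr.findtext("attributeDefinition")
            if desc:
                slot["description"] = desc.strip()
            codes = attr.findall(".//enumeratedDomain/codeDefinition")
            if codes:
                # Coded domain: emit an enum and point the slot at it.
                enum_name = f"{cls_name}_{slot_name}_enum"
                enums[enum_name] = {"permissible_values": {
                    c.findtext("code", "").strip(): {"description": c.findtext("definition", "").strip()}
                    for c in codes
                }}
                slot["range"] = enum_name
            else:
                number_type = attr.findtext(".//numericDomain/numberType")
                if number_type:
                    slot["range"] = NUMBER_TYPES.get(number_type.strip(), "float")
            unit = attr.findtext(".//unit/standardUnit") or attr.findtext(".//unit/customUnit")
            if unit:
                slot["unit"] = {"descriptive_name": unit.strip()}  # assumed UnitOfMeasure field
            slots[slot_name] = slot
        classes[cls_name] = {"attributes": slots}
    return {
        "id": f"https://example.org/{schema_name}",
        "name": schema_name,
        "prefixes": {"linkml": "https://w3id.org/linkml/"},
        "imports": ["linkml:types"],
        "default_range": "string",
        "classes": classes,
        "enums": enums,
    }


SAMPLE = """<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0">
<dataset><dataTable><entityName>plots</entityName><attributeList>
  <attribute><attributeName>site</attributeName>
    <attributeDefinition>Site code</attributeDefinition>
    <measurementScale><nominal><nonNumericDomain><enumeratedDomain>
      <codeDefinition><code>A</code><definition>Upland</definition></codeDefinition>
      <codeDefinition><code>B</code><definition>Lowland</definition></codeDefinition>
    </enumeratedDomain></nonNumericDomain></nominal></measurementScale>
  </attribute>
  <attribute><attributeName>depth</attributeName>
    <attributeDefinition>Water depth</attributeDefinition>
    <measurementScale><ratio><unit><standardUnit>meter</standardUnit></unit>
      <numericDomain><numberType>real</numberType></numericDomain></ratio></measurementScale>
  </attribute>
</attributeList></dataTable></dataset></eml:eml>"""

schema = eml_to_schema(SAMPLE)
print(schema["classes"]["plots"]["attributes"]["depth"])
# {'range': 'float', 'description': 'Water depth', 'unit': {'descriptive_name': 'meter'}}
```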
CLI
```
schemauto import-eml <document.eml> [-o ] [-n ] [-I ]
```
Matches the existing importer CLI shape (`import-xsd`, `import-jsonschema`, etc.).
Tests
Real-world fixtures from the EML samples referenced in linkml/linkml#2168:
Reaching the DD-shaped consumer
The dm-bip-style DD-enrichment use case is served by chaining:
```
schemauto import-eml dataset.eml -o dataset-schema.yaml
schemauto extract-dd dataset-schema.yaml -o dataset-dd.yaml
```
No EML-specific DD adapter needed.
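To show why no adapter is needed, here is a minimal sketch of what a DD-shaped projection over an importer-style schema dict could look like. It is not #209's actual design. The function `project_dd`, the row columns (`table`, `variable`, `description`, `type`, `unit`, `values`), and the `descriptive_name` unit field are all hypothetical. The only point is that the flattened view can be derived from the schema without going back to the EML.

```python
from __future__ import annotations

from typing import Any


def project_dd(schema: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten each class's attributes into one DD row (hypothetical columns)."""
    enums = schema.get("enums") or {}
    rows: list[dict[str, str]] = []
    for cls_name, cls in (schema.get("classes") or {}).items():
        for slot_name, slot in ((cls or {}).get("attributes") or {}).items():
            slot = slot or {}
            rng = slot.get("range", schema.get("default_range", "string"))
            pvs = (enums.get(rng) or {}).get("permissible_values") or {}
            rows.append({
                "table": cls_name,
                "variable": slot_name,
                "description": slot.get("description", ""),
                "type": "enum" if rng in enums else rng,  # collapse enum names to "enum"
                "unit": (slot.get("unit") or {}).get("descriptive_name", ""),
                "values": "; ".join(pvs),  # permissible value codes, in order
            })
    return rows


# Toy importer-style output (hypothetical shape).
toy = {
    "name": "dataset",
    "default_range": "string",
    "classes": {"plots": {"attributes": {
        "site": {"range": "plots_site_enum", "description": "Site code"},
        "depth": {"range": "float", "description": "Water depth",
                  "unit": {"descriptive_name": "meter"}},
    }}},
    "enums": {"plots_site_enum": {"permissible_values": {
        "A": {"description": "Upland"}, "B": {"description": "Lowland"}}}},
}
for row in project_dd(toy):
    print(row)
```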
Things to handle thoughtfully
- Multi-table EML documents. An EML document can describe several `` blocks. Emit one class per data table (similar to dbGaP's one-DD-per-pht).
- Measurement scale richness. EML's `nominal` / `ordinal` / `interval` / `ratio` distinction is finer than LinkML's built-in types. Nominal/ordinal attributes that carry codes → a LinkML enum. Ordinal attributes without codes are an open question (the ordering is meaningful, but there is nothing to enumerate); defer.
- Unit handling. EML's `` references the EML unit dictionary; `` defines new units. LinkML's slot `unit` is richer than the canonical DD's freeform string but still simpler than EML's machinery. Best-effort: extract the unit name/symbol; preserve the URI when available.
- Metadata-only documents. An EML document without data tables should either produce a schema covering whatever non-data classes are appropriate, or emit a near-empty schema with a clear warning.
- Domain-specific metadata. Geographic coverage, taxonomic coverage, sampling methods — these have structural shape but don't fit the data-table pattern. Skip in v1; could be future enrichment.
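The measurement-scale policy above can be written down as a small decision function. This is a minimal sketch under stated assumptions:

- `decide_range`, `RangeDecision`, and the `NUMBER_TYPES` table are hypothetical.
- `dateTime` is assumed to map to LinkML's `datetime` type.
- Any scale the function does not recognize falls back to `string` with a note.

The deferred ordinal-without-codes case returns no range, so the caller has to decide explicitly instead of silently guessing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Assumed mapping from EML numberType values to LinkML built-in types.
NUMBER_TYPES = {"real": "float", "integer": "integer", "whole": "integer", "natural": "integer"}


@dataclass
class RangeDecision:
    linkml_range: Optional[str]  # built-in type name, or None when an enum or deferral applies
    needs_enum: bool = False
    note: str = ""


def decide_range(scale: str, has_codes: bool, number_type: Optional[str] = None) -> RangeDecision:
    """Map an EML measurement scale onto a LinkML slot range (v1 policy sketch)."""
    if scale in ("nominal", "ordinal") and has_codes:
        # Codes-bearing: the caller builds an enum from the codeDefinitions.
        return RangeDecision(linkml_range=None, needs_enum=True, note="emit enum from codes")
    if scale == "nominal":
        return RangeDecision(linkml_range="string")
    if scale == "ordinal":
        # Ordering matters but there is nothing to enumerate: deferred in v1.
        return RangeDecision(linkml_range=None, note="deferred: ordinal without codes")
    if scale in ("interval", "ratio"):
        return RangeDecision(linkml_range=NUMBER_TYPES.get(number_type or "", "float"))
    if scale == "dateTime":
        return RangeDecision(linkml_range="datetime")
    return RangeDecision(linkml_range="string", note=f"unrecognized scale {scale!r}")


print(decide_range("ratio", has_codes=False, number_type="integer").linkml_range)  # integer
print(decide_range("ordinal", has_codes=False).note)  # deferred: ordinal without codes
```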
What's not salvageable from PR #138
The 2024 draft has a stub `EMLImportEngine` whose `convert()` returns an empty schema, references a nonexistent `schema_automator.metamodels.eml` module, and tries to load XML via `json_loader`. The branch is 126 commits behind main. The file location was right (`importers/`), but nothing in the implementation is worth keeping.
Related: linkml/linkml#2168 (upstream tracking issue), #87 / #153 (XSD importer that this builds on), #209 (extract-dd projection utility, the bridge from importer output to DD-shaped consumers).