Ingest structured data dictionary to enrich inferred schemas #192

@amc-corey-cox

Implement a CLI option to accept a structured data dictionary file (format defined in #191) alongside data files, and merge the declared metadata into the inferred schema.
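A minimal sketch of how such an option might be exposed, assuming an argparse-based CLI. The flag name `--data-dictionary-file`, the `data_files` positional, and the example file names are hypothetical; the real option name and the dictionary format are left to this issue and #191.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the project's actual CLI may be organized differently.
    parser = argparse.ArgumentParser(description="Infer a schema from data files.")
    parser.add_argument(
        "data_files",
        nargs="+",
        type=Path,
        help="Data files to run inference on.",
    )
    parser.add_argument(
        "--data-dictionary-file",  # hypothetical flag name
        type=Path,
        default=None,
        help="Optional structured data dictionary to merge into the inferred schema.",
    )
    return parser


# Example invocation with placeholder file names.
args = build_parser().parse_args(
    ["data.csv", "--data-dictionary-file", "dictionary.yaml"]
)
```

Defaulting the option to `None` keeps plain inference unchanged when no dictionary is supplied.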

Behavior

  • Run inference on data as usual (ranges, enums, optionality, etc.)
  • Read the data dictionary and overlay its declarations onto the inferred schema
  • Where they agree, use the richer information from the dictionary (e.g., add descriptions, units, coded value labels)
  • Where they conflict, flag the discrepancy (see #193, "Reconciliation report: inferred schema vs. declared data dictionary")
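The overlay step above could be sketched roughly as follows. This is an illustration under stated assumptions, not the intended implementation: schemas are modeled as plain dicts keyed by field name, the attribute names (`range`, `unit`, `value_labels`, and so on) are hypothetical stand-ins for whatever #191 defines, and on conflict the inferred value is kept while the mismatch is recorded for the report in #193.

```python
from typing import Any

# Attributes the dictionary may add freely (hypothetical names, pending #191).
ENRICHMENT_KEYS = {"description", "unit", "value_labels"}


def overlay_dictionary(
    inferred: dict[str, dict[str, Any]],
    declared: dict[str, dict[str, Any]],
) -> tuple[dict[str, dict[str, Any]], list[str]]:
    """Return the merged schema and a list of discrepancy messages."""
    # Copy each field so the inferred schema is not mutated.
    merged = {name: dict(attrs) for name, attrs in inferred.items()}
    discrepancies: list[str] = []

    for name, decl in declared.items():
        if name not in merged:
            discrepancies.append(f"{name}: declared in dictionary but not found in data")
            continue
        field = merged[name]
        for key, value in decl.items():
            if key in ENRICHMENT_KEYS or key not in field:
                # Richer information from the dictionary is added as-is.
                field[key] = value
            elif field[key] != value:
                # Assumption: keep the inferred value and flag the conflict.
                discrepancies.append(
                    f"{name}.{key}: inferred {field[key]!r}, declared {value!r}"
                )
    return merged, discrepancies


# Made-up example data for illustration only.
inferred = {
    "age": {"range": "integer", "required": True},
    "sex": {"range": "string", "enum": ["1", "2"]},
}
declared = {
    "age": {"range": "integer", "unit": "years", "description": "Age at enrollment"},
    "sex": {"range": "integer", "value_labels": {"1": "Male", "2": "Female"}},
}
merged, discrepancies = overlay_dictionary(inferred, declared)
```

Here `age` gains a unit and description, while the `range` mismatch on `sex` is reported rather than silently overwritten. Which side should win on conflict is a design choice this issue leaves open.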

Scope

  • Extends the existing --data-dictionary-row-count concept, but reads the dictionary from a separate file with a richer structure
  • Pure data processing — no LLM dependency

Depends on #191. Part of #190.
