
[FEATURE]: Schema compare functions (flat, nested, nullability) #159

Open
@grusin-db


Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

I would like to be able to compare schemas, which can be long and/or nested, to find differences in fields, their data types, and/or their nullability. Currently there is no code available for this.

Proposed Solution

Provide a function that takes two schemas (before and after) and returns a result containing:

  • fields added: the set of names of the added fields
  • fields removed: the set of names of the removed fields
  • fields modified: a map from field name to the before and after Spark data types (as strings) and/or nullability changes.

Nullability checks should be optional, as checking nullability is not always necessary or desired.

For nested schemas, do the same, but build field names using a dot separator, so that the resulting list of changes is always a flat object.
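To illustrate the dot-separated naming, here is a minimal sketch of flattening a nested schema. It assumes the schema is given in the dict form that PySpark's `StructType.jsonValue()` returns, so it runs without a Spark session. The name `flatten_schema` and the choice to serialize non-struct complex types (arrays, maps) as JSON strings are assumptions made for illustration, not part of this proposal.

```python
import json
from typing import Dict, Tuple

# Hypothetical alias: dotted field name -> (type as string, nullable)
FlatSchema = Dict[str, Tuple[str, bool]]


def flatten_schema(schema_json: dict, prefix: str = "") -> FlatSchema:
    """Flatten a Spark schema dict (StructType.jsonValue() form) to dotted names."""
    flat: FlatSchema = {}
    for field in schema_json["fields"]:
        name = f"{prefix}{field['name']}"
        dtype = field["type"]
        nullable = field.get("nullable", True)
        if isinstance(dtype, dict) and dtype.get("type") == "struct":
            # Record the struct itself, then recurse into its children.
            flat[name] = ("struct", nullable)
            flat.update(flatten_schema(dtype, prefix=f"{name}."))
        elif isinstance(dtype, dict):
            # Arrays and maps: keep a stable string form (assumption of this sketch).
            flat[name] = (json.dumps(dtype, sort_keys=True), nullable)
        else:
            # Primitive types are already plain strings such as "long" or "string".
            flat[name] = (dtype, nullable)
    return flat


example = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {
            "name": "address",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "city", "type": "string", "nullable": True, "metadata": {}},
                ],
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}
flat_example = flatten_schema(example)
```

With this input, the nested `city` field is reported under the key `address.city`, alongside the top-level `id` and `address` entries.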

Use data classes for the result object, so that syntax highlighting can be used.
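One possible shape for the result object and the comparison, as a rough sketch only. It assumes each schema has already been reduced to a flat mapping from dotted field name to a `(type string, nullable)` pair. The names `FieldChange`, `SchemaDiff`, `compare_flat_schemas`, and `check_nullability` are hypothetical and only illustrate the request.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Set, Tuple


@dataclass(frozen=True)
class FieldChange:
    """Before/after details for one modified field (hypothetical name)."""
    before_type: str
    after_type: str
    before_nullable: bool
    after_nullable: bool


@dataclass
class SchemaDiff:
    """Flat result of a schema comparison (hypothetical name)."""
    added: Set[str] = field(default_factory=set)
    removed: Set[str] = field(default_factory=set)
    modified: Dict[str, FieldChange] = field(default_factory=dict)


def compare_flat_schemas(
    before: Mapping[str, Tuple[str, bool]],
    after: Mapping[str, Tuple[str, bool]],
    check_nullability: bool = False,
) -> SchemaDiff:
    """Compare two flat schemas; nullability is only checked when requested."""
    diff = SchemaDiff(
        added=set(after) - set(before),
        removed=set(before) - set(after),
    )
    for name in set(before) & set(after):
        (b_type, b_null), (a_type, a_null) = before[name], after[name]
        type_changed = b_type != a_type
        null_changed = check_nullability and b_null != a_null
        if type_changed or null_changed:
            diff.modified[name] = FieldChange(b_type, a_type, b_null, a_null)
    return diff


before = {"id": ("integer", False), "name": ("string", True), "address.city": ("string", True)}
after = {"id": ("long", False), "name": ("string", False), "address.zip": ("string", True)}

diff = compare_flat_schemas(before, after)
strict_diff = compare_flat_schemas(before, after, check_nullability=True)
```

In this example, `diff` reports `address.zip` as added, `address.city` as removed, and only `id` as modified. `strict_diff` additionally reports `name`, because its nullability changed.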

Additional Context

Results of comparisons should be stored in a table to allow further analysis and dashboarding. This is out of scope for this story.
