Description
Is there an existing issue for this?
- I have searched the existing issues
Problem statement
I would like to be able to compare two schemas, which can be long and/or nested, to find differences in their fields, the fields' data types, and/or their nullability. Currently there is no code available for this.
Proposed Solution
Provide a function that takes two schemas (before and after) and returns a result containing:
- fields added: the set of names of fields that were added;
- fields removed: the set of names of fields that were removed;
- fields modified: a map from field name to the before and after Spark data types (as strings) and/or the before and after nullability.
Nullability checks should be optional, as checking nullability is not always necessary or desired.
For nested fields, do the same, but identify each field by its full path using a dot separator (e.g. address.city), so that the resulting list of changes is always flat.
Use data classes for the result object so that syntax highlighting can be used.
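A minimal sketch of the proposed function, assuming the schemas are passed in Spark's JSON form (the dict returned by `StructType.jsonValue()` on a PySpark schema), which keeps the sketch free of a PySpark dependency. The names `compare_schemas`, `SchemaDiff` and `FieldChange` are hypothetical. Nested structs are flattened to dot-separated paths; arrays and maps are compared as whole types rather than recursed into, which is a simplifying assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass(frozen=True)
class FieldChange:
    """Before/after type (as string) and nullability of a field present in both schemas."""
    before_type: str
    after_type: str
    before_nullable: bool
    after_nullable: bool


@dataclass
class SchemaDiff:
    """Hypothetical result object: flat, dot-separated field paths."""
    added: Set[str] = field(default_factory=set)
    removed: Set[str] = field(default_factory=set)
    modified: Dict[str, FieldChange] = field(default_factory=dict)


def _is_struct(data_type: Any) -> bool:
    return isinstance(data_type, dict) and data_type.get("type") == "struct"


def _type_str(data_type: Any) -> str:
    # Structs are reported as "struct": their children are compared individually,
    # so a nested change is not also reported on every parent struct.
    if _is_struct(data_type):
        return "struct"
    # Primitive types are plain strings ("string", "decimal(10,2)", ...);
    # arrays and maps are dicts, rendered here as canonical JSON.
    return data_type if isinstance(data_type, str) else json.dumps(data_type, sort_keys=True)


def _flatten(schema: Dict[str, Any], prefix: str = "") -> Dict[str, Tuple[str, bool]]:
    """Map each dot-separated field path to (type string, nullable)."""
    flat: Dict[str, Tuple[str, bool]] = {}
    for f in schema["fields"]:
        path = prefix + f["name"]
        flat[path] = (_type_str(f["type"]), f["nullable"])
        if _is_struct(f["type"]):
            flat.update(_flatten(f["type"], path + "."))
    return flat


def compare_schemas(before: Dict[str, Any], after: Dict[str, Any],
                    check_nullability: bool = True) -> SchemaDiff:
    b, a = _flatten(before), _flatten(after)
    diff = SchemaDiff(added=set(a) - set(b), removed=set(b) - set(a))
    for path in sorted(set(b) & set(a)):
        (b_type, b_null), (a_type, a_null) = b[path], a[path]
        # Nullability only counts as a change when the caller asks for it.
        nullability_changed = check_nullability and b_null != a_null
        if b_type != a_type or nullability_changed:
            diff.modified[path] = FieldChange(b_type, a_type, b_null, a_null)
    return diff
```

With PySpark, a call might look like `compare_schemas(df_before.schema.jsonValue(), df_after.schema.jsonValue(), check_nullability=False)`. Reporting struct-typed fields as "struct" means a struct that is replaced by a non-struct type still shows up as a type change, while changes inside a struct appear only under their own dotted paths.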
Additional Context
Comparison results should be stored in a table to allow further analysis and dashboarding. This is out of scope for this story.