Open
Labels: enhancement, question
Description
Context / Goal
Currently, as documented in the README.md, the names of columns are irrelevant when producing hashes of data. This is convenient, as you don't have to alias columns everywhere to ensure they are unique.
However, it does make column ordering important. While it might be more convenient to compare queries when the columns are all ordered identically, it's potentially less flexible. Arguably, as long as there are no duplicate names, Recce could let you tell a dataset to compare by matching column names instead.
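To illustrate the trade-off, here is a minimal Python sketch of position-based hashing. It is not Recce's actual hashing code: the function name `positional_row_hash`, the use of SHA-256, and the `repr`-based value encoding are all assumptions made for illustration only.

```python
import hashlib


def positional_row_hash(row):
    """Hash a row's values in column order, ignoring column names (hypothetical helper)."""
    digest = hashlib.sha256()
    for _name, value in row:
        # Only the value contributes to the hash; the length prefix keeps
        # adjacent fields from running into each other.
        encoded = repr(value).encode("utf-8")
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
    return digest.hexdigest()


# Each row is a list of (column name, value) pairs in query order.
source_row = [("id", 1), ("name", "alice")]
renamed_row = [("customer_id", 1), ("customer_name", "alice")]
reordered_row = [("name", "alice"), ("id", 1)]

# Renaming columns does not change the hash...
names_ignored = positional_row_hash(source_row) == positional_row_hash(renamed_row)
# ...but reordering them does.
order_matters = positional_row_hash(source_row) != positional_row_hash(reordered_row)
```

Under these assumptions, the same data hashes differently as soon as one query selects its columns in a different order, which is the inflexibility described above.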
Expected Outcome
- Introduce an optional configuration element to the `dataset:`, called perhaps `columnMatchMode`
  - `columnMatchMode` should be `positional` by default (current behaviour)
  - `columnMatchMode` should also allow a new setting `nameBased`
- When set to `nameBased` during hashing, Recce would
  - add elements to the hash in a deterministic order, based on the column names
  - fail with an appropriate error message if the column metadata implies there are two columns with the same name
  - (ideally) fast-fail if there are mismatched names of columns between the two queries (if one query has a column name that the other does not, the hash results cannot possibly match)
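The proposed name-based behaviour could look roughly like the Python sketch below. Everything here is hypothetical and only meant to make the three requirements concrete: `ColumnMatchError`, `check_unique_names`, `check_same_names` and `name_based_row_hash` are invented names, SHA-256 stands in for whatever hash Recce uses, and column names are assumed to be compared case-sensitively.

```python
import hashlib


class ColumnMatchError(ValueError):
    """Raised when column names cannot be matched safely (hypothetical)."""


def check_unique_names(names):
    """Fail if the column metadata contains the same name twice."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    if duplicates:
        raise ColumnMatchError(f"duplicate column names: {duplicates}")


def check_same_names(source_names, target_names):
    """Fast-fail if either query has a column name the other lacks."""
    only_source = sorted(set(source_names) - set(target_names))
    only_target = sorted(set(target_names) - set(source_names))
    if only_source or only_target:
        raise ColumnMatchError(
            f"column names differ: only in source {only_source}, "
            f"only in target {only_target}"
        )


def name_based_row_hash(row):
    """Hash values in column-name order so query column order is irrelevant."""
    check_unique_names([name for name, _value in row])
    digest = hashlib.sha256()
    # Sorting by name gives a deterministic order; names are unique here.
    for _name, value in sorted(row, key=lambda column: column[0]):
        encoded = repr(value).encode("utf-8")
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
    return digest.hexdigest()


source_row = [("id", 1), ("name", "alice")]
target_row = [("name", "alice"), ("id", 1)]

check_same_names([n for n, _ in source_row], [n for n, _ in target_row])
hashes_match = name_based_row_hash(source_row) == name_based_row_hash(target_row)
```

In this sketch the name mismatch check runs once per dataset against the column metadata, before any rows are hashed, while the duplicate check guards the sort that makes the ordering deterministic.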
Out of Scope
Additional context / implementation notes
tommi-lew