Support comparing datasets column-agnostic, using names only #55

@chadlwilson

Context / Goal

Currently, as documented in the README.md, the names of columns are irrelevant when producing hashes of data. This is convenient, as you don't have to alias columns everywhere to ensure they are unique.

However, it does make column ordering important. While it might be more convenient to compare queries when the columns are all ordered identically, it's potentially less flexible. Arguably, as long as there are no duplicate names, Recce could offer a mode where you tell a dataset to compare by matching column names.

Expected Outcome

  • Introduce an optional configuration element under dataset:, perhaps called columnMatchMode
  • columnMatchMode should be positional by default (current behaviour)
  • columnMatchMode should also allow a new setting nameBased
  • When columnMatchMode is set to nameBased, Recce would, during hashing:
    • add elements to the hash in a deterministic order, based on the column names
    • fail with an appropriate error message if the column metadata implies there are two columns with the same name
    • (ideally) fast-fail if there are mismatched names of columns between the two queries (if one query has a column name that the other does not, the hash results cannot possibly match)
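
The behaviour proposed above can be sketched roughly as follows. This is a minimal illustration in Python, not Recce's actual hashing code: the function names `hash_row` and `check_names_match`, the `ColumnMatchError` type, the use of SHA-256 over `repr()` of each value, and case-sensitive name comparison are all assumptions made for the example. Only `columnMatchMode`, `positional` and `nameBased` come from the proposal itself.

```python
import hashlib


class ColumnMatchError(ValueError):
    """Hypothetical error type for column name problems."""


def check_names_match(source_columns, target_columns):
    """Fast-fail if either query has a column name the other lacks."""
    only_in_source = sorted(set(source_columns) - set(target_columns))
    only_in_target = sorted(set(target_columns) - set(source_columns))
    if only_in_source or only_in_target:
        raise ColumnMatchError(
            f"column names differ: only in source {only_in_source}, "
            f"only in target {only_in_target}"
        )


def hash_row(columns, values, column_match_mode="positional"):
    """Hash one row; in nameBased mode the column order does not matter."""
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")

    if column_match_mode == "nameBased":
        # Duplicate names make a name-based ordering ambiguous, so fail.
        duplicates = sorted({c for c in columns if columns.count(c) > 1})
        if duplicates:
            raise ColumnMatchError(f"duplicate column names: {duplicates}")
        # Deterministic order: sort the values by their column name.
        pairs = sorted(zip(columns, values), key=lambda pair: pair[0])
        ordered = [value for _, value in pairs]
    elif column_match_mode == "positional":
        # Current behaviour: names are ignored, position decides.
        ordered = list(values)
    else:
        raise ValueError(f"unknown columnMatchMode: {column_match_mode}")

    digest = hashlib.sha256()
    for value in ordered:
        encoded = repr(value).encode("utf-8")
        # Length prefix keeps adjacent values from running together.
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
    return digest.hexdigest()
```

With this sketch, `hash_row(["a", "b"], [1, 2], "nameBased")` and `hash_row(["b", "a"], [2, 1], "nameBased")` produce the same hash, whereas the same two calls in the default positional mode produce different hashes.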

Out of Scope

Additional context / implementation notes

Labels: enhancement (New feature or request), question (Further information is requested)
