
[FEATURE]: Optionally suppress column values in DQX metadata #674

@ghanse

Description

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

DQX provides built-in check functions that may operate on large input values (e.g., regex, JSON, or geometry checks on large string columns). By default, the original column value is often embedded in the DQX metadata columns. This can significantly increase the size of the output dataset when checks run against columns that contain large values.
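To make the size concern concrete, here is a minimal sketch comparing a check message that embeds the input value with one that omits it. The message formats and the helper functions `standard_message` and `compact_message` are hypothetical and are not DQX's actual API or wording. They only illustrate that the embedded value dominates the message size.

```python
# Hypothetical helpers for illustration only; these are not DQX functions.
def standard_message(column: str, value: str, pattern: str) -> str:
    # Embeds the full input value, mirroring the default behavior described above.
    return f"Value '{value}' in column '{column}' does not match pattern '{pattern}'"


def compact_message(column: str, pattern: str) -> str:
    # Omits the input value; only the column name and pattern remain.
    return f"Value in column '{column}' does not match pattern '{pattern}'"


large_value = "x" * 1_000_000  # stands in for a ~1 MB JSON or WKT string
pattern = r"^\{.*\}$"

standard = standard_message("payload", large_value, pattern)
compact = compact_message("payload", pattern)

# The standard message is larger by the value itself plus two quotes and a space.
print(len(standard) - len(compact))  # 1000003
```

If a message like this is stored for every row that fails a check, the overhead scales with the number of failing rows, not just with the size of a single value.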

Proposed Solution

Add a MetadataLevel class with the following options:

  1. MetadataLevel.COMPACT to suppress input values from the DQX metadata
  2. MetadataLevel.STANDARD to use the existing behavior
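A minimal sketch of what the proposed `MetadataLevel` could look like as a Python enum. The string values `"compact"` and `"standard"` are assumptions, not part of the proposal. Mixing in `str` is one possible design choice: it lets the level be read directly from YAML or JSON check definitions.

```python
from enum import Enum


class MetadataLevel(str, Enum):
    """Hypothetical sketch of the proposed option; not a released DQX API."""

    COMPACT = "compact"    # suppress input values from the DQX metadata
    STANDARD = "standard"  # existing behavior: input values may be embedded


# Looking up by value supports string-based configuration files.
print(MetadataLevel("compact") is MetadataLevel.COMPACT)  # True
```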

Provide parameters for users to suppress input values in the DQX metadata:

  1. Add metadata_level to a DQRule to control behavior for specific checks at the rule level
  2. Add metadata_level to the DQEngine to control behavior for all checks at the engine level

The default will be MetadataLevel.STANDARD unless explicitly configured.
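The proposal does not say how a rule-level setting interacts with an engine-level one. The sketch below assumes one plausible precedence: a rule-level value overrides the engine-level value, and `MetadataLevel.STANDARD` applies when neither is set. `RuleConfig`, `EngineConfig`, `effective_level`, and `render_message` are hypothetical stand-ins, not the real `DQRule` or `DQEngine` classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MetadataLevel(Enum):
    COMPACT = "compact"
    STANDARD = "standard"


@dataclass
class RuleConfig:
    # Hypothetical stand-in for a DQRule; only the proposed field is modeled.
    name: str
    metadata_level: Optional[MetadataLevel] = None


@dataclass
class EngineConfig:
    # Hypothetical stand-in for DQEngine settings.
    metadata_level: Optional[MetadataLevel] = None


def effective_level(rule: RuleConfig, engine: EngineConfig) -> MetadataLevel:
    # Assumed precedence: rule-level setting, then engine-level, then STANDARD.
    if rule.metadata_level is not None:
        return rule.metadata_level
    if engine.metadata_level is not None:
        return engine.metadata_level
    return MetadataLevel.STANDARD


def render_message(rule: RuleConfig, engine: EngineConfig, column: str, value: str) -> str:
    # Include the input value only when the effective level is STANDARD.
    if effective_level(rule, engine) is MetadataLevel.COMPACT:
        return f"Check '{rule.name}' failed on column '{column}'"
    return f"Check '{rule.name}' failed on column '{column}' with value '{value}'"


engine = EngineConfig(metadata_level=MetadataLevel.COMPACT)
json_rule = RuleConfig(name="payload_is_json")
id_rule = RuleConfig(name="id_format", metadata_level=MetadataLevel.STANDARD)

print(effective_level(json_rule, engine))          # MetadataLevel.COMPACT
print(effective_level(id_rule, engine))            # MetadataLevel.STANDARD
print(effective_level(json_rule, EngineConfig()))  # MetadataLevel.STANDARD
```

Under this assumption, a user could set `COMPACT` once on the engine to keep large payloads out of the metadata while opting individual, small-valued rules back into `STANDARD` for easier debugging.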

Additional Context

No response

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests