[FEATURE]: Add severity level to the quality checks #160

Open
@ZubeyirOflaz

Description

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

We frequently encounter data quality errors that indicate an entire subgroup of the data cannot be relied upon. For example, an erroneous sensor reading can mean that the whole test step containing it should not be trusted.

For this type of use case, it would be ideal to be able to set a severity level for a check and to specify a list of columns by which to group the data. If an error is encountered, all rows in the affected subgroup are marked or quarantined.

Proposed Solution

  • Addition of a new optional parameter for the DQEngine to specify the list of columns for grouping.
  • Addition of a new boolean field to the dq rules that lets users specify whether an error is severe.

If grouping columns are provided and a dq rule is set to severe, any error that rule detects results in the whole subgroup being quarantined or marked.
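
The behaviour proposed above could look roughly like the following. This is a minimal sketch in plain Python, not the DQEngine API: `quarantine_groups`, the `(check, is_severe)` rule tuples, and the `test_step` / `sensor` column names are all hypothetical and exist only to illustrate the idea.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
# Hypothetical rule shape: (check function, is_severe flag).
# The check function returns True when the row FAILS the check.
Rule = Tuple[Callable[[Row], bool], bool]


def quarantine_groups(
    rows: Sequence[Row],
    group_cols: Sequence[str],
    rules: Sequence[Rule],
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (good, quarantined).

    A row is quarantined if it fails any rule, or if grouping columns are
    given and any row in its subgroup fails a severe rule.
    """
    # Pass 1: collect the keys of subgroups hit by a severe failure.
    tainted = set()
    if group_cols:
        for row in rows:
            if any(is_severe and check(row) for check, is_severe in rules):
                tainted.add(tuple(row[c] for c in group_cols))

    # Pass 2: route each row based on its own result and its subgroup.
    good: List[Row] = []
    quarantined: List[Row] = []
    for row in rows:
        row_failed = any(check(row) for check, _ in rules)
        in_tainted_group = bool(group_cols) and (
            tuple(row[c] for c in group_cols) in tainted
        )
        if row_failed or in_tainted_group:
            quarantined.append(row)
        else:
            good.append(row)
    return good, quarantined


# Illustrative data: one bad sensor reading in test step 1.
rows = [
    {"test_step": 1, "sensor": 20.5},
    {"test_step": 1, "sensor": -999.0},
    {"test_step": 2, "sensor": 21.0},
]
# Hypothetical severe rule: readings below -100 are treated as invalid.
rules = [(lambda r: r["sensor"] < -100, True)]

good, quarantined = quarantine_groups(rows, ["test_step"], rules)
good_ungrouped, quarantined_ungrouped = quarantine_groups(rows, [], rules)
```

With `test_step` as the grouping column, both rows of step 1 are quarantined even though only one reading is invalid. Without grouping columns, only the failing row is quarantined.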

Additional Context

We've built an in-house data quality framework that provides this functionality. If you start accepting external contributors, please notify me and I would be happy to try to contribute the implementation to your library.

Labels: enhancement (New feature or request)
