Load and save checks from a Delta table #339


Open · wants to merge 16 commits into base: main

Conversation


@ghanse ghanse commented May 6, 2025

Changes

Added the following methods to the DQEngine class to save checks to and load checks from a Delta table in a Databricks workspace (a usage sketch follows the list):

  • save_checks_in_dataframe
  • load_checks_from_dataframe
  • save_checks_in_table
  • load_checks_from_table
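
The following is a minimal usage sketch of these methods, not code from this PR's diff: the check dictionary layout, the exact argument names, and the table name main.dqx.checks are assumptions.

```python
# Hypothetical usage sketch; the check layout, argument names, and the table
# name "main.dqx.checks" are assumptions, not copied from the PR diff.
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine

dq_engine = DQEngine(WorkspaceClient())

checks = [
    {
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"col_name": "order_id"}},
    }
]

# Persist the checks to a Delta table in the workspace, then read them back.
dq_engine.save_checks_in_table(checks, table_name="main.dqx.checks")
loaded_checks = dq_engine.load_checks_from_table(table_name="main.dqx.checks")
```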

Linked issues

Resolves #299

Tests

Added unit and integration tests.

  • manually tested
  • added unit tests
  • added integration tests

@ghanse ghanse requested a review from a team as a code owner May 6, 2025 14:31
@ghanse ghanse requested review from tombonfert and removed request for a team May 6, 2025 14:31

github-actions bot commented May 6, 2025

✅ 149/149 passed, 1 flaky, 1 skipped, 1h13m36s total

Flaky tests:

  • 🤪 test_profiler_workflow_e2e (5m16.636s)

Running from acceptance #724


@staticmethod
def _load_checks_from_table(
    table_name: str, query: str | None = None, spark: SparkSession | None = None
Contributor:

Why do we need the query parameter?

Contributor Author (ghanse):

In case users want to filter checks on load (e.g. where criticality <> "warning").
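
For illustration, the query parameter in the snippet above could be used roughly like this; this is an assumption based on the thread (the parameter was later removed from the PR), and the table name is made up.

```python
# Illustrative only: whether the public method is static and the table name
# are assumptions, and the query parameter was later removed from this PR.
checks = DQEngine.load_checks_from_table(
    table_name="main.dqx.checks",
    query='criticality <> "warning"',  # load only non-warning checks
)
```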

Contributor:

IMO it would be more useful to be able to load a set of checks for a particular use case / run config. We already have the concept of a run config, so I would use run_config_name instead of query when saving and loading, defaulting to the default run config. I would not make it too flexible, because that increases usage complexity.
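
A sketch of what this run_config_name proposal could look like; it was not implemented in this PR (the query parameter was simply removed), so the signatures below are purely hypothetical.

```python
# Hypothetical API following the reviewer's proposal; NOT implemented in this PR.
dq_engine.save_checks_in_table(checks, table_name="main.dqx.checks", run_config_name="default")
checks = dq_engine.load_checks_from_table(table_name="main.dqx.checks", run_config_name="default")
```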

Contributor Author (ghanse):

I removed query. We can add run_config_name if needed.

@ghanse ghanse requested a review from mwojtyczka May 14, 2025 11:57
@mwojtyczka mwojtyczka requested a review from Copilot May 20, 2025 14:44

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces the ability to persist data quality checks to and from Delta tables, enhances the core engine with DataFrame-based serialization/deserialization, and updates configuration and tests to support a checks_table.

  • Added save_checks_in_dataframe, load_checks_from_dataframe, save_checks_in_table, and load_checks_from_table in DQEngineCore/DQEngine.
  • Extended RunConfig with a checks_table setting and updated integration fixtures (see the config sketch after this list).
  • Bumped linter max-attributes to accommodate the new config field.
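
A minimal sketch of the new setting, assuming the RunConfig dataclass in databricks.labs.dqx.config; the name field and the table name are assumptions, only checks_table is the new field from this PR.

```python
# Minimal sketch, assuming the RunConfig dataclass in databricks.labs.dqx.config;
# the "name" field and the table name are assumptions; checks_table is the new field.
from databricks.labs.dqx.config import RunConfig

run_config = RunConfig(
    name="default",
    checks_table="main.dqx.checks",
)
```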

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File | Description
tests/unit/test_load_checks_from_dataframe.py | Unit tests for load_checks_from_dataframe, including warning.
tests/integration/test_load_checks_from_table.py | Integration tests for loading checks from a Delta table.
tests/integration/conftest.py | Updated MockInstallationContext to inject checks_table.
src/databricks/labs/dqx/engine.py | Implemented DataFrame/table save/load methods and warning logic.
src/databricks/labs/dqx/config.py | Added checks_table field to RunConfig.
pyproject.toml | Increased max-attributes limit to 16.




@Copilot Copilot AI left a comment


Pull Request Overview

Adds Delta table persistence for data quality checks by introducing DataFrame conversion utilities and new DQEngine methods for saving/loading to tables.

  • Implemented build_dataframe_from_quality_rules and build_quality_rules_from_dataframe in DQEngineCore (see the round-trip sketch after this list)
  • Added save_checks_in_table and load_checks_from_table in both DQEngineCore and DQEngine
  • Updated configuration and tests to support table-based check persistence
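
A hedged sketch of the DataFrame round trip these utilities provide; this is not code from the diff, and the exact signatures (e.g. whether a SparkSession must be passed explicitly) are assumptions.

```python
# Hedged sketch of the round trip; exact signatures are assumptions.
from pyspark.sql import SparkSession
from databricks.labs.dqx.engine import DQEngineCore

spark = SparkSession.builder.getOrCreate()

checks = [
    {
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"col_name": "order_id"}},
    }
]

# Serialize the checks to a Spark DataFrame, then rebuild the dictionaries.
checks_df = DQEngineCore.build_dataframe_from_quality_rules(checks, spark)
round_tripped = DQEngineCore.build_quality_rules_from_dataframe(checks_df)
```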

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File | Description
src/databricks/labs/dqx/engine.py | Added build/save/load methods for DataFrame and Delta table persistence of checks
src/databricks/labs/dqx/config.py | Introduced checks_table field in RunConfig
tests/unit/test_build_rules.py | Added unit test for DataFrame-based quality rules build
tests/integration/test_load_checks_from_table.py | Added integration tests for loading checks from a Delta table
tests/integration/conftest.py | Adjusted installation_ctx fixture signature formatting
pyproject.toml | Increased max-attributes limit to accommodate new methods
Comments suppressed due to low confidence (1)

tests/unit/test_build_rules.py:446

  • Test covers basic DataFrame conversion but does not verify behavior when a filter column is present or when a large-DataFrame warning is emitted. Consider adding tests for those scenarios.
def test_build_quality_rules_from_dataframe(spark_local):

* `filter` - Expression for filtering data quality checks
:return: list of data quality check specifications as a Python dictionary
"""
num_check_rows = df.count()
Copilot AI commented May 21, 2025:

Calling df.count() before df.collect() triggers two separate Spark jobs. Consider collecting rows once (e.g., rows = df.collect()) and using len(rows) to avoid the extra scan.

Suggested change:
- num_check_rows = df.count()
+ rows = df.collect()
+ num_check_rows = len(rows)


* `name` - name that will be given to a resulting column. Autogenerated if not provided
* `criticality` (optional) - possible values are `error` (data going only into "bad" dataframe),
and `warn` (data is going into both dataframes)
* `check_function` - DQX check function used in the check
Copilot AI commented May 21, 2025:

The docstring refers to a check_function column, but the actual DataFrame schema uses a check column. Update the description to match the implementation.

Suggested change:
- * `check_function` - DQX check function used in the check
+ * `check` - DQX check function used in the check


Comment on lines +766 to +769
if spark is None:
    spark = SparkSession.builder.getOrCreate()
rules_df = spark.read.table(table_name)
return DQEngineCore.build_quality_rules_from_dataframe(rules_df)
Copilot AI commented May 21, 2025:

[nitpick] This static method duplicates functionality in DQEngineCore. Consider consolidating with the core implementation to reduce code duplication.

Suggested change:
- if spark is None:
-     spark = SparkSession.builder.getOrCreate()
- rules_df = spark.read.table(table_name)
- return DQEngineCore.build_quality_rules_from_dataframe(rules_df)
+ """
+ Load checks from a Delta table in the workspace.
+ :param table_name: Unity catalog or Hive metastore table name
+ :param spark: Optional SparkSession instance
+ :return: List of quality rules (checks)
+ """
+ return DQEngineCore.load_checks_from_table(table_name, spark)


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Provide option to load and save quality rules to/from a table
2 participants