Load and save checks from a Delta table #339

Merged
merged 24 commits into from
May 27, 2025


@ghanse ghanse commented May 6, 2025

Changes

Added the following methods to the DQEngine class for saving checks to, and loading them from, a DataFrame or a Delta table in a Databricks workspace:

  • save_checks_in_dataframe
  • load_checks_from_dataframe
  • save_checks_in_table
  • load_checks_from_table
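
To make the DataFrame round trip concrete, here is a minimal, library-free sketch of how a list of check dictionaries could be flattened into rows and rebuilt. It is not the code from this PR: the helper names `checks_to_rows` and `rows_to_checks`, the flat row layout, and the JSON-encoded `arguments` column are all assumptions made for illustration, and the real table schema may differ.

```python
import json
from typing import Any, Optional

# Hypothetical row layout for illustration only; the schema used by the PR may differ.
Row = dict[str, Optional[str]]


def checks_to_rows(checks: list[dict[str, Any]]) -> list[Row]:
    """Flatten check dictionaries into one row per check."""
    rows: list[Row] = []
    for check in checks:
        spec = check["check"]
        rows.append(
            {
                "name": check.get("name"),
                "criticality": check.get("criticality", "error"),
                "function": spec["function"],
                # Arguments are JSON-encoded so nested values fit in one string column.
                "arguments": json.dumps(spec.get("arguments", {}), sort_keys=True),
                "filter": check.get("filter"),
            }
        )
    return rows


def rows_to_checks(rows: list[Row]) -> list[dict[str, Any]]:
    """Rebuild check dictionaries from the flat rows produced above."""
    checks: list[dict[str, Any]] = []
    for row in rows:
        check: dict[str, Any] = {
            "criticality": row["criticality"],
            "check": {
                "function": row["function"],
                "arguments": json.loads(row["arguments"] or "{}"),
            },
        }
        # Optional fields are only restored when they were present.
        if row.get("name") is not None:
            check["name"] = row["name"]
        if row.get("filter") is not None:
            check["filter"] = row["filter"]
        checks.append(check)
    return checks


# Illustrative checks; the function and argument names are examples, not a reference.
example_checks = [
    {
        "name": "id_is_not_null",
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"col_name": "id"}},
    },
    {
        "criticality": "warn",
        "check": {
            "function": "is_in_list",
            "arguments": {"col_name": "status", "allowed": ["a", "b"]},
        },
        "filter": "status IS NOT NULL",
    },
]
```

In this sketch a check that omits `criticality` comes back with the assumed default `"error"`, so the round trip is exact only for checks that set it explicitly.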

Linked issues

Resolves #299

Tests

Added unit and integration tests.

  • manually tested
  • added unit tests
  • added integration tests

@ghanse ghanse requested a review from a team as a code owner May 6, 2025 14:31
@ghanse ghanse requested review from tombonfert and removed request for a team May 6, 2025 14:31

github-actions bot commented May 6, 2025

✅ 151/151 passed, 1 skipped, 44m18s total

Running from acceptance #756

@ghanse ghanse requested a review from mwojtyczka May 14, 2025 11:57
@mwojtyczka mwojtyczka requested a review from Copilot May 20, 2025 14:44

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces the ability to persist data quality checks to and from Delta tables, enhances the core engine with DataFrame-based serialization/deserialization, and updates configuration and tests to support a checks_table.

  • Added save_checks_in_dataframe, load_checks_from_dataframe, save_checks_in_table, and load_checks_from_table in DQEngineCore/DQEngine.
  • Extended RunConfig with a checks_table setting and updated integration fixtures.
  • Bumped linter max-attributes to accommodate the new config field.
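
As a rough illustration of the new configuration field, the sketch below models a run configuration that can point at either a checks file or a checks table. `RunConfigSketch` and `checks_location` are hypothetical names, only the `checks_table` field name comes from this PR, and the rule that a table takes precedence over a file is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfigSketch:
    """Hypothetical stand-in for RunConfig with only the fields discussed here."""

    name: str = "default"
    checks_file: Optional[str] = "checks.yml"
    checks_table: Optional[str] = None  # the field this PR adds


def checks_location(config: RunConfigSketch) -> tuple[str, str]:
    """Return ("table", name) or ("file", path) for where checks are stored.

    Assumption: a configured table wins over a configured file.
    """
    if config.checks_table:
        return ("table", config.checks_table)
    if config.checks_file:
        return ("file", config.checks_file)
    raise ValueError(f"Run config {config.name!r} defines no checks location")
```

Adding one more field to a dataclass like this is the kind of change that can push a class over a linter's attribute limit, which is why the threshold was raised.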

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/unit/test_load_checks_from_dataframe.py | Unit tests for load_checks_from_dataframe, including warning. |
| tests/integration/test_load_checks_from_table.py | Integration tests for loading checks from a Delta table. |
| tests/integration/conftest.py | Updated MockInstallationContext to inject checks_table. |
| src/databricks/labs/dqx/engine.py | Implemented DataFrame/table save/load methods and warning logic. |
| src/databricks/labs/dqx/config.py | Added checks_table field to RunConfig. |
| pyproject.toml | Increased max-attributes limit to 16. |


@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds the ability to persist data quality checks in Delta tables alongside existing file-based methods, updating engine logic, configuration, documentation, and demos.

  • Introduce DataFrame ↔ checks conversion and table-based save/load in DQEngineCore/DQEngine
  • Extend RunConfig with a checks_table property and bump linting threshold
  • Add unit/integration tests, update docs and demo to illustrate table support

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/unit/test_build_rules.py | Added round-trip unit test for DataFrame ↔ checks conversion |
| tests/integration/test_load_checks_from_table.py | Added integration tests for table save/load |
| tests/integration/conftest.py | Simplified installation_ctx fixture signature |
| src/databricks/labs/dqx/engine.py | Implemented _save/_load and public save_checks_in_table/load_checks_from_table; DF conversion methods |
| src/databricks/labs/dqx/config.py | Added checks_table to RunConfig |
| pyproject.toml | Increased max-attributes lint rule |
| docs/dqx/docs/guide.mdx | Documented Delta table support for quality rules |
| demos/dqx_demo_library.py | Updated demo to save/load checks in a Delta table |

Comments suppressed due to low confidence (2)

src/databricks/labs/dqx/engine.py:5

  • The itertools import is unused in this file; consider removing it to avoid unnecessary dependencies.
import itertools

tests/integration/test_load_checks_from_table.py:38

  • [nitpick] Only the default append mode is tested here; consider adding a test for mode='overwrite' to validate that option.
engine.save_checks_in_table(TEST_CHECKS, table_name)
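
The difference between the two save modes mentioned in the nitpick can be sketched without a Spark session. The class below is a hypothetical in-memory stand-in for the checks table, not the engine from this PR, and the two test functions only show the shape an append test and an overwrite test might take.

```python
from typing import Any


class InMemoryChecksTable:
    """Hypothetical in-memory stand-in for a Delta table that stores checks."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def save(self, checks: list[dict[str, Any]], mode: str = "append") -> None:
        if mode == "append":
            # Append keeps whatever is already stored.
            self.rows.extend(checks)
        elif mode == "overwrite":
            # Overwrite discards previously stored checks.
            self.rows = list(checks)
        else:
            raise ValueError(f"Unsupported mode: {mode!r}")

    def load(self) -> list[dict[str, Any]]:
        return list(self.rows)


FIRST = [{"criticality": "error", "check": {"function": "is_not_null"}}]
SECOND = [{"criticality": "warn", "check": {"function": "is_not_null_and_not_empty"}}]


def test_append_keeps_existing_checks() -> None:
    table = InMemoryChecksTable()
    table.save(FIRST)
    table.save(SECOND)  # default mode is append
    assert table.load() == FIRST + SECOND


def test_overwrite_replaces_existing_checks() -> None:
    table = InMemoryChecksTable()
    table.save(FIRST)
    table.save(SECOND, mode="overwrite")
    assert table.load() == SECOND
```

An integration test against a real table would follow the same pattern: save once, save again with the mode under test, then compare what is loaded back.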

@databrickslabs databrickslabs deleted a comment from Copilot AI May 26, 2025
@databrickslabs databrickslabs deleted a comment from Copilot AI May 26, 2025

@mwojtyczka mwojtyczka left a comment

LGTM

@mwojtyczka mwojtyczka merged commit 8575e7d into main May 27, 2025
10 checks passed
@mwojtyczka mwojtyczka deleted the checks_from_table branch May 27, 2025 17:10
mwojtyczka added a commit that referenced this pull request Jun 6, 2025
* Fix spark remote version detection in CI [#342](#342)
* Fix spark remote installation [#346](#346)
* Load and save checks from a Delta table [#339](#339)
* Handle nulls in uniqueness check for composite keys [(#345)](#345)
* Allow user metadata for individual checks [#352](#352)
* Add functionality to save results in delta table [#319](#319)
* Fix "older than" checks [#354](#354)
* Add PII-detection example [#358](#358)
* Add aggregation type of checks [#357](#357)

Successfully merging this pull request may close these issues.

[FEATURE]: Provide option to load and save quality rules to/from a table
2 participants