Load and save checks from a Delta table #339
✅ 151/151 passed, 1 skipped, 44m18s total. Running from acceptance #756
Pull Request Overview
This PR introduces the ability to persist data quality checks to and from Delta tables, enhances the core engine with DataFrame-based serialization/deserialization, and updates configuration and tests to support a `checks_table`.
- Added `save_checks_in_dataframe`, `load_checks_from_dataframe`, `save_checks_in_table`, and `load_checks_from_table` in `DQEngineCore`/`DQEngine`.
- Extended `RunConfig` with a `checks_table` setting and updated integration fixtures.
- Bumped linter `max-attributes` to accommodate the new config field.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
tests/unit/test_load_checks_from_dataframe.py | Unit tests for `load_checks_from_dataframe`, including warning. |
tests/integration/test_load_checks_from_table.py | Integration tests for loading checks from a Delta table. |
tests/integration/conftest.py | Updated `MockInstallationContext` to inject `checks_table`. |
src/databricks/labs/dqx/engine.py | Implemented DataFrame/table save/load methods and warning logic. |
src/databricks/labs/dqx/config.py | Added `checks_table` field to `RunConfig`. |
pyproject.toml | Increased max-attributes limit to 16. |
Pull Request Overview
This PR adds the ability to persist data quality checks in Delta tables alongside existing file-based methods, updating engine logic, configuration, documentation, and demos.
- Introduce DataFrame ↔ checks conversion and table-based save/load in `DQEngineCore`/`DQEngine`
- Extend `RunConfig` with a `checks_table` property and bump linting threshold
- Add unit/integration tests, update docs and demo to illustrate table support
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
tests/unit/test_build_rules.py | Added round-trip unit test for DataFrame ↔ checks conversion |
tests/integration/test_load_checks_from_table.py | Added integration tests for table save/load |
tests/integration/conftest.py | Simplified installation_ctx fixture signature |
src/databricks/labs/dqx/engine.py | Implemented _save/_load and public `save_checks_in_table`/`load_checks_from_table`; DF conversion methods |
src/databricks/labs/dqx/config.py | Added `checks_table` to `RunConfig` |
pyproject.toml | Increased max-attributes lint rule |
docs/dqx/docs/guide.mdx | Documented Delta table support for quality rules |
demos/dqx_demo_library.py | Updated demo to save/load checks in a Delta table |
Comments suppressed due to low confidence (2)
src/databricks/labs/dqx/engine.py:5
- The `itertools` import is unused in this file; consider removing it to avoid unnecessary dependencies.
import itertools
tests/integration/test_load_checks_from_table.py:38
- [nitpick] Only the default `append` mode is tested here; consider adding a test for `mode='overwrite'` to validate that option.
engine.save_checks_in_table(TEST_CHECKS, table_name)
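A test along the lines of the nitpick above might look like the following sketch. `FakeChecksTable` is a hypothetical in-memory stand-in, not the DQX engine or a Delta table. It only assumes that `mode` follows the usual Spark writer semantics, where `append` adds rows to what is already stored and `overwrite` replaces it. The check dictionaries are illustrative as well.

```python
from typing import Any


class FakeChecksTable:
    """Hypothetical in-memory stand-in for a table that stores checks."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def save(self, checks: list[dict[str, Any]], mode: str = "append") -> None:
        if mode == "append":
            # Keep existing rows and add the new ones at the end.
            self.rows.extend(checks)
        elif mode == "overwrite":
            # Discard existing rows and keep only the new ones.
            self.rows = list(checks)
        else:
            raise ValueError(f"unsupported mode: {mode!r}")

    def load(self) -> list[dict[str, Any]]:
        return list(self.rows)


# Illustrative checks; the exact dictionary shape is an assumption.
OLD_CHECKS = [{"criticality": "error", "check": {"function": "is_not_null"}}]
NEW_CHECKS = [{"criticality": "warn", "check": {"function": "is_not_empty"}}]


def test_append_keeps_existing_checks() -> None:
    table = FakeChecksTable()
    table.save(OLD_CHECKS)
    table.save(NEW_CHECKS)  # default mode is append
    assert table.load() == OLD_CHECKS + NEW_CHECKS


def test_overwrite_replaces_existing_checks() -> None:
    table = FakeChecksTable()
    table.save(OLD_CHECKS)
    table.save(NEW_CHECKS, mode="overwrite")
    assert table.load() == NEW_CHECKS
```

In the real integration test the same two assertions would be made against `load_checks_from_table` after calling `save_checks_in_table` with each mode.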
LGTM
* Fix spark remote version detection in CI [#342](#342)
* Fix spark remote installation [#346](#346)
* Load and save checks from a Delta table [#339](#339)
* Handle nulls in uniqueness check for composite keys [#345](#345)
* Allow user metadata for individual checks [#352](#352)
* Add functionality to save results in delta table [#319](#319)
* Fix "older than" checks [#354](#354)
* Add PII-detection example [#358](#358)
* Add aggregation type of checks [#357](#357)
Changes
Added the following methods to the `DQEngine` class to save checks to, and load checks from, a Delta table in a Databricks workspace:
- `save_checks_in_dataframe`
- `load_checks_from_dataframe`
- `save_checks_in_table`
- `load_checks_from_table`
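To make the checks ↔ rows round trip concrete, here is a minimal sketch in plain Python. It does not use DQX or Spark: `checks_to_rows`, `rows_to_checks`, and the flat row layout are hypothetical names chosen for this example, and the check dictionaries only assume a `criticality` plus `check.function` / `check.arguments` shape. The actual table schema is whatever `engine.py` defines.

```python
import json
from typing import Any


def checks_to_rows(checks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten check dictionaries into table-like rows (hypothetical layout)."""
    rows = []
    for check in checks:
        spec = check["check"]
        rows.append(
            {
                "name": check.get("name"),
                "criticality": check.get("criticality", "error"),
                "function": spec["function"],
                # Arguments are stored as a JSON string so each row stays flat.
                "arguments": json.dumps(spec.get("arguments", {}), sort_keys=True),
            }
        )
    return rows


def rows_to_checks(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rebuild check dictionaries from the rows produced by checks_to_rows."""
    checks = []
    for row in rows:
        check: dict[str, Any] = {
            "criticality": row["criticality"],
            "check": {
                "function": row["function"],
                "arguments": json.loads(row["arguments"]),
            },
        }
        # Only restore the name when one was stored.
        if row["name"] is not None:
            check["name"] = row["name"]
        checks.append(check)
    return checks


# Illustrative checks; function and column names are assumptions.
example_checks = [
    {
        "name": "id_not_null",
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"col_name": "id"}},
    },
    {
        "criticality": "warn",
        "check": {"function": "is_in_list", "arguments": {"col_name": "status", "allowed": ["a", "b"]}},
    },
]

restored = rows_to_checks(checks_to_rows(example_checks))
```

In this sketch the round trip is lossless only when `criticality` is set explicitly, because a missing value is written out as the assumed default `"error"`.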
Linked issues
Resolves #299
Tests
Added unit and integration tests.