* Added performance benchmarks
([#548](#548)). Performance
tests are run to ensure that no change degrades performance by more than
25%. Benchmark results are published in the reference section of the
documentation. The benchmarks cover all check functions, running all
functions at once, and applying the same functions at once to multiple
columns using foreach column. A new performance GitHub workflow has been
introduced to automate benchmarking: generating a new benchmark baseline,
updating the existing baseline, and running performance tests against the
baseline (a hypothetical sketch of such a regression gate is shown after
this list).
* Declare readme in the project
([#547](#547)). The project
configuration has been updated to include the README file in the released
package so that it is visible on PyPI.
* Fixed deserializing to DataFrame to assign columns properly
([#559](#559)). The
`deserialize_checks_to_dataframe` function now handles columns for
`sql_expression` correctly by removing the unnecessary check for a
`DQDatasetRule` instance and directly verifying that
`dq_rule_check.columns` is not `None` (a simplified sketch follows this
list).
* Fixed lsql dependency
([#564](#564)). The lsql
dependency has been updated to address a sqlglot dependency issue that
arises when it is imported in artifact repositories.
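
As a concrete illustration of the 25% rule described in the performance-benchmarks entry, the sketch below shows how a regression gate of that kind could work. It is a hypothetical, self-contained example: the baseline file layout, names, and threshold handling are assumptions for illustration, not the actual DQX benchmark workflow.

```python
# Hypothetical regression gate: compare current benchmark timings against a
# stored baseline and fail if any benchmark slowed down by more than 25%.
# The JSON layout and file names below are illustrative assumptions.
import json
from pathlib import Path

MAX_REGRESSION = 0.25  # fail if a benchmark is more than 25% slower than baseline


def load_timings(path: Path) -> dict[str, float]:
    """Load a mapping of benchmark name -> mean runtime in seconds."""
    return json.loads(path.read_text())


def find_regressions(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Return benchmarks whose runtime grew by more than MAX_REGRESSION."""
    regressions = {}
    for name, base_time in baseline.items():
        new_time = current.get(name)
        if new_time is None:
            continue  # benchmark removed or renamed; not a regression
        slowdown = (new_time - base_time) / base_time
        if slowdown > MAX_REGRESSION:
            regressions[name] = slowdown
    return regressions


if __name__ == "__main__":
    baseline = load_timings(Path("benchmark_baseline.json"))
    current = load_timings(Path("benchmark_current.json"))
    failed = find_regressions(baseline, current)
    for name, slowdown in failed.items():
        print(f"{name}: {slowdown:.0%} slower than baseline")
    raise SystemExit(1 if failed else 0)
```

Updating the baseline then amounts to overwriting `benchmark_baseline.json` with the latest accepted run, which is the role the workflow's baseline-generation step plays.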
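
The deserialization fix above boils down to deciding when a rule's column list should be written into the resulting DataFrame row. Below is a minimal, self-contained sketch of that decision using a stand-in dataclass rather than the real DQX types, so the class and field names are only illustrative.

```python
# Simplified illustration of the column-assignment logic described in #559.
# DQRuleStandIn mimics a deserialized check rule; it is not the real DQX class.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DQRuleStandIn:
    name: str
    check_function: str
    columns: list[str] | None = None


def columns_for_row(dq_rule_check: DQRuleStandIn) -> list[str] | None:
    # Before the fix, an isinstance check against a specific rule class could
    # skip column assignment for `sql_expression` rules even when columns were
    # present. Checking the columns attribute directly avoids that gap.
    if dq_rule_check.columns is not None:
        return dq_rule_check.columns
    return None


rule = DQRuleStandIn(name="positive_balance", check_function="sql_expression", columns=["balance"])
assert columns_for_row(rule) == ["balance"]
```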
CHANGELOG.md (7 additions, 0 deletions)
@@ -1,5 +1,12 @@
 # Version changelog
 
+## 0.9.2
+
+* Added performance benchmarks ([#548](https://github.com/databrickslabs/dqx/issues/548)). Performance tests are run to ensure that no change degrades performance by more than 25%. Benchmark results are published in the reference section of the documentation. The benchmarks cover all check functions, running all functions at once, and applying the same functions at once to multiple columns using foreach column. A new performance GitHub workflow has been introduced to automate performance benchmarking, generating a new benchmark baseline, updating the existing baseline, and running performance tests to compare with the baseline.
+* Declare readme in the project ([#547](https://github.com/databrickslabs/dqx/issues/547)). The project configuration has been updated to include the README file in the released package so that it is visible on PyPI.
+* Fixed deserializing to DataFrame to assign columns properly ([#559](https://github.com/databrickslabs/dqx/issues/559)). The `deserialize_checks_to_dataframe` function has been enhanced to correctly handle columns for `sql_expression` by removing the unnecessary check for a `DQDatasetRule` instance and directly verifying that `dq_rule_check.columns` is not `None`.
+* Fixed lsql dependency ([#564](https://github.com/databrickslabs/dqx/issues/564)). The lsql dependency has been updated to address a sqlglot dependency issue that arises when it is imported in artifact repositories.
+
 ## 0.9.1
 
 * Added quality checker and end to end workflows ([#519](https://github.com/databrickslabs/dqx/issues/519)). This release introduces a no-code solution for applying checks. The following workflows were added: quality-checker (apply checks and save results to tables) and end-to-end (e2e) workflows (profile input data, generate quality checks, apply the checks, save results to tables). The workflows enable quality checking for data at rest without the need for code-level integration. They support reference data for checks using tables (e.g., required by foreign key or compare datasets checks) as well as custom Python check functions (mapping of a custom check function to the module path in the workspace or Unity Catalog volume containing the function definition). The workflows handle one run config for each job run. A future release will introduce functionality to execute this across multiple tables. In addition, CLI commands have been added to execute the workflows. Additionally, DQX workflows are now configured to execute using serverless clusters, with an option to use standard clusters as well. InstallationChecksStorageHandler now supports absolute workspace path locations.
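
The 0.9.1 entry above mentions custom Python check functions that the workflows can load from a module path in the workspace or a Unity Catalog volume. The sketch below shows the general shape such a function might take. It assumes a `make_condition` helper is importable from `databricks.labs.dqx.check_funcs` with the `(condition, message, alias)` signature shown, and that the condition marks failing rows; both are assumptions to verify against the linked `check_funcs.py` for this release.

```python
# Hedged sketch of a custom check function. The make_condition import and its
# (condition, message, alias) signature are assumptions and may differ in this
# release; verify against src/databricks/labs/dqx/check_funcs.py.
import pyspark.sql.functions as F
from pyspark.sql import Column

from databricks.labs.dqx.check_funcs import make_condition


def not_ends_with(column: str, suffix: str) -> Column:
    """Flag rows where `column` ends with the given suffix."""
    col_expr = F.col(column)
    return make_condition(
        col_expr.endswith(suffix),              # condition assumed to mark failing rows
        f"Column {column} ends with {suffix}",  # message reported for failing rows
        f"{column}_ends_with_{suffix}",         # alias for the generated condition column
    )
```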
-* [DQX Quick Start Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_quick_start_demo_library.py) - quickstart on how to use DQX as a library.
-* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_library.py) - demonstrates how to use DQX as a library.
-* [DQX Demo Notebook for Spark Structured Streaming (Native End-to-End Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_streaming_demo_native.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, using the built-in end-to-end method to handle both reading and writing.
-* [DQX Demo Notebook for Spark Structured Streaming (DIY Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_streaming_demo_diy.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, while handling reading and writing on your own outside DQX using Spark API.
-* [DQX Demo Notebook for Lakeflow Pipelines (formerly DLT)](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_dlt_demo.py) - demonstrates how to use DQX as a library with Lakeflow Pipelines.
-* [DQX Asset Bundles Demo](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_asset_bundle/README.md) - demonstrates how to use DQX as a library with Databricks Asset Bundles.
-* [DQX Demo for dbt](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_dbt/README.md) - demonstrates how to use DQX as a library with dbt projects.
+* [DQX Quick Start Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_quick_start_demo_library.py) - quickstart on how to use DQX as a library.
+* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_library.py) - demonstrates how to use DQX as a library.
+* [DQX Demo Notebook for Spark Structured Streaming (Native End-to-End Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_streaming_demo_native.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, using the built-in end-to-end method to handle both reading and writing.
+* [DQX Demo Notebook for Spark Structured Streaming (DIY Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_streaming_demo_diy.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, while handling reading and writing on your own outside DQX using Spark API.
+* [DQX Demo Notebook for Lakeflow Pipelines (formerly DLT)](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_dlt_demo.py) - demonstrates how to use DQX as a library with Lakeflow Pipelines.
+* [DQX Asset Bundles Demo](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_asset_bundle/README.md) - demonstrates how to use DQX as a library with Databricks Asset Bundles.
+* [DQX Demo for dbt](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_dbt/README.md) - demonstrates how to use DQX as a library with dbt projects.
 
 ## Deploy as Workspace Tool
-* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_tool.py) - demonstrates how to use DQX as a tool when installed in the workspace.
+* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_tool.py) - demonstrates how to use DQX as a tool when installed in the workspace.
 
 ## Use Cases
-* [DQX for PII Detection Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_pii_detection.py) - demonstrates how to use DQX to check data for Personally Identifiable Information (PII).
-* [DQX for Manufacturing Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_manufacturing_demo.py) - demonstrates how to use DQX to check data quality for Manufacturing Industry datasets.
+* [DQX for PII Detection Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_pii_detection.py) - demonstrates how to use DQX to check data for Personally Identifiable Information (PII).
+* [DQX for Manufacturing Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_manufacturing_demo.py) - demonstrates how to use DQX to check data quality for Manufacturing Industry datasets.
docs/dqx/docs/reference/quality_checks.mdx (2 additions, 2 deletions)
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';
 This page provides a reference for the quality check functions available in DQX.
 The terms `checks` and `rules` are used interchangeably in the documentation, as they are synonymous in DQX.
 
-You can explore the implementation details of the check functions [here](https://github.com/databrickslabs/dqx/blob/v0.9.1/src/databricks/labs/dqx/check_funcs.py).
+You can explore the implementation details of the check functions [here](https://github.com/databrickslabs/dqx/blob/v0.9.2/src/databricks/labs/dqx/check_funcs.py).
 
 ## Row-level checks reference
 
@@ -2723,7 +2723,7 @@ The built-in check supports several configurable parameters:
 DQX automatically installs the built-in entity recognition models at runtime if they are not already available.
 However, for better performance and to avoid potential out-of-memory issues, it is recommended to pre-install models using pip install.
 Any additional models used in a custom configuration must also be installed on your Databricks cluster.
-See the [Using DQX for PII Detection](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_pii_detection.py) notebook for examples of custom model installation.
+See the [Using DQX for PII Detection](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_pii_detection.py) notebook for examples of custom model installation.
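
The PII-detection note in the hunk above recommends pre-installing the recognition models with pip rather than relying on runtime installation. A minimal sketch of what that could look like is shown below; the package and model names are assumptions for illustration (the linked PII detection demo shows the ones DQX actually uses), and on Databricks the install would more typically be a `%pip` notebook cell or a cluster library.

```python
# Hypothetical pre-installation sketch. Package and model names below are
# assumptions, not taken from the DQX docs -- check the PII detection demo
# for the exact models the built-in configuration expects.
import subprocess
import sys


def pip_install(*packages: str) -> None:
    """Install packages into the current Python environment via pip."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


# Install an analyzer package and a language model ahead of time so the PII
# check does not download them at runtime (slow, and a potential OOM source).
pip_install("presidio-analyzer")
subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])
```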