
Commit a22ab79

Release v0.9.2 (#567)
* Added performance benchmarks ([#548](#548)). Performance tests are run to ensure that no change degrades performance by more than 25%. Benchmark results are published in the reference section of the documentation. The benchmark covers all check functions, running all functions at once, and applying the same functions at once to multiple columns using foreach column. A new performance GitHub workflow automates benchmarking: generating a new benchmark baseline, updating the existing baseline, and running performance tests against the baseline.
* Declare readme in the project ([#547](#547)). The project configuration now includes the README file in the released package so that it is visible on PyPI.
* Fixed deserializing to DataFrame to assign columns properly ([#559](#559)). The `deserialize_checks_to_dataframe` function now handles columns for `sql_expression` correctly by removing the unnecessary check for a `DQDatasetRule` instance and directly verifying that `dq_rule_check.columns` is not `None` (see the illustrative sketch below).
* Fixed lsql dependency ([#564](#564)). The lsql dependency has been updated to address a sqlglot dependency issue that arises when it is imported in artifact repositories.
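As an illustration of the #559 fix described above, here is a minimal, hypothetical sketch of the column-handling logic; the `Rule` class and helper below are simplified stand-ins, not the actual `deserialize_checks_to_dataframe` implementation.

```python
# Hypothetical sketch of the column-handling change from #559, not DQX's actual code:
# columns are assigned whenever the rule defines them (including sql_expression checks),
# rather than only when the rule is a DQDatasetRule instance.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Rule:  # simplified stand-in for a DQX rule object
    check_function: str
    columns: Optional[Sequence[str]] = None


def columns_for_dataframe(rule: Rule) -> Optional[list]:
    # After the fix: take columns from any rule that has them set.
    return list(rule.columns) if rule.columns is not None else None


print(columns_for_dataframe(Rule("sql_expression", columns=["col1", "col2"])))  # ['col1', 'col2']
print(columns_for_dataframe(Rule("is_not_null")))  # None
```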
1 parent 9ce260e · commit a22ab79


7 files changed: +22 -19 lines


CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -1,5 +1,12 @@
 # Version changelog
 
+## 0.9.2
+
+* Added performance benchmarks ([#548](https://github.com/databrickslabs/dqx/issues/548)). Performance tests are run to ensure that no change degrades performance by more than 25%. Benchmark results are published in the reference section of the documentation. The benchmark covers all check functions, running all functions at once, and applying the same functions at once to multiple columns using foreach column. A new performance GitHub workflow automates benchmarking: generating a new benchmark baseline, updating the existing baseline, and running performance tests against the baseline.
+* Declare readme in the project ([#547](https://github.com/databrickslabs/dqx/issues/547)). The project configuration now includes the README file in the released package so that it is visible on PyPI.
+* Fixed deserializing to DataFrame to assign columns properly ([#559](https://github.com/databrickslabs/dqx/issues/559)). The `deserialize_checks_to_dataframe` function now handles columns for `sql_expression` correctly by removing the unnecessary check for a `DQDatasetRule` instance and directly verifying that `dq_rule_check.columns` is not `None`.
+* Fixed lsql dependency ([#564](https://github.com/databrickslabs/dqx/issues/564)). The lsql dependency has been updated to address a sqlglot dependency issue that arises when it is imported in artifact repositories.
+
 ## 0.9.1
 
 * Added quality checker and end to end workflows ([#519](https://github.com/databrickslabs/dqx/issues/519)). This release introduces a no-code solution for applying checks. The following workflows were added: quality-checker (apply checks and save results to tables) and end-to-end (e2e) workflows (profile input data, generate quality checks, apply the checks, save results to tables). The workflows enable quality checking for data at rest without code-level integration. They support reference data for checks using tables (e.g., required by foreign key or compare datasets checks) as well as custom Python check functions (mapping of a custom check function to the module path in the workspace or Unity Catalog volume containing the function definition). The workflows handle one run config per job run; a future release will introduce functionality to execute this across multiple tables. In addition, CLI commands have been added to execute the workflows. Additionally, DQX workflows are now configured to execute using serverless clusters, with an option to use standard clusters as well. InstallationChecksStorageHandler now supports absolute workspace path locations.

docs/dqx/docs/demos.mdx

Lines changed: 10 additions & 10 deletions
@@ -8,20 +8,20 @@ import Admonition from '@theme/Admonition';
 
 Import the following notebooks in the Databricks workspace to try DQX out:
 
 ## Use as Library
-* [DQX Quick Start Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_quick_start_demo_library.py) - quickstart on how to use DQX as a library.
-* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_library.py) - demonstrates how to use DQX as a library.
-* [DQX Demo Notebook for Spark Structured Streaming (Native End-to-End Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_streaming_demo_native.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, using the built-in end-to-end method to handle both reading and writing.
-* [DQX Demo Notebook for Spark Structured Streaming (DIY Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_streaming_demo_diy.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, while handling reading and writing on your own outside DQX using Spark API.
-* [DQX Demo Notebook for Lakeflow Pipelines (formerly DLT)](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_dlt_demo.py) - demonstrates how to use DQX as a library with Lakeflow Pipelines.
-* [DQX Asset Bundles Demo](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_asset_bundle/README.md) - demonstrates how to use DQX as a library with Databricks Asset Bundles.
-* [DQX Demo for dbt](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_dbt/README.md) - demonstrates how to use DQX as a library with dbt projects.
+* [DQX Quick Start Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_quick_start_demo_library.py) - quickstart on how to use DQX as a library.
+* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_library.py) - demonstrates how to use DQX as a library.
+* [DQX Demo Notebook for Spark Structured Streaming (Native End-to-End Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_streaming_demo_native.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, using the built-in end-to-end method to handle both reading and writing.
+* [DQX Demo Notebook for Spark Structured Streaming (DIY Approach)](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_streaming_demo_diy.py) - demonstrates how to use DQX as a library with Spark Structured Streaming, while handling reading and writing on your own outside DQX using Spark API.
+* [DQX Demo Notebook for Lakeflow Pipelines (formerly DLT)](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_dlt_demo.py) - demonstrates how to use DQX as a library with Lakeflow Pipelines.
+* [DQX Asset Bundles Demo](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_asset_bundle/README.md) - demonstrates how to use DQX as a library with Databricks Asset Bundles.
+* [DQX Demo for dbt](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_dbt/README.md) - demonstrates how to use DQX as a library with dbt projects.
 
 ## Deploy as Workspace Tool
-* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_tool.py) - demonstrates how to use DQX as a tool when installed in the workspace.
+* [DQX Demo Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_tool.py) - demonstrates how to use DQX as a tool when installed in the workspace.
 
 ## Use Cases
-* [DQX for PII Detection Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_pii_detection.py) - demonstrates how to use DQX to check data for Personally Identifiable Information (PII).
-* [DQX for Manufacturing Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_manufacturing_demo.py) - demonstrates how to use DQX to check data quality for Manufacturing Industry datasets.
+* [DQX for PII Detection Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_pii_detection.py) - demonstrates how to use DQX to check data for Personally Identifiable Information (PII).
+* [DQX for Manufacturing Notebook](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_manufacturing_demo.py) - demonstrates how to use DQX to check data quality for Manufacturing Industry datasets.
 
 <br />
 <Admonition type="tip" title="Execution Environment">

docs/dqx/docs/guide/quality_checks_apply.mdx

Lines changed: 1 addition & 1 deletion
@@ -559,7 +559,7 @@ Below is a sample output of a check stored in a result column (error or warning)
 ```
 
 The structure of the result columns is an array of struct containing the following fields
-(see the exact structure [here](https://github.com/databrickslabs/dqx/blob/v0.9.1/src/databricks/labs/dqx/schema/dq_result_schema.py)):
+(see the exact structure [here](https://github.com/databrickslabs/dqx/blob/v0.9.2/src/databricks/labs/dqx/schema/dq_result_schema.py)):
 - `name`: name of the check (string type).
 - `message`: message describing the quality issue (string type).
 - `columns`: name of the column(s) where the quality issue was found (string type).
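As a side note to the result-column structure shown in the hunk above, here is a minimal PySpark sketch for inspecting those fields. It assumes the checked output was saved to a table and that error results live in a column named `_errors`; both the table name and the result column name are assumptions and may differ in your setup.

```python
# Minimal sketch (assumed table and column names) for inspecting the
# array-of-struct result fields (`name`, `message`, `columns`) described above.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
checked_df = spark.table("main.demo.checked_output")  # hypothetical table with DQX results

issues = (
    checked_df
    .select(F.explode("_errors").alias("issue"))  # one row per reported quality issue
    .select("issue.name", "issue.message", "issue.columns")
)
issues.show(truncate=False)
```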

docs/dqx/docs/reference/benchmarks.mdx

Lines changed: 1 addition & 3 deletions
@@ -116,6 +116,4 @@ sidebar_position: 13
 | test_benchmark_sql_expression[col5] | 0.259120 | 0.263383 | 0.224057 | 0.295827 | 0.029468 | 0.049067 | 0.232577 | 0.281643 | 5 | 0 | 2 | 3.86 |
 | test_benchmark_sql_expression[col6] | 0.242065 | 0.240533 | 0.230321 | 0.255079 | 0.008982 | 0.010016 | 0.237314 | 0.247331 | 5 | 0 | 2 | 4.13 |
 | test_benchmark_sql_expression[col9] | 0.286430 | 0.291049 | 0.237969 | 0.345171 | 0.046762 | 0.083079 | 0.240589 | 0.323668 | 5 | 0 | 2 | 3.49 |
-| test_benchmark_sql_query | 0.279797 | 0.274901 | 0.244968 | 0.333088 | 0.035766 | 0.054686 | 0.249674 | 0.304360 | 5 | 0 | 1 | 3.57 |
-
-See the tests implementation [here](https://github.com/databrickslabs/dqx/blob/v0.9.1/tests/perf/).
+| test_benchmark_sql_query | 0.279797 | 0.274901 | 0.244968 | 0.333088 | 0.035766 | 0.054686 | 0.249674 | 0.304360 | 5 | 0 | 1 | 3.57 |

docs/dqx/docs/reference/quality_checks.mdx

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';
 This page provides a reference for the quality check functions available in DQX.
 The terms `checks` and `rules` are used interchangeably in the documentation, as they are synonymous in DQX.
 
-You can explore the implementation details of the check functions [here](https://github.com/databrickslabs/dqx/blob/v0.9.1/src/databricks/labs/dqx/check_funcs.py).
+You can explore the implementation details of the check functions [here](https://github.com/databrickslabs/dqx/blob/v0.9.2/src/databricks/labs/dqx/check_funcs.py).
 
 ## Row-level checks reference
 
@@ -2723,7 +2723,7 @@ The built-in check supports several configurable parameters:
 DQX automatically installs the built-in entity recognition models at runtime if they are not already available.
 However, for better performance and to avoid potential out-of-memory issues, it is recommended to pre-install models using pip install.
 Any additional models used in a custom configuration must also be installed on your Databricks cluster.
-See the [Using DQX for PII Detection](https://github.com/databrickslabs/dqx/blob/v0.9.1/demos/dqx_demo_pii_detection.py) notebook for examples of custom model installation.
+See the [Using DQX for PII Detection](https://github.com/databrickslabs/dqx/blob/v0.9.2/demos/dqx_demo_pii_detection.py) notebook for examples of custom model installation.
 </Admonition>
 
 #### Using Built-in NLP Engine Configurations
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.9.1"
+__version__ = "0.9.2"

tests/perf/generate_md_report.py

Lines changed: 0 additions & 2 deletions
@@ -61,8 +61,6 @@
 f"| {stats['ops']:.2f} |"
 )
 
-lines.append("\nSee the tests implementation [here](https://github.com/databrickslabs/dqx/blob/v0.9.1/tests/perf/).\n")
-
 # overwrite the report
 report_path.write_text("\n".join(lines))
 print(f"REPORT_PATH={report_path.resolve()}")
