Commit ddfcbf3

Fix message for invalid geometries

1 parent 118ae92

7 files changed (+627, -2 lines)


docs/dqx/docs/reference/benchmarks.mdx

Lines changed: 4 additions & 0 deletions
@@ -51,7 +51,11 @@ sidebar_position: 13
 | test_benchmark_foreach_sql_expression[n_rows_100000000_n_columns_5] | 0.895489 | 0.888982 | 0.853895 | 0.950998 | 0.041479 | 0.071722 | 0.858589 | 0.930311 | 5 | 0 | 2 | 1.12 |
 | test_benchmark_foreach_sql_query[n_rows_100000000_n_columns_5] | 4.578799 | 4.602143 | 4.442396 | 4.644892 | 0.083901 | 0.113694 | 4.530776 | 4.644470 | 5 | 0 | 1 | 0.22 |
 | test_benchmark_foreign_key | 31.784272 | 31.787610 | 31.414708 | 32.123221 | 0.269713 | 0.386951 | 31.597198 | 31.984149 | 5 | 0 | 2 | 0.03 |
+| test_benchmark_has_area_greater_than | 0.171868 | 0.166867 | 0.161877 | 0.204155 | 0.015957 | 0.003194 | 0.164123 | 0.167316 | 6 | 1 | 1 | 5.82 |
+| test_benchmark_has_area_less_than | 0.177230 | 0.179352 | 0.161536 | 0.190875 | 0.010356 | 0.013261 | 0.169503 | 0.182763 | 6 | 0 | 2 | 5.64 |
 | test_benchmark_has_dimension | 0.215338 | 0.213285 | 0.210530 | 0.223131 | 0.005056 | 0.007086 | 0.211819 | 0.218905 | 5 | 0 | 1 | 4.64 |
+| test_benchmark_has_num_points_greater_than | 0.159204 | 0.157405 | 0.151457 | 0.175503 | 0.008775 | 0.008935 | 0.152260 | 0.161195 | 6 | 1 | 1 | 6.28 |
+| test_benchmark_has_num_points_less_than | 0.162069 | 0.161908 | 0.149400 | 0.178192 | 0.010833 | 0.014197 | 0.154168 | 0.168365 | 5 | 0 | 2 | 6.17 |
 | test_benchmark_has_valid_schema | 0.172078 | 0.172141 | 0.163793 | 0.181081 | 0.006715 | 0.009295 | 0.167010 | 0.176305 | 6 | 0 | 2 | 5.81 |
 | test_benchmark_has_x_coordinate_between | 0.217192 | 0.213656 | 0.209310 | 0.236233 | 0.011150 | 0.012638 | 0.209410 | 0.222048 | 5 | 0 | 1 | 4.60 |
 | test_benchmark_has_y_coordinate_between | 0.218497 | 0.219630 | 0.209352 | 0.234111 | 0.010103 | 0.013743 | 0.209584 | 0.223327 | 5 | 0 | 1 | 4.58 |
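The benchmark excerpt above omits the table's header row. The added rows are consistent with pytest-benchmark-style columns laid out as follows:

- the first value is the mean runtime in seconds;
- the sixth is the interquartile range (IQR);
- the seventh and eighth are the first and third quartiles;
- the last is operations per second (OPS).

That column mapping is an inference, not something this page states. The plain-Python sketch below checks it against the four rows this commit adds. The helper names `parse_row` and `consistency_errors` are hypothetical.

```python
from typing import List, Tuple

# The four rows added to benchmarks.mdx in this commit, copied verbatim.
NEW_ROWS = [
    "| test_benchmark_has_area_greater_than | 0.171868 | 0.166867 | 0.161877 | 0.204155 | 0.015957 | 0.003194 | 0.164123 | 0.167316 | 6 | 1 | 1 | 5.82 |",
    "| test_benchmark_has_area_less_than | 0.177230 | 0.179352 | 0.161536 | 0.190875 | 0.010356 | 0.013261 | 0.169503 | 0.182763 | 6 | 0 | 2 | 5.64 |",
    "| test_benchmark_has_num_points_greater_than | 0.159204 | 0.157405 | 0.151457 | 0.175503 | 0.008775 | 0.008935 | 0.152260 | 0.161195 | 6 | 1 | 1 | 6.28 |",
    "| test_benchmark_has_num_points_less_than | 0.162069 | 0.161908 | 0.149400 | 0.178192 | 0.010833 | 0.014197 | 0.154168 | 0.168365 | 5 | 0 | 2 | 6.17 |",
]


def parse_row(row: str) -> Tuple[str, List[float]]:
    # Drop the outer pipes, split on the inner ones, and convert every cell after the name.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return cells[0], [float(cell) for cell in cells[1:]]


def consistency_errors(row: str) -> List[str]:
    name, values = parse_row(row)
    # Assumed (inferred, not stated on this page) positions: mean, IQR, Q1, Q3, OPS.
    mean, iqr, q1, q3, ops = values[0], values[5], values[6], values[7], values[11]
    errors = []
    if abs(1.0 / mean - ops) >= 0.005:  # OPS is printed with two decimals
        errors.append(f"{name}: OPS {ops} != 1/mean {1.0 / mean:.4f}")
    if abs((q3 - q1) - iqr) >= 1e-5:  # quartiles are printed with six decimals
        errors.append(f"{name}: IQR {iqr} != Q3 - Q1 {q3 - q1:.6f}")
    return errors


for row in NEW_ROWS:
    print(parse_row(row)[0], consistency_errors(row) or "consistent")
```

For every added row, OPS is the reciprocal of the mean and the IQR equals Q3 minus Q1, within print precision. This supports the inferred mapping, but it does not confirm what the unnamed middle columns hold.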

docs/dqx/docs/reference/quality_checks.mdx

Lines changed: 69 additions & 1 deletion
@@ -66,7 +66,11 @@ You can also define your own custom checks (see [Creating custom checks](#creati
 | `is_non_empty_geometry` | Checks whether the values in the input column are non-empty geometries. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression) |
 | `has_dimension` | Checks whether the values in the input column are geometries of the specified dimension (2D projected dimension). This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `dimension`: dimension to check |
 | `has_x_coordinate_between` | Checks whether the values in the input column are geometries with x coordinate between the provided boundaries. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `min_value`: minimum value; `max_value`: maximum value |
-| `has_y_coordinate_between` | Checks whether the values in the input column are geometries with y coordinate between the provided boundaries. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `min_value`: minimum value; `max_value`: maximum value | | `column`: column to check (can be a string column name or a column expression); `min_value`: minimum value; `max_value`: maximum value |
+| `has_y_coordinate_between` | Checks whether the values in the input column are geometries with y coordinate between the provided boundaries. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `min_value`: minimum value; `max_value`: maximum value |
+| `has_area_greater_than` | Checks whether the values in the input column are geometries with area greater than the specified limit. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `limit`: area limit to compare against (can be a number, column name, or SQL expression) |
+| `has_area_less_than` | Checks whether the values in the input column are geometries with area less than the specified limit. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `limit`: area limit to compare against (can be a number, column name, or SQL expression) |
+| `has_num_points_greater_than` | Checks whether the values in the input column are geometries with number of coordinate pairs greater than the specified limit. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `limit`: number of points limit to compare against (can be a number, column name, or SQL expression) |
+| `has_num_points_less_than` | Checks whether the values in the input column are geometries with number of coordinate pairs less than the specified limit. This function requires Databricks serverless compute or runtime >= 17.1. | `column`: column to check (can be a string column name or a column expression); `limit`: number of points limit to compare against (can be a number, column name, or SQL expression) |
 </details>
 
 <Admonition type="warning" title="Applicability">
@@ -607,6 +611,38 @@ For brevity, the `name` field in the examples is omitted and it will be auto-gen
       column: polygon_geom
       min_value: 0.0
       max_value: 10.0
+
+# has_area_greater_than check
+- criticality: error
+  check:
+    function: has_area_greater_than
+    arguments:
+      column: polygon_geom
+      limit: 100.0
+
+# has_area_less_than check
+- criticality: error
+  check:
+    function: has_area_less_than
+    arguments:
+      column: polygon_geom
+      limit: 0.1
+
+# has_num_points_greater_than check
+- criticality: error
+  check:
+    function: has_num_points_greater_than
+    arguments:
+      column: polygon_geom
+      limit: 10
+
+# has_num_points_less_than check
+- criticality: error
+  check:
+    function: has_num_points_less_than
+    arguments:
+      column: polygon_geom
+      limit: 3
 ```
 </details>
 
@@ -1069,6 +1105,38 @@ checks = [
         check_func_kwargs={"min_value": 0.0, "max_value": 10.0}
     ),
 
+    # has_area_greater_than check
+    DQRowRule(
+        criticality="error",
+        check_func=geo_check_funcs.has_area_greater_than,
+        column="polygon_geom",
+        check_func_kwargs={"limit": 100.0}
+    ),
+
+    # has_area_less_than check
+    DQRowRule(
+        criticality="error",
+        check_func=geo_check_funcs.has_area_less_than,
+        column="polygon_geom",
+        check_func_kwargs={"limit": 0.1}
+    ),
+
+    # has_num_points_greater_than check
+    DQRowRule(
+        criticality="error",
+        check_func=geo_check_funcs.has_num_points_greater_than,
+        column="polygon_geom",
+        check_func_kwargs={"limit": 10}
+    ),
+
+    # has_num_points_less_than check
+    DQRowRule(
+        criticality="error",
+        check_func=geo_check_funcs.has_num_points_less_than,
+        column="polygon_geom",
+        check_func_kwargs={"limit": 3}
+    ),
+
     # sql_expression check
     DQRowRule(
         criticality="error",
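The plain-Python sketch below shows what the four new geometry checks measure. It is not DQX's implementation, which operates on Spark geometry columns on Databricks, and it rests on these assumptions:

- The helper names `ring_area`, `num_points`, and `flagged_rules` are hypothetical.
- A geometry is simplified to a single polygon ring given as a list of `(x, y)` tuples.
- `limit` is always a plain number.
- The point count includes a repeated closing vertex.
- Each check flags a row when the measured value is strictly beyond its limit. This reading is inferred from the example limits in the YAML and `DQRowRule` examples above: a `has_num_points_less_than` check with `limit: 3` would otherwise reject every valid polygon.

```python
from typing import Dict, Sequence, Set, Tuple

Point = Tuple[float, float]

# Example limits copied from the YAML and DQRowRule examples in this commit.
EXAMPLE_LIMITS: Dict[str, float] = {
    "has_area_greater_than": 100.0,
    "has_area_less_than": 0.1,
    "has_num_points_greater_than": 10,
    "has_num_points_less_than": 3,
}


def ring_area(ring: Sequence[Point]) -> float:
    """Planar area of a simple polygon ring via the shoelace formula.

    A repeated closing vertex adds a zero-length edge, so it does not change the result.
    """
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def num_points(ring: Sequence[Point]) -> int:
    # Assumption: every listed coordinate pair counts, including a repeated closing vertex.
    return len(ring)


def flagged_rules(ring: Sequence[Point], limits: Dict[str, float] = EXAMPLE_LIMITS) -> Set[str]:
    """Return the example rules that would flag this ring, assuming each check
    fires when the measured value is strictly beyond its limit."""
    area = ring_area(ring)
    points = num_points(ring)
    fired_by_rule = {
        "has_area_greater_than": area > limits["has_area_greater_than"],
        "has_area_less_than": area < limits["has_area_less_than"],
        "has_num_points_greater_than": points > limits["has_num_points_greater_than"],
        "has_num_points_less_than": points < limits["has_num_points_less_than"],
    }
    return {name for name, fired in fired_by_rule.items() if fired}


square = [(0.0, 0.0), (20.0, 0.0), (20.0, 20.0), (0.0, 20.0), (0.0, 0.0)]
segment = [(0.0, 0.0), (1.0, 1.0)]

print(ring_area(square), num_points(square))  # 400.0 5
print(sorted(flagged_rules(square)))  # ['has_area_greater_than']
print(sorted(flagged_rules(segment)))  # ['has_area_less_than', 'has_num_points_less_than']
```

Under these assumptions, the `area > 100.0` and `num_points < 3` rules catch oversized and degenerate shapes respectively. The real checks also accept a column name or SQL expression as `limit`, which this sketch does not model.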
