[BUGFIX] ExpectColumnValuesToBeOfType returns wrong result format #11855
joshua-stauffer wants to merge 14 commits
…egate backends
ExpectColumnValuesToBeOfType is a ColumnMapExpectation, but on Spark,
SqlAlchemy, and the non-object Pandas branch its result only contained
{"observed_value": ...} instead of the standard map-result fields
documented for the public API (element_count, unexpected_count,
missing_count, unexpected_percent_total, etc.). This made result
shapes inconsistent across backends and broke downstream consumers
relying on the documented format.
The fix routes the aggregate-style result returned by _validate_pandas
/_validate_sqlalchemy/_validate_spark through _format_map_output using
table.row_count and the column null count, while preserving the
existing observed_value field. On success the entire column passes
(unexpected_count=0); on failure every non-null row is unexpected
(unexpected_count=nonnull_count). The behavior of result.success is
unchanged.
Closes #11076
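The counting rule in the commit message above can be sketched in plain Python. This is only an illustration under stated assumptions: build_map_result_fields is a hypothetical stand-in, not the library's _format_map_output, and the percent formulas are the conventional ones implied by the field names rather than something this PR spells out.

```python
from typing import Any, Dict, Optional


def build_map_result_fields(
    success: bool,
    row_count: int,
    null_count: int,
    observed_value: Any,
) -> Dict[str, Any]:
    """Hypothetical stand-in: turn an aggregate pass/fail into map-style counts."""
    nonnull_count = row_count - null_count
    # A type check is all-or-nothing: either the whole column passes,
    # or every non-null row is counted as unexpected.
    unexpected_count = 0 if success else nonnull_count

    def percent(part: int, whole: int) -> Optional[float]:
        # Avoid dividing by zero on empty or all-null columns.
        return None if whole == 0 else 100.0 * part / whole

    return {
        "observed_value": observed_value,  # preserved from the aggregate result
        "element_count": row_count,
        "unexpected_count": unexpected_count,
        "missing_count": null_count,
        "missing_percent": percent(null_count, row_count),
        "unexpected_percent_total": percent(unexpected_count, row_count),
        "unexpected_percent_nonmissing": percent(unexpected_count, nonnull_count),
    }


# Assumed example data: 5 rows, 1 null, and the type check failed.
failing = build_map_result_fields(
    success=False, row_count=5, null_count=1, observed_value="StringType"
)
passing = build_map_result_fields(
    success=True, row_count=5, null_count=1, observed_value="IntegerType"
)
```

With these assumed inputs, the failing column reports four unexpected rows (every non-null row) while the passing column reports none; missing_count is the same in both cases because it depends only on the null count.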
Pull request overview
Fixes ExpectColumnValuesToBeOfType to return standard ColumnMapExpectation-style result fields (e.g., element_count, unexpected_count, missing_count, etc.) on non-Pandas-object backends (Spark, SqlAlchemy, and non-object Pandas dtype aggregate path), aligning the output with the documented expectation contract and addressing #11076.
Changes:
- Adds table.row_count and column_values.nonnull.unexpected_count to validation dependencies so aggregate backends can populate map-style summary fields.
- Introduces _build_map_result to run aggregate validation outcomes through _format_map_output for consistent result shaping across backends.
- Adds a Spark integration regression test asserting map-result fields are present for ResultFormat.SUMMARY.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds validation dependencies and formats aggregate backend results via _format_map_output to provide standard map-result fields. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark integration test to ensure the result includes standard map fields (regression for #11076). |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #11855 +/- ##
========================================
Coverage 84.79% 84.80%
========================================
Files 471 471
Lines 39169 39200 +31
========================================
+ Hits 33215 33242 +27
- Misses 5954 5958 +4
…obeoftype-returns-wro
Pull request overview
Fixes ExpectColumnValuesToBeOfType result-shape inconsistency across backends by ensuring non-Pandas-object (Spark/SqlAlchemy/non-object Pandas) validation paths return standard ColumnMapExpectation map-result fields (element/missing/unexpected counts & percents), aligning with documented expectation output.
Changes:
- Add table.row_count and column_values.nonnull.unexpected_count validation dependencies for the aggregate-style validation paths.
- Introduce _build_map_result to format aggregate validation output via _format_map_output (while preserving observed_value).
- Add a Spark integration test asserting presence of standard map-result fields (regression for #11076).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds dependencies + formats aggregate backend results through _format_map_output for consistent map-result fields. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark regression test ensuring result contains standard map fields. |
…obeoftype-returns-wro
_build_map_result was unconditionally attaching observed_value to
formatted["result"], which re-introduced a result dict for BOOLEAN_ONLY
where _format_map_output intentionally omits one. Gate the attach on the
parsed result_format and add a Spark regression test asserting
result.result == {} for BOOLEAN_ONLY.
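A minimal sketch of the gating described in this commit, assuming the result format is available as a plain string and the formatted output is a dict. attach_observed_value is a hypothetical name used for illustration; the actual change lives inside _build_map_result and works with the parsed result_format.

```python
from typing import Any, Dict


def attach_observed_value(
    formatted: Dict[str, Any],
    result_format: str,
    observed_value: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: add observed_value unless the format is BOOLEAN_ONLY."""
    if result_format == "BOOLEAN_ONLY":
        # Leave the formatter's output untouched so no result dict is reintroduced.
        return formatted
    # Copy instead of mutating, so the caller's dict is not changed in place.
    result = dict(formatted.get("result") or {})
    result["observed_value"] = observed_value
    return {**formatted, "result": result}


# Assumed shapes: BOOLEAN_ONLY output carries no "result" key at all.
boolean_only = attach_observed_value({"success": True}, "BOOLEAN_ONLY", "int64")
summary = attach_observed_value(
    {"success": True, "result": {"element_count": 3}}, "SUMMARY", "int64"
)
```

The design point is that the attach step runs after formatting, so it has to respect whatever the formatter decided to omit rather than unconditionally creating a result entry.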
Pull request overview
Fixes ExpectColumnValuesToBeOfType so that non-Pandas-object backends (Spark/SQLAlchemy and aggregate-style Pandas paths) return the standard ColumnMapExpectation result shape (element/unexpected/missing counts & percents, etc.) instead of only {"observed_value": ...}, aligning runtime behavior with the documented contract.
Changes:
- Add table.row_count and column_values.nonnull.unexpected_count validation dependencies and introduce _build_map_result() to format aggregate-path results via _format_map_output.
- Update _validate() to return formatted map-style results for Spark/SQLAlchemy/aggregate Pandas paths, while preserving the existing Pandas-object map path.
- Add Spark integration tests asserting standard map fields are present for SUMMARY and that BOOLEAN_ONLY keeps result == {}.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds aggregate-path dependencies and a helper to format aggregate validation output into standard map-result fields across backends. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark regression tests covering standard map-result fields and BOOLEAN_ONLY behavior. |
…obeoftype-returns-wro
…obeoftype-returns-wro
Pull request overview
Fixes ExpectColumnValuesToBeOfType validation result shape on non-Pandas-object backends so it consistently returns standard ColumnMapExpectation map-result fields (e.g., element_count, unexpected_count, missing_count, etc.), addressing reported issue #11076.
Changes:
- Declares additional validation dependencies (table.row_count, column_values.nonnull.unexpected_count) and introduces _build_map_result to format aggregate-backend outputs via _format_map_output.
- Updates _validate to wrap aggregate-backend validation outputs with consistent map-result formatting (while preserving the Pandas-object map path).
- Adds a Spark integration test reproducing the missing-fields bug and a regression test for BOOLEAN_ONLY result format.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds required metrics and formats aggregate-backend results into standard map-result shape via _format_map_output. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark integration coverage for the corrected result format and BOOLEAN_ONLY contract. |
…obeoftype-returns-wro
…obeoftype-returns-wro
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
    validation_dependencies.set_metric_configuration(
        metric_name="table.row_count",
        metric_configuration=MetricConfiguration(
            metric_name="table.row_count",
            metric_domain_kwargs=row_count_metric_kwargs["metric_domain_kwargs"],
…obeoftype-returns-wro
Replace the Spark presence-only assertion with a shared helper that verifies the full set of ColumnMapExpectation result fields with concrete values across each aggregate-path backend (non-object Pandas, PostgreSQL, Spark). Covers both success and failure cases, plus columns containing nulls so missing_count is exercised.
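A shared assertion helper of the kind this commit describes might look roughly like the sketch below. The helper names and the sample result dict are assumptions made for illustration; the field names are the ones listed in this PR, and the real tests run against actual backend results rather than a hand-built dict.

```python
from typing import Any, Dict, List

# Standard map-result fields named in this PR's test plan.
MAP_RESULT_FIELDS = (
    "element_count",
    "unexpected_count",
    "unexpected_percent",
    "partial_unexpected_list",
    "missing_count",
    "missing_percent",
    "unexpected_percent_total",
    "unexpected_percent_nonmissing",
)


def missing_map_fields(result: Dict[str, Any]) -> List[str]:
    """Return the standard fields that are absent from a result dict."""
    return [name for name in MAP_RESULT_FIELDS if name not in result]


def assert_map_result_counts(
    result: Dict[str, Any],
    *,
    element_count: int,
    unexpected_count: int,
    missing_count: int,
) -> None:
    """Hypothetical shared helper: check presence first, then concrete counts."""
    missing = missing_map_fields(result)
    assert not missing, f"result is missing map fields: {missing}"
    assert result["element_count"] == element_count
    assert result["unexpected_count"] == unexpected_count
    assert result["missing_count"] == missing_count


# Assumed failure-case result for a 5-row column with 1 null.
sample = {
    "element_count": 5,
    "unexpected_count": 4,
    "unexpected_percent": 100.0,
    "partial_unexpected_list": [],
    "missing_count": 1,
    "missing_percent": 20.0,
    "unexpected_percent_total": 80.0,
    "unexpected_percent_nonmissing": 100.0,
    "observed_value": "StringType",
}
assert_map_result_counts(sample, element_count=5, unexpected_count=4, missing_count=1)
```

Checking concrete values rather than mere presence is what lets the same helper cover success, failure, and null-containing columns on each backend.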
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
    # Issue #11076: ExpectColumnValuesToBeOfType is a ColumnMapExpectation, so its
    # result should include the standard map-result fields (element_count,
    # unexpected_count, missing_count, etc.) regardless of backend. The aggregate
    # paths (Spark, SqlAlchemy, non-object Pandas) need table.row_count and the
    # column null count to populate those fields via _format_map_output.
    row_count_metric_kwargs = get_metric_kwargs(
        metric_name="table.row_count",
        configuration=configuration,
        runtime_configuration=runtime_configuration,
    )
    validation_dependencies.set_metric_configuration(
        metric_name="table.row_count",
        metric_configuration=MetricConfiguration(
            metric_name="table.row_count",
            metric_domain_kwargs=row_count_metric_kwargs["metric_domain_kwargs"],
            metric_value_kwargs=row_count_metric_kwargs["metric_value_kwargs"],
        ),
    )

    nonnull_unexpected_count_metric_name = (
        f"column_values.nonnull.{SummarizationMetricNameSuffixes.UNEXPECTED_COUNT.value}"
    )
    nonnull_metric_kwargs = get_metric_kwargs(
        metric_name=nonnull_unexpected_count_metric_name,
        configuration=configuration,
        runtime_configuration=runtime_configuration,
    )
    validation_dependencies.set_metric_configuration(
        metric_name=nonnull_unexpected_count_metric_name,
        metric_configuration=MetricConfiguration(
            metric_name=nonnull_unexpected_count_metric_name,
            metric_domain_kwargs=nonnull_metric_kwargs["metric_domain_kwargs"],
            metric_value_kwargs=nonnull_metric_kwargs["metric_value_kwargs"],
        ),
    )
Closes #11076
Summary
ExpectColumnValuesToBeOfType is a ColumnMapExpectation, but on non-Pandas-object backends (Spark, SqlAlchemy, and non-object Pandas dtype paths) its _validate method returned only {"observed_value": <type>} instead of the standard map-result fields (element_count, unexpected_count, unexpected_percent, partial_unexpected_list, missing_count, etc.). This fix adds table.row_count and column_values.nonnull.unexpected_count as validation dependencies and introduces a _build_map_result helper that calls _format_map_output, giving callers a consistent result shape regardless of backend.

Changes
- expect_column_values_to_be_of_type.py: overrides get_validation_dependencies to declare table.row_count and column_values.nonnull.unexpected_count as required metrics for all backends; adds _build_map_result helper; updates _validate to run aggregate results through _build_map_result instead of returning them directly.
- test_expect_column_values_to_be_of_type.py: adds test_result_format_contains_map_fields_on_spark, reproducing issue #11076 ("ExpectColumnValuesToBeOfType dq rule is not returning expected result format") and asserting all standard map-result fields are present.

Why
Community reporters found that validating column types on Spark/Databricks produced a result dict with only observed_value, missing all the map-expectation fields documented at greatexpectations.io/expectations/expect_column_values_to_be_of_type. Downstream tooling and result-format consumers broke because the fields they expected were absent.

User impact
Users running ExpectColumnValuesToBeOfType on Spark, Databricks, or SqlAlchemy backends will now receive the full set of standard ColumnMapExpectation result fields in the validation result, matching the documented contract.

How to review
- The test (test_result_format_contains_map_fields_on_spark) directly reproduces the reported failure and passes after the fix.
- On success, unexpected_count=0; on failure every non-null row is counted as unexpected (a type either matches or it doesn't; there are no partial matches).
- The Pandas-object path (super()._validate(...)) is unchanged and still goes through the existing ColumnMapMetric._validate pipeline.

Test plan
- test_result_format_contains_map_fields_on_spark passes on Spark backend
- ExpectColumnValuesToBeOfType tests continue to pass on Pandas, SqlAlchemy, and Spark backends
- result["result"] contains element_count, unexpected_count, unexpected_percent, partial_unexpected_list, missing_count, missing_percent, unexpected_percent_total, unexpected_percent_nonmissing for all non-Pandas-object backends