[BUGFIX] ExpectColumnValuesToBeOfType returns wrong result format #11855

Open
joshua-stauffer wants to merge 14 commits into develop from community-issue-11076-expectcolumnvaluestobeoftype-returns-wro

Conversation

@joshua-stauffer
Member

Closes #11076

Summary

ExpectColumnValuesToBeOfType is a ColumnMapExpectation, but on non-Pandas-object backends (Spark, SqlAlchemy, and non-object Pandas dtype paths) its _validate method returned only {"observed_value": <type>} instead of the standard map-result fields (element_count, unexpected_count, unexpected_percent, partial_unexpected_list, missing_count, etc.). This fix adds table.row_count and column_values.nonnull.unexpected_count as validation dependencies and introduces a _build_map_result helper that calls _format_map_output, giving callers a consistent result shape regardless of backend.

Changes

  • expect_column_values_to_be_of_type.py: overrides get_validation_dependencies to declare table.row_count and column_values.nonnull.unexpected_count as required metrics for all backends; adds _build_map_result helper; updates _validate to run aggregate results through _build_map_result instead of returning them directly.
  • test_expect_column_values_to_be_of_type.py: adds test_result_format_contains_map_fields_on_spark, which reproduces issue #11076 ("ExpectColumnValuesToBeOfType dq rule is not returning expected result format") and asserts that all standard map-result fields are present.

Why

Community reporters found that validating column types on Spark/Databricks produced a result dict with only observed_value, missing all the map-expectation fields documented at greatexpectations.io/expectations/expect_column_values_to_be_of_type. Downstream tooling and result-format consumers broke because the fields they expected were absent.

User impact

Users running ExpectColumnValuesToBeOfType on Spark, Databricks, or SqlAlchemy backends will now receive the full set of standard ColumnMapExpectation result fields in the validation result, matching the documented contract.

How to review

  • The new integration test (test_result_format_contains_map_fields_on_spark) directly reproduces the reported failure and passes after the fix.
  • The aggregate type-check semantics are preserved: on success unexpected_count=0; on failure every non-null row is counted as unexpected (type either matches or it doesn't — there are no partial matches).
  • The Pandas-object path (super()._validate(...)) is unchanged and still goes through the existing ColumnMapExpectation._validate pipeline.
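
The all-or-nothing semantics described in the review notes can be sketched in plain Python. This is an illustration only, not the code in this PR: summarize_aggregate_type_check is a hypothetical name, and the percentage formulas (including returning None for an empty denominator) are assumptions about how the map-result fields relate to each other.

```python
from typing import Any, Dict, Optional


def summarize_aggregate_type_check(
    success: bool, row_count: int, null_count: int
) -> Dict[str, Any]:
    """Hypothetical helper: derive map-style summary fields from an all-or-nothing check."""
    nonnull_count = row_count - null_count
    # Aggregate semantics from the PR description: nothing is unexpected on
    # success, and every non-null row is unexpected on failure.
    unexpected_count = 0 if success else nonnull_count

    def percent(part: int, whole: int) -> Optional[float]:
        # Assumption: a percentage is undefined (None) when the denominator is zero.
        return part / whole * 100 if whole > 0 else None

    return {
        "element_count": row_count,
        "missing_count": null_count,
        "missing_percent": percent(null_count, row_count),
        "unexpected_count": unexpected_count,
        "unexpected_percent": percent(unexpected_count, nonnull_count),
        "unexpected_percent_total": percent(unexpected_count, row_count),
        "unexpected_percent_nonmissing": percent(unexpected_count, nonnull_count),
    }
```

For a failing check on a 10-row column with 2 nulls, this sketch reports 8 unexpected rows, which is all of the non-null rows but only part of the total row count.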

Test plan

  • test_result_format_contains_map_fields_on_spark passes on Spark backend
  • Existing ExpectColumnValuesToBeOfType tests continue to pass on Pandas, SqlAlchemy, and Spark backends
  • result["result"] contains element_count, unexpected_count, unexpected_percent, partial_unexpected_list, missing_count, missing_percent, unexpected_percent_total, unexpected_percent_nonmissing for all non-Pandas-object backends
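
A presence check for the fields listed in the test plan could look roughly like the following. The field names come from the list above; missing_map_fields and EXPECTED_MAP_FIELDS are hypothetical names, and the input is assumed to be the plain dict found under result["result"].

```python
from typing import Any, Mapping, Set

# Field names taken from the test plan above.
EXPECTED_MAP_FIELDS = frozenset(
    {
        "element_count",
        "unexpected_count",
        "unexpected_percent",
        "partial_unexpected_list",
        "missing_count",
        "missing_percent",
        "unexpected_percent_total",
        "unexpected_percent_nonmissing",
    }
)


def missing_map_fields(result: Mapping[str, Any]) -> Set[str]:
    """Hypothetical helper: return the expected field names absent from a result dict."""
    return set(EXPECTED_MAP_FIELDS) - set(result)
```

A result that only carries observed_value, as reported in #11076, would yield all eight names; a fixed result would yield an empty set.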

…egate backends

ExpectColumnValuesToBeOfType is a ColumnMapExpectation, but on Spark,
SqlAlchemy, and the non-object Pandas branch its result only contained
{"observed_value": ...} instead of the standard map-result fields
documented for the public API (element_count, unexpected_count,
missing_count, unexpected_percent_total, etc.). This made result
shapes inconsistent across backends and broke downstream consumers
relying on the documented format.

The fix routes the aggregate-style result returned by _validate_pandas
/_validate_sqlalchemy/_validate_spark through _format_map_output using
table.row_count and the column null count, while preserving the
existing observed_value field. For success the entire column passes
(unexpected_count=0); on failure every non-null row is unexpected
(unexpected_count=nonnull_count). result.success behavior is
unchanged.

Closes #11076
Copilot AI review requested due to automatic review settings April 29, 2026 22:12
@netlify

netlify Bot commented Apr 29, 2026

Deploy Preview for niobium-lead-7998 canceled.

Name Link
🔨 Latest commit 9fc5754
🔍 Latest deploy log https://app.netlify.com/projects/niobium-lead-7998/deploys/69f887292d631a00081a4f2f

Contributor

Copilot AI left a comment

Pull request overview

Fixes ExpectColumnValuesToBeOfType to return standard ColumnMapExpectation-style result fields (e.g., element_count, unexpected_count, missing_count, etc.) on non-Pandas-object backends (Spark, SqlAlchemy, and non-object Pandas dtype aggregate path), aligning the output with the documented expectation contract and addressing #11076.

Changes:

  • Adds table.row_count and column_values.nonnull.unexpected_count to validation dependencies so aggregate backends can populate map-style summary fields.
  • Introduces _build_map_result to run aggregate validation outcomes through _format_map_output for consistent result shaping across backends.
  • Adds a Spark integration regression test asserting map-result fields are present for ResultFormat.SUMMARY.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds validation dependencies and formats aggregate backend results via _format_map_output to provide standard map-result fields. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark integration test to ensure the result includes standard map fields (regression for #11076). |

@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.80%. Comparing base (4364e42) to head (9fc5754).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| ...tations/core/expect_column_values_to_be_of_type.py | 88.23% | 4 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #11855   +/-   ##
========================================
  Coverage    84.79%   84.80%           
========================================
  Files          471      471           
  Lines        39169    39200   +31     
========================================
+ Hits         33215    33242   +27     
- Misses        5954     5958    +4     
| Flag | Coverage Δ |
| 3.10 | 73.60% <85.29%> (+0.03%) ⬆️ |
| 3.11 | 73.64% <85.29%> (+0.03%) ⬆️ |
| 3.12 | 73.65% <85.29%> (+0.03%) ⬆️ |
| 3.13 | 73.65% <85.29%> (+0.03%) ⬆️ |
| 3.13 athena | 41.83% <11.76%> (-0.03%) ⬇️ |
| 3.13 aws_deps | 45.08% <11.76%> (-0.03%) ⬇️ |
| 3.13 big | 55.15% <11.76%> (-0.04%) ⬇️ |
| 3.13 bigquery | 51.21% <76.47%> (+0.02%) ⬆️ |
| 3.13 clickhouse | 41.84% <11.76%> (-0.03%) ⬇️ |
| 3.13 databricks | 53.02% <76.47%> (+0.02%) ⬆️ |
| 3.13 filesystem | 64.31% <76.47%> (+0.01%) ⬆️ |
| 3.13 gx-redshift | 51.37% <76.47%> (+0.02%) ⬆️ |
| 3.13 mysql | 51.75% <76.47%> (+0.02%) ⬆️ |
| 3.13 openpyxl or pyarrow or project or sqlite or aws_creds | 59.91% <76.47%> (+0.01%) ⬆️ |
| 3.13 postgresql | 55.18% <82.35%> (+0.02%) ⬆️ |
| 3.13 singlestore | 46.99% <11.76%> (-0.03%) ⬇️ |
| 3.13 snowflake 1/3 | 49.69% <76.47%> (+0.02%) ⬆️ |
| 3.13 snowflake 2/3 | 49.28% <11.76%> (-0.04%) ⬇️ |
| 3.13 snowflake 3/3 | 50.16% <11.76%> (-0.03%) ⬇️ |
| 3.13 spark | 56.01% <82.35%> (+0.03%) ⬆️ |
| 3.13 spark_connect | 46.75% <11.76%> (-0.03%) ⬇️ |
| 3.13 sql_server | 53.18% <76.47%> (+0.02%) ⬆️ |
| 3.13 trino | 48.63% <11.76%> (-0.03%) ⬇️ |
| cloud | 0.00% <0.00%> (ø) |
| docs-basic | 59.46% <82.35%> (+0.02%) ⬆️ |
| docs-creds-needed | 60.57% <11.76%> (-0.04%) ⬇️ |
| docs-spark | 57.45% <11.76%> (-0.04%) ⬇️ |

Copilot AI review requested due to automatic review settings April 30, 2026 14:25
Contributor

Copilot AI left a comment

Pull request overview

Fixes ExpectColumnValuesToBeOfType result-shape inconsistency across backends by ensuring non-Pandas-object (Spark/SqlAlchemy/non-object Pandas) validation paths return standard ColumnMapExpectation map-result fields (element/missing/unexpected counts & percents), aligning with documented expectation output.

Changes:

  • Add table.row_count and column_values.nonnull.unexpected_count validation dependencies for the aggregate-style validation paths.
  • Introduce _build_map_result to format aggregate validation output via _format_map_output (while preserving observed_value).
  • Add a Spark integration test asserting presence of standard map-result fields (regression for #11076).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds dependencies + formats aggregate backend results through _format_map_output for consistent map-result fields. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark regression test ensuring result contains standard map fields. |

Comment thread great_expectations/expectations/core/expect_column_values_to_be_of_type.py Outdated
_build_map_result was unconditionally attaching observed_value to
formatted["result"], which re-introduced a result dict for BOOLEAN_ONLY
where _format_map_output intentionally omits one. Gate the attach on the
parsed result_format and add a Spark regression test asserting
result.result == {} for BOOLEAN_ONLY.
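
The gating described in this commit message might be sketched as follows. This is a simplified stand-in, not the PR's _build_map_result: attach_observed_value is a hypothetical name, result_format is assumed to be an already-parsed plain string, and the formatted output is assumed to carry no "result" key for BOOLEAN_ONLY.

```python
from typing import Any, Dict


def attach_observed_value(
    formatted: Dict[str, Any], observed_value: Any, result_format: str
) -> Dict[str, Any]:
    """Hypothetical stand-in: add observed_value unless the format is BOOLEAN_ONLY."""
    if result_format == "BOOLEAN_ONLY":
        # Leave the output untouched so no result dict is re-introduced.
        return formatted
    # Copy rather than mutate, so the caller's formatted output is not changed.
    result = dict(formatted.get("result", {}))
    result["observed_value"] = observed_value
    return {**formatted, "result": result}
```

With this shape, a BOOLEAN_ONLY output such as {"success": True} passes through unchanged, while a SUMMARY output keeps its map fields and gains observed_value alongside them.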
Copilot AI review requested due to automatic review settings April 30, 2026 17:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Pull request overview

Fixes ExpectColumnValuesToBeOfType so that non-Pandas-object backends (Spark/SQLAlchemy and aggregate-style Pandas paths) return the standard ColumnMapExpectation result shape (element/unexpected/missing counts & percents, etc.) instead of only {"observed_value": ...}, aligning runtime behavior with the documented contract.

Changes:

  • Add table.row_count and column_values.nonnull.unexpected_count validation dependencies and introduce _build_map_result() to format aggregate-path results via _format_map_output.
  • Update _validate() to return formatted map-style results for Spark/SQLAlchemy/aggregate Pandas paths, while preserving the existing Pandas-object map path.
  • Add Spark integration tests asserting standard map fields are present for SUMMARY and that BOOLEAN_ONLY keeps result == {}.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds aggregate-path dependencies and a helper to format aggregate validation output into standard map-result fields across backends. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark regression tests covering standard map-result fields and BOOLEAN_ONLY behavior. |

Copilot AI review requested due to automatic review settings April 30, 2026 21:09
Contributor

Copilot AI left a comment

Pull request overview

Fixes ExpectColumnValuesToBeOfType validation result shape on non-Pandas-object backends so it consistently returns standard ColumnMapExpectation map-result fields (e.g., element_count, unexpected_count, missing_count, etc.), addressing reported issue #11076.

Changes:

  • Declares additional validation dependencies (table.row_count, column_values.nonnull.unexpected_count) and introduces _build_map_result to format aggregate-backend outputs via _format_map_output.
  • Updates _validate to wrap aggregate-backend validation outputs with consistent map-result formatting (while preserving the Pandas-object map path).
  • Adds a Spark integration test reproducing the missing-fields bug and a regression test for BOOLEAN_ONLY result format.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Adds required metrics and formats aggregate-backend results into standard map-result shape via _format_map_output. |
| tests/integration/data_sources_and_expectations/expectations/test_expect_column_values_to_be_of_type.py | Adds Spark integration coverage for the corrected result format and BOOLEAN_ONLY contract. |

Copilot AI review requested due to automatic review settings May 4, 2026 09:33
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +596 to +600
    validation_dependencies.set_metric_configuration(
        metric_name="table.row_count",
        metric_configuration=MetricConfiguration(
            metric_name="table.row_count",
            metric_domain_kwargs=row_count_metric_kwargs["metric_domain_kwargs"],

Replace the Spark presence-only assertion with a shared helper that
verifies the full set of ColumnMapExpectation result fields with
concrete values across each aggregate-path backend (non-object Pandas,
PostgreSQL, Spark). Covers both success and failure cases, plus
columns containing nulls so missing_count is exercised.
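
A shared value-checking helper of the kind this commit message describes might look like the sketch below. The real helper in the test module is not shown in this conversation, so assert_map_result_fields is a hypothetical name and the example values are illustrative, not taken from the PR's tests.

```python
import math
from typing import Any, Mapping


def assert_map_result_fields(
    actual: Mapping[str, Any], expected: Mapping[str, Any]
) -> None:
    """Hypothetical helper: check that each expected field is present with the expected value."""
    for field, want in expected.items():
        assert field in actual, f"missing field: {field}"
        got = actual[field]
        if isinstance(want, float):
            # Percentages are floats, so compare with a tolerance.
            assert got is not None and math.isclose(got, want), f"{field}: {got} != {want}"
        else:
            assert got == want, f"{field}: {got} != {want}"


# Illustrative failure case: 5 rows, 1 null, every non-null row unexpected.
example = {
    "element_count": 5,
    "missing_count": 1,
    "unexpected_count": 4,
    "unexpected_percent": 100.0,
}
assert_map_result_fields(example, {"element_count": 5, "unexpected_percent": 100.0})
```

Checking concrete values rather than key presence alone also catches a result that has the right shape but wrong counts, for example a missing_count of 0 on a column that contains nulls.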
Copilot AI review requested due to automatic review settings May 4, 2026 10:58
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +586 to +620
    # Issue #11076: ExpectColumnValuesToBeOfType is a ColumnMapExpectation, so its
    # result should include the standard map-result fields (element_count,
    # unexpected_count, missing_count, etc.) regardless of backend. The aggregate
    # paths (Spark, SqlAlchemy, non-object Pandas) need table.row_count and the
    # column null count to populate those fields via _format_map_output.
    row_count_metric_kwargs = get_metric_kwargs(
        metric_name="table.row_count",
        configuration=configuration,
        runtime_configuration=runtime_configuration,
    )
    validation_dependencies.set_metric_configuration(
        metric_name="table.row_count",
        metric_configuration=MetricConfiguration(
            metric_name="table.row_count",
            metric_domain_kwargs=row_count_metric_kwargs["metric_domain_kwargs"],
            metric_value_kwargs=row_count_metric_kwargs["metric_value_kwargs"],
        ),
    )

    nonnull_unexpected_count_metric_name = (
        f"column_values.nonnull.{SummarizationMetricNameSuffixes.UNEXPECTED_COUNT.value}"
    )
    nonnull_metric_kwargs = get_metric_kwargs(
        metric_name=nonnull_unexpected_count_metric_name,
        configuration=configuration,
        runtime_configuration=runtime_configuration,
    )
    validation_dependencies.set_metric_configuration(
        metric_name=nonnull_unexpected_count_metric_name,
        metric_configuration=MetricConfiguration(
            metric_name=nonnull_unexpected_count_metric_name,
            metric_domain_kwargs=nonnull_metric_kwargs["metric_domain_kwargs"],
            metric_value_kwargs=nonnull_metric_kwargs["metric_value_kwargs"],
        ),
    )
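
Once both metrics are resolved, the counts that feed the map-result fields could be derived along these lines. This is a sketch under assumptions: counts_from_metrics is a hypothetical name, the metrics are assumed to arrive as a plain dict keyed by the metric names declared above, and column_values.nonnull.unexpected_count is read as the column null count, as the code comment in the diff suggests.

```python
from typing import Any, Dict, Tuple

ROW_COUNT_METRIC = "table.row_count"
NULL_COUNT_METRIC = "column_values.nonnull.unexpected_count"


def counts_from_metrics(metrics: Dict[str, Any]) -> Tuple[int, int, int]:
    """Hypothetical helper: return (element_count, missing_count, nonnull_count)."""
    element_count = int(metrics[ROW_COUNT_METRIC])
    # A row is "unexpected" under the non-null condition when its value is null,
    # so this metric doubles as the column's null count.
    missing_count = int(metrics[NULL_COUNT_METRIC])
    return element_count, missing_count, element_count - missing_count
```

For example, a table of 10 rows whose column holds 3 nulls would give (10, 3, 7) under these assumptions.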