[BUGFIX] Fix misleading result format docs for ExpectColumnValuesToBeOfType (#11076) #11880

Open
creazyfrog wants to merge 2 commits into fivetran:develop from creazyfrog:fix/expect-column-values-to-be-of-type-result-format-11076

Summary

Fixes #11076

The docstring Code Examples for ExpectColumnValuesToBeOfType showed the full Column Map result format (element_count, unexpected_count, partial_unexpected_list, etc.) for all backends. In practice this format is only returned when Pandas is used with a column whose dtype is object (row-level type inspection). For all other backends — SQL (Databricks, Snowflake, SQL Server, PostgreSQL, Trino), Spark, and Pandas with non-object dtypes — the expectation validates the column's schema-level data type and returns only {"observed_value": "<type_name>"}, making the documented examples actively misleading.

Users on Databricks or Spark opened issue #11076 because they expected the full map format based on the docs.
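For anyone consuming these results downstream, a defensive reader that handles both shapes might look like the sketch below. The helper name `summarize_type_check` and the sample dicts are hypothetical and only mirror the field names mentioned in this description; this is not code from Great Expectations.

```python
# Hypothetical helper: handle either result shape without assuming one.
# The key names come from this PR description; the values are illustrative.
MAP_KEYS = ("element_count", "unexpected_count", "partial_unexpected_list")


def summarize_type_check(result: dict) -> str:
    """Return a one-line summary for either result shape."""
    if any(key in result for key in MAP_KEYS):
        # Row-level (Column Map) shape: Pandas with an object-dtype column.
        unexpected = result.get("unexpected_count", 0)
        total = result.get("element_count", 0)
        return f"{unexpected} of {total} values have an unexpected type"
    # Schema-level shape: only the observed column type is reported.
    return f"column type is {result.get('observed_value')}"


print(summarize_type_check({"observed_value": "IntegerType"}))
print(summarize_type_check(
    {"element_count": 3, "unexpected_count": 1, "partial_unexpected_list": ["x"]}
))
```

Branching on the presence of the map keys, rather than on the backend name, keeps the caller correct even when a Pandas column switches between object and non-object dtypes.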

Root Cause

_validate_pandas (non-object path), _validate_sqlalchemy, and _validate_spark perform a schema-level aggregate check — there are no "unexpected rows" to enumerate, so the full Column Map output (element_count, unexpected_count, partial_unexpected_list, etc.) is fundamentally unavailable. The only meaningful result field is observed_value (the actual column type). The Code Examples in the docstring were copied from a different context (the Pandas row-level map path) without being adjusted for the aggregate paths.
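The distinction can be illustrated with a toy model. `ToyColumn` and `toy_validate` below are made-up names for illustration and do not reflect the actual `_validate_*` implementations; the point is only that a schema-level check compares one declared type, so it has no rows to count.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyColumn:
    declared_type: str  # stands in for a schema type or dtype, e.g. "int64"
    values: list[Any]


def toy_validate(column: ToyColumn, expected: str) -> dict:
    """Toy model of the two validation paths (not the real implementation)."""
    if column.declared_type != "object":
        # Schema-level path: one type for the whole column, nothing to enumerate.
        return {"observed_value": column.declared_type}
    # Row-level path: every value has to be inspected individually.
    unexpected = [v for v in column.values if type(v).__name__ != expected]
    return {
        "element_count": len(column.values),
        "unexpected_count": len(unexpected),
        "partial_unexpected_list": unexpected[:20],
    }


print(toy_validate(ToyColumn("int64", [1, 2, 3]), "int64"))
print(toy_validate(ToyColumn("object", [1, "a", 2]), "int"))
```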

Changes

great_expectations/expectations/core/expect_column_values_to_be_of_type.py

  • Replaced the two misleading Code Examples with a clear Result Format section that documents both shapes and explains exactly when each applies:
    • SQL / Spark / Pandas non-object dtype → {"observed_value": "<type>"}
    • Pandas with object dtype → full Column Map format (element_count, unexpected_count, etc.)
  • Updated the Code Examples to show the observed_value format, which is what the vast majority of users actually see.

tests/expectations/core/test_expect_column_values_to_be_of_type.py

  • Added test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas — a unit test that asserts observed_value is present and element_count is absent for a Pandas non-object column, preventing a future regression where the aggregate path accidentally returns the map format.
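The core of that assertion can be sketched as follows. The helper `assert_aggregate_result_shape` is a hypothetical name and the dict is a stand-in for the expectation's result; the real test in this PR runs the expectation against a Pandas non-object column instead.

```python
def assert_aggregate_result_shape(result: dict) -> None:
    """Fail if an aggregate-path result looks like the Column Map format."""
    assert "observed_value" in result, "aggregate path must report observed_value"
    assert "element_count" not in result, "aggregate path must not return map fields"


def test_aggregate_shape_with_stand_in_result() -> None:
    # Stand-in for the result of a Pandas non-object column (illustrative value).
    stand_in_result = {"observed_value": "int64"}
    assert_aggregate_result_shape(stand_in_result)


test_aggregate_shape_with_stand_in_result()
```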

Test plan

  • test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas (new unit test)
  • Verified manually on Pandas, SQLite, and a mocked Databricks dialect that the result contains only observed_value for all aggregate-mode paths


Labels

stale Stale issues and PRs

Development

Successfully merging this pull request may close these issues.

ExpectColumnValuesToBeOfType dq rule is not returning expected result format
