[BUGFIX] Fix misleading result format docs for ExpectColumnValuesToBeOfType (#11076) #11880
Open
creazyfrog wants to merge 2 commits into
Summary
Fixes #11076
The docstring Code Examples for `ExpectColumnValuesToBeOfType` showed the full Column Map result format (`element_count`, `unexpected_count`, `partial_unexpected_list`, etc.) for all backends. In practice this format is only returned when Pandas is used with a column whose dtype is `object` (row-level type inspection). For all other backends (SQL, including Databricks, Snowflake, SQL Server, PostgreSQL, and Trino; Spark; and Pandas with non-object dtypes) the expectation validates the column's schema-level data type and returns only `{"observed_value": "<type_name>"}`, which makes the documented examples actively misleading.

Users on Databricks or Spark opened issue #11076 because they expected the full map format based on the docs.
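For reviewers, the two result shapes described above can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the Great Expectations implementation: `Column` and `sketch_result` are hypothetical names, the dtype is modeled as a plain string, and the cap of 20 entries in `partial_unexpected_list` is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Column:
    """Hypothetical stand-in for a backend column: a dtype name plus its values."""
    dtype: str
    values: list[Any]


def sketch_result(column: Column, expected_type: type) -> dict:
    """Return a result dict shaped like the two paths described in this PR."""
    if column.dtype == "object":
        # Row-level path: each value is inspected, so unexpected rows can be listed.
        unexpected = [v for v in column.values if not isinstance(v, expected_type)]
        return {
            "element_count": len(column.values),
            "unexpected_count": len(unexpected),
            "partial_unexpected_list": unexpected[:20],  # cap is an assumption
        }
    # Schema-level path: only the column's declared type is known.
    return {"observed_value": column.dtype}


row_level = sketch_result(Column("object", ["a", "b", 3]), str)
schema_level = sketch_result(Column("int64", [1, 2, 3]), int)
```

Here `row_level` carries the map-style counts, while `schema_level` contains nothing but `observed_value`, which is the shape the corrected docstring examples are meant to show.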
Root Cause
`_validate_pandas` (non-object path), `_validate_sqlalchemy`, and `_validate_spark` perform a schema-level aggregate check. There are no "unexpected rows" to enumerate, so the full Column Map output (`element_count`, `unexpected_count`, `partial_unexpected_list`, etc.) is fundamentally unavailable. The only meaningful result field is `observed_value` (the actual column type). The Code Examples in the docstring were copied from a different context (the Pandas row-level map path) without being adjusted for the aggregate paths.

Changes
`great_expectations/expectations/core/expect_column_values_to_be_of_type.py`
- [...] `{"observed_value": "<type>"}`
- [...] (`element_count`, `unexpected_count`, etc.)
- [...] `observed_value` format, which is what the vast majority of users actually see.

`tests/expectations/core/test_expect_column_values_to_be_of_type.py`
- `test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas`: a unit test that asserts `observed_value` is present and `element_count` is absent for a Pandas non-object column, preventing a future regression where the aggregate path accidentally returns the map format.

Test plan
- `test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas` (new unit test)
- [...] `result` contains only `observed_value` for all aggregate-mode paths
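
The shape check in the test plan could look roughly like the sketch below. It deliberately works on plain result dicts rather than calling Great Expectations, so `check_aggregate_result` and `MAP_ONLY_KEYS` are hypothetical names for illustration and not part of the library or of this PR's test file.

```python
# Keys that belong to the row-level Column Map format only (per this PR's description).
MAP_ONLY_KEYS = frozenset(
    {"element_count", "unexpected_count", "partial_unexpected_list"}
)


def check_aggregate_result(result: dict) -> None:
    """Raise AssertionError unless `result` looks like an aggregate-path result."""
    if "observed_value" not in result:
        raise AssertionError("aggregate result is missing observed_value")
    leaked = MAP_ONLY_KEYS.intersection(result)
    if leaked:
        raise AssertionError(
            f"aggregate result leaked map-format keys: {sorted(leaked)}"
        )


# An aggregate-style result passes silently.
check_aggregate_result({"observed_value": "int64"})
```

A result that also contained `element_count` would fail this check, which is the regression the new unit test is described as guarding against.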