[BUGFIX] Fix misleading result format docs for ExpectColumnValuesToBeOfType (#11076) #11880
…OfType (great-expectations#11076)
The docstring Code Examples for ExpectColumnValuesToBeOfType showed the full Column Map result format (element_count, unexpected_count, partial_unexpected_list, etc.) for all backends. In practice this format is only returned when Pandas is used with a column whose dtype is 'object' (row-level inspection). For all other backends (SQL, including Databricks, Snowflake, SQL Server, PostgreSQL, and Trino; Spark; and Pandas with non-object dtypes), the expectation validates the column's schema-level data type and returns only {"observed_value": "<type_name>"}.
Users relying on the documented format for Databricks or Spark were silently getting a different structure and had no way to know which format to expect.
Changes:
- Replaced the misleading Code Examples in the class docstring with a clear "Result Format" section that documents both shapes and explains when each applies.
- Added a unit test that asserts 'observed_value' is present (and 'element_count' is absent) when running against a Pandas non-object column, preventing future regressions where the aggregate path accidentally switches to the map format (or vice versa).
Fixes great-expectations#11076
A new contributor, HUZZAH! Welcome and thanks for joining our community.
In order to accept a pull request we require that all contributors sign our Contributor License Agreement. We have two different CLAs, depending on whether you are contributing to GX in a personal or professional capacity. Please sign the one that is applicable to your situation so that we may accept your contribution:
Individual Contributor License Agreement v1.0
Once you have signed the CLA, you can add a comment with the text
Please reach out to the #gx-community-support channel on our Slack if you have any questions or if you have already signed the CLA and are receiving this message in error.
Users missing a CLA: creazyfrog
Summary
Fixes #11076
The docstring Code Examples for ExpectColumnValuesToBeOfType showed the full Column Map result format (element_count, unexpected_count, partial_unexpected_list, etc.) for all backends. In practice this format is only returned when Pandas is used with a column whose dtype is object (row-level type inspection). For all other backends (SQL, including Databricks, Snowflake, SQL Server, PostgreSQL, and Trino; Spark; and Pandas with non-object dtypes), the expectation validates the column's schema-level data type and returns only {"observed_value": "<type_name>"}, making the documented examples actively misleading.
Users on Databricks or Spark opened issue #11076 because they expected the full map format based on the docs.
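For code that consumes validation results, the two shapes described above can be told apart by checking for the map-only keys. The following is a minimal sketch that works on plain dictionaries standing in for the result payload; the helper name describe_type_result and the sample values are hypothetical and are not part of Great Expectations.

```python
from typing import Any, Dict


def describe_type_result(result: Dict[str, Any]) -> str:
    """Summarize a result payload in either of the two shapes.

    `result` is a plain dict standing in for the expectation's result
    (hypothetical helper, not a Great Expectations API).
    """
    if "element_count" in result:
        # Row-level (Column Map) shape: Pandas with an object-dtype column.
        total = result["element_count"]
        bad = result.get("unexpected_count", 0)
        return f"map: {bad} of {total} values had an unexpected type"
    # Schema-level (aggregate) shape: SQL, Spark, Pandas non-object dtypes.
    return f"aggregate: column type is {result.get('observed_value')}"


# Sample payloads; the values are made up for illustration.
aggregate_result = {"observed_value": "int64"}
map_result = {
    "element_count": 3,
    "unexpected_count": 1,
    "partial_unexpected_list": ["abc"],
}

print(describe_type_result(aggregate_result))
print(describe_type_result(map_result))
```

Branching on element_count rather than on the backend name keeps the consumer independent of which execution engine produced the result.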
Root Cause
_validate_pandas (non-object path), _validate_sqlalchemy, and _validate_spark perform a schema-level aggregate check. There are no "unexpected rows" to enumerate, so the full Column Map output (element_count, unexpected_count, partial_unexpected_list, etc.) is fundamentally unavailable. The only meaningful result field is observed_value (the actual column type). The Code Examples in the docstring were copied from a different context (the Pandas row-level map path) without being adjusted for the aggregate paths.
Changes
great_expectations/expectations/core/expect_column_values_to_be_of_type.py
- Replaced the misleading Code Examples in the class docstring with a "Result Format" section that documents both shapes and when each applies: the aggregate shape {"observed_value": "<type>"} and the full Column Map shape (element_count, unexpected_count, etc.).
- The docstring now shows the observed_value format, which is what the vast majority of users actually see.
tests/expectations/core/test_expect_column_values_to_be_of_type.py
- Added test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas, a unit test that asserts observed_value is present and element_count is absent for a Pandas non-object column, preventing a future regression where the aggregate path accidentally returns the map format.
Test plan
- test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas (new unit test)
- result contains only observed_value for all aggregate-mode paths
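The regression check in the test plan can be illustrated without Great Expectations installed. The sketch below is a simplified stand-in model, not the library's implementation: the function check_column_type, its parameters, and the dtype strings are assumptions made for illustration. It mirrors the root cause described above, where an object-dtype column is inspected row by row while any other dtype is handled at the schema level and yields only observed_value.

```python
from typing import Any, Dict, List


def check_column_type(
    values: List[Any], dtype: str, expected_type_name: str
) -> Dict[str, Any]:
    """Toy model of the two result paths (hypothetical, not GX code).

    Only the shape of the result is modeled; the pass/fail comparison
    against the expected type is omitted on the aggregate path.
    """
    if dtype == "object":
        # Row-level path: each value's Python type is checked individually.
        unexpected = [
            v for v in values if type(v).__name__ != expected_type_name
        ]
        return {
            "element_count": len(values),
            "unexpected_count": len(unexpected),
            "partial_unexpected_list": unexpected,
        }
    # Aggregate path: only the column's declared type is reported.
    return {"observed_value": dtype}


# Mirrors the assertions described for the new unit test.
non_object = check_column_type([1, 2, 3], "int64", "int64")
assert "observed_value" in non_object
assert "element_count" not in non_object

# The object-dtype path keeps the full map shape.
mixed = check_column_type([1, "a", 2], "object", "int")
assert mixed["element_count"] == 3
assert mixed["unexpected_count"] == 1
```

Asserting both the presence of observed_value and the absence of element_count is what makes the check catch a switch in either direction.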