[BUGFIX] Compare type dict column name with actual column name with casefold. … #11064

mihirraj · 2025-04-06T01:27:58Z

…If the column name is passed in all caps or snake case, the column matching fails and raises an IndexError, which is hard for users to debug.
Steps to reproduce:

`import great_expectations as gx
import pandas as pd
from great_expectations.expectations import ExpectColumnValuesToBeOfType

NAME_DATA_SOURCE = "pandas"
NAME_DATA_ASSET = "tutorial_data"
NAME_BATCH_DEF = "pandas_tutorial"
NAME_EXPECTATION_SUITE = "pandas_tutorial"
NAME_VALIDATION_DEF = "pandas_validation"
NAME_CHECKPOINT = "pandas"

# Create a small DataFrame
data = {
    "age": [25, 30, 35, 40],
}
df = pd.DataFrame(data)

context = gx.get_context()

data_source = context.data_sources.add_pandas(name=NAME_DATA_SOURCE)
data_asset = data_source.add_dataframe_asset(name=NAME_DATA_ASSET)
batch_definition = data_asset.add_batch_definition_whole_dataframe(NAME_BATCH_DEF)

expectation_suite = gx.ExpectationSuite(name="demo_expectation_suite")
expectation_suite = context.suites.add(expectation_suite)

# "AGE" differs in case from the DataFrame column "age"
expectations = ExpectColumnValuesToBeOfType(
    column="AGE",
    type_="int64",
)

expectation_suite.add_expectation(expectations)

batch_parameters = {"dataframe": df}
batch = batch_definition.get_batch(batch_parameters=batch_parameters)
validation_results = batch.validate(expectation_suite)

print(validation_results)`

Description of PR changes above includes a link to an existing GitHub issue
PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB], [MINORBUMP]
Code is linted - run `invoke lint` (uses `ruff format` + `ruff check`)
Appropriate tests and docs have been updated

For more information about contributing, visit our community resources.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

netlify · 2025-04-06T01:28:01Z

‼️ Deploy request for niobium-lead-7998 rejected.

Name	Link
🔨 Latest commit	`00118fb`

billdirks · 2025-04-14T20:49:03Z

great_expectations/expectations/core/expect_column_values_to_be_of_type.py

@@ -566,7 +566,7 @@ def _validate(
        actual_column_type = [
            type_dict["type"]
            for type_dict in actual_column_types_list
-            if type_dict["name"] == column_name
+            if type_dict["name"].casefold() == column_name.casefold()


Thanks for contributing. I agree that an IndexError is onerous for a user to debug and we should make this easier. I don't think this is a viable solution, though, because a table can have column names where casing matters. For example, a Postgres database may have a table, MyTable, with quoted columns of different types called "MyColumn" and "mycolumn". These column names would be identical after casefold.
Different databases may use different quoting characters for column names, e.g. ", ', `. Also, pandas column names are case sensitive, and Spark can be configured to have case-sensitive column names.
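
The collision can be illustrated with a minimal sketch in plain Python. The `actual_column_types_list` here is made-up data shaped like the list of `name`/`type` dicts in the diff above; it is an assumption for illustration, not output from a real database.

```python
# Hypothetical column metadata: two quoted columns that differ only by case.
actual_column_types_list = [
    {"name": "MyColumn", "type": "INTEGER"},
    {"name": "mycolumn", "type": "TEXT"},
]
column_name = "mycolumn"

# Exact comparison: one unambiguous match.
exact = [
    type_dict["type"]
    for type_dict in actual_column_types_list
    if type_dict["name"] == column_name
]

# casefold comparison (the proposed change): both columns match.
folded = [
    type_dict["type"]
    for type_dict in actual_column_types_list
    if type_dict["name"].casefold() == column_name.casefold()
]

print(exact)   # ['TEXT']
print(folded)  # ['INTEGER', 'TEXT']
```

If the first match were then used, the casefolded lookup would pick up the type of "MyColumn" even though "mycolumn" was requested.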

To fix this, making a better error message might be easier than changing the behavior of the expectation.
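
One possible shape for such an error message is sketched here. The helper name `find_column_type` and the wording of the message are hypothetical and are not part of Great Expectations; the sketch only assumes the list of `name`/`type` dicts seen in the diff. Matching stays exact and case sensitive, and case-insensitive comparison is used only to build a hint.

```python
from typing import Any, Dict, List


def find_column_type(
    actual_column_types_list: List[Dict[str, Any]], column_name: str
) -> Any:
    """Return the type of `column_name` using an exact, case-sensitive match."""
    matches = [
        type_dict["type"]
        for type_dict in actual_column_types_list
        if type_dict["name"] == column_name
    ]
    if matches:
        return matches[0]

    # No exact match: collect names that differ only by case to build a hint.
    near = [
        type_dict["name"]
        for type_dict in actual_column_types_list
        if type_dict["name"].casefold() == column_name.casefold()
    ]
    hint = (
        f" Did you mean one of {near}? Column names are matched case-sensitively."
        if near
        else ""
    )
    raise ValueError(f'Column "{column_name}" was not found.{hint}')


# Usage: the mismatch from the reproduction steps now yields a readable error.
try:
    find_column_type([{"name": "age", "type": "int64"}], "AGE")
except ValueError as err:
    print(err)
```

This keeps the expectation's behavior unchanged for tables where casing matters, while replacing the bare IndexError with a message that names the likely cause.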

Compare type dict column name with actual column name with casefold. …

00118fb

…If the column name is passed as all caps or snakecase then the column matching will fail and it will raise indexError which is hard to detect.

billdirks requested changes Apr 14, 2025

wookasz added the community label Apr 22, 2025
