Skip to content

[MAINTENANCE] Extract type comparison logic into dedicated module#11798

Open
wookasz wants to merge 28 commits into
developfrom
m/extract-type-comparison-module
Open

[MAINTENANCE] Extract type comparison logic into dedicated module#11798
wookasz wants to merge 28 commits into
developfrom
m/extract-type-comparison-module

Conversation

@wookasz

@wookasz wookasz commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Extracts dialect-aware type comparison logic from ExpectColumnValuesToBeOfType and ExpectColumnValuesToBeInTypeList into a new type_comparison module at great_expectations/expectations/type_comparison.py
  • Public API: compare_column_type(), compare_column_type_list(), native_type_type_map(), and CASE_INSENSITIVE_DIALECTS constant
  • Both expectations now delegate SQLAlchemy validation to the module, reducing _validate_sqlalchemy() to 4 lines each
  • All consumers (metric modules, metrics/util.py) import directly from type_comparison — no re-exports
  • Eliminates the duplicated case_insensitive_dialects set in metrics/util.py
  • 846 unit tests covering all dialect-specific type comparisons for PostgreSQL, Snowflake, Databricks, SQL Server, Trino, SQLite, and MySQL — both scalar and list functions tested independently per dialect

Context

This is Phase 1 of a two-phase effort to reduce Databricks/Snowflake CI runtime. By extracting and unit-testing the type comparison logic, Phase 2 can safely narrow most expectation integration tests to Postgres-only, since dialect-specific behavior is now covered by unit tests and metric-level integration tests.

Test plan

  • New unit tests pass: pytest tests/expectations/test_type_comparison.py
  • Existing unit tests unchanged: pytest tests/expectations/core/test_expect_column_values_to_be_of_type.py tests/expectations/core/test_expect_column_values_to_be_in_type_list.py
  • Ruff passes on all changed files
  • CI unit tests pass across all Python versions (3.10-3.13)
  • CI static-analysis (mypy) passes

Move dialect-aware type comparison logic from the two type expectations
into a new `type_comparison` module with a clean public API:
- `compare_column_type()` and `compare_column_type_list()` handle dialect
  dispatch internally (case-insensitive string comparison for 5 dialects,
  isinstance-based class matching for all others)
- `native_type_type_map()` for pandas type resolution
- `CASE_INSENSITIVE_DIALECTS` constant (eliminates duplicate in metrics/util.py)

Both expectations now delegate to the module, and all consumers import
directly from type_comparison rather than from the expectation files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 9, 2026 18:27
@netlify

netlify Bot commented Apr 9, 2026

Copy link
Copy Markdown

Deploy Preview for niobium-lead-7998 canceled.

Name Link
🔨 Latest commit a74796e
🔍 Latest deploy log https://app.netlify.com/projects/niobium-lead-7998/deploys/69d964ee2a02e0000852db27

@codecov

codecov Bot commented Apr 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 82.11921% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.68%. Comparing base (160f568) to head (a74796e).
⚠️ Report is 67 commits behind head on develop.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
great_expectations/expectations/type_comparison.py 82.57% 23 Missing ⚠️
...ns/core/expect_column_values_to_be_in_type_list.py 66.66% 2 Missing ⚠️
...tations/core/expect_column_values_to_be_of_type.py 71.42% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #11798      +/-   ##
===========================================
+ Coverage    84.67%   84.68%   +0.01%     
===========================================
  Files          471      472       +1     
  Lines        39179    39199      +20     
===========================================
+ Hits         33174    33197      +23     
+ Misses        6005     6002       -3     
Flag Coverage Δ
3.10 73.72% <72.84%> (+0.13%) ⬆️
3.10 athena ?
3.10 aws_deps ?
3.10 big ?
3.10 clickhouse ?
3.10 filesystem ?
3.10 mysql ?
3.10 openpyxl or pyarrow or project or sqlite or aws_creds ?
3.10 postgresql ?
3.10 spark ?
3.10 spark_connect ?
3.10 sql_server ?
3.10 trino ?
3.11 73.76% <72.84%> (+0.13%) ⬆️
3.11 athena ?
3.11 aws_deps ?
3.11 big ?
3.11 clickhouse ?
3.11 filesystem ?
3.11 mysql ?
3.11 openpyxl or pyarrow or project or sqlite or aws_creds ?
3.11 postgresql ?
3.11 spark_connect ?
3.11 sql_server ?
3.11 trino ?
3.12 73.77% <72.84%> (+0.14%) ⬆️
3.12 athena ?
3.12 aws_deps ?
3.12 big ?
3.12 filesystem ?
3.12 mysql ?
3.12 openpyxl or pyarrow or project or sqlite or aws_creds ?
3.12 postgresql ?
3.12 spark ?
3.12 spark_connect ?
3.12 sql_server ?
3.12 trino ?
3.13 73.77% <72.84%> (+0.13%) ⬆️
3.13 athena 41.93% <23.17%> (+<0.01%) ⬆️
3.13 aws_deps 45.18% <23.17%> (+<0.01%) ⬆️
3.13 big 55.26% <32.45%> (-0.01%) ⬇️
3.13 bigquery 51.25% <52.31%> (-0.01%) ⬇️
3.13 clickhouse 41.94% <21.85%> (+<0.01%) ⬆️
3.13 databricks 53.07% <33.77%> (+0.01%) ⬆️
3.13 filesystem 64.36% <33.77%> (-0.01%) ⬇️
3.13 gx-redshift 51.40% <52.98%> (-0.01%) ⬇️
3.13 mysql 51.80% <50.99%> (-0.01%) ⬇️
3.13 openpyxl or pyarrow or project or sqlite or aws_creds 59.96% <50.99%> (-0.01%) ⬇️
3.13 postgresql 55.23% <33.77%> (+0.01%) ⬆️
3.13 snowflake 53.90% <33.77%> (+<0.01%) ⬆️
3.13 spark 55.91% <24.50%> (-0.01%) ⬇️
3.13 spark_connect 46.85% <23.17%> (+<0.01%) ⬆️
3.13 sql_server 53.24% <33.77%> (+0.01%) ⬆️
3.13 trino 48.74% <24.50%> (-0.01%) ⬇️
cloud 0.00% <0.00%> (ø)
docs-basic 59.52% <33.77%> (+<0.01%) ⬆️
docs-creds-needed 58.10% <24.50%> (-0.01%) ⬇️
docs-spark 57.56% <24.50%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extracts dialect-aware SQLAlchemy column-type comparison logic (used by ExpectColumnValuesToBeOfType and ExpectColumnValuesToBeInTypeList) into a new type_comparison module, and updates expectations/metrics to delegate to that shared implementation.

Changes:

  • Added great_expectations.expectations.core.type_comparison with shared helpers (compare_column_type, compare_column_type_list, native_type_type_map, CASE_INSENSITIVE_DIALECTS).
  • Updated both type expectations and relevant column-map metrics to use the new module (removing duplicated/embedded logic).
  • Updated metrics column metadata handling to reuse the shared CASE_INSENSITIVE_DIALECTS constant and added a new unit test suite for the module.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/expectations/core/test_type_comparison.py Adds unit tests for the new type comparison module.
great_expectations/expectations/metrics/util.py Reuses shared CASE_INSENSITIVE_DIALECTS instead of a local duplicate.
great_expectations/expectations/metrics/column_map_metrics/column_values_of_type.py Switches pandas native type mapping to native_type_type_map().
great_expectations/expectations/metrics/column_map_metrics/column_values_in_type_list.py Switches pandas native type mapping to native_type_type_map().
great_expectations/expectations/core/type_comparison.py New consolidated module implementing dialect-aware type comparisons.
great_expectations/expectations/core/expect_column_values_to_be_of_type.py Delegates SQLAlchemy type comparison and native type mapping to type_comparison.
great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py Delegates SQLAlchemy type-list comparison and native type mapping to type_comparison.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +161 to +163
# CaseInsensitiveString objects will automatically do case-insensitive comparison
return actual_column_type == expected_type
return str(actual_column_type).lower() == expected_type.lower()

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_compare_type_string() claims to do case-insensitive comparison, but for plain str it currently uses actual_column_type == expected_type (case-sensitive). This is inconsistent with compare_column_type_list() (which lowercases) and with the module/docstring intent. Consider normalizing both sides (e.g., casefold()/lower()) for plain strings while still allowing CaseInsensitiveString.__eq__ to handle its own semantics.

Suggested change
# CaseInsensitiveString objects will automatically do case-insensitive comparison
return actual_column_type == expected_type
return str(actual_column_type).lower() == expected_type.lower()
# Preserve custom __eq__ behavior for str subclasses such as
# CaseInsensitiveString, but normalize plain str values explicitly.
if type(actual_column_type) is str:
return actual_column_type.casefold() == expected_type.casefold()
return actual_column_type == expected_type
return str(actual_column_type).casefold() == expected_type.casefold()

Copilot uses AI. Check for mistakes.
execution_engine: SqlAlchemyExecutionEngine,
) -> ModuleType:
if execution_engine.dialect_module is None:
logger.warning("No sqlalchemy dialect found; relying in top-level sqlalchemy types.")

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log message has a grammatical typo: "relying in top-level sqlalchemy types" should be "relying on top-level sqlalchemy types".

Suggested change
logger.warning("No sqlalchemy dialect found; relying in top-level sqlalchemy types.")
logger.warning("No sqlalchemy dialect found; relying on top-level sqlalchemy types.")

Copilot uses AI. Check for mistakes.
def _mock_engine(mocker, dialect_name):
"""Create a mock SqlAlchemyExecutionEngine with the given dialect_name."""
engine = mocker.MagicMock()
type(engine).dialect_name = mocker.PropertyMock(return_value=dialect_name)

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_mock_engine() sets type(engine).dialect_name using PropertyMock, which mutates the shared MagicMock class and can leak between tests. Prefer setting engine.dialect_name = <value> on the instance, or use a small stub class/object with a dialect_name attribute to avoid global side effects.

Suggested change
type(engine).dialect_name = mocker.PropertyMock(return_value=dialect_name)
engine.dialect_name = dialect_name

Copilot uses AI. Check for mistakes.
Comment on lines +113 to +115
mock_type = mocker.MagicMock()
mock_type.__str__ = lambda self: "INTEGER"
type(mock_type).__name__ = "INTEGER"

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests mutate the MagicMock class by assigning type(mock_type).__name__ = "INTEGER", which can affect other mocks and make the suite order-dependent. Use a dedicated dummy type/class instance (e.g., a small local class with __name__ and __str__) instead of changing the global mock class metadata.

Suggested change
mock_type = mocker.MagicMock()
mock_type.__str__ = lambda self: "INTEGER"
type(mock_type).__name__ = "INTEGER"
class INTEGER:
def __str__(self):
return "INTEGER"
mock_type = INTEGER()

Copilot uses AI. Check for mistakes.
@pytest.mark.parametrize("dialect", list(CASE_INSENSITIVE_DIALECTS))
def test_case_insensitive_match(self, mocker, dialect):
engine = _mock_engine(mocker, dialect)
success, _observed = compare_column_type(engine, "integer", "integer")

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_case_insensitive_match doesn't currently assert mixed-case behavior (it compares "integer" to "integer"). To verify the case-insensitive path, use differing cases (e.g., actual "INTEGER" vs expected "integer") and/or use CaseInsensitiveString as the observed value since that's what production code wraps types with for these dialects.

Suggested change
success, _observed = compare_column_type(engine, "integer", "integer")
success, _observed = compare_column_type(engine, "INTEGER", "integer")

Copilot uses AI. Check for mistakes.
Comment on lines +187 to +195

# ---------------------------------------------------------------------------
# compare_column_type_list — isinstance path
# ---------------------------------------------------------------------------


class TestCompareColumnTypeListByClass:
def test_match_in_list(self, mocker):
engine = _mock_engine(mocker, "sqlite")

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above: this test mutates type(mock_type).__name__, which can leak globally and cause flaky failures. Prefer a purpose-built dummy class/instance for the non-string type case.

Copilot uses AI. Check for mistakes.
wookasz and others added 2 commits April 9, 2026 20:25
- Fix _compare_type_string to use casefold() for plain str while
  preserving CaseInsensitiveString.__eq__ for subclasses
- Fix typo: "relying in" -> "relying on"
- Replace MagicMock with lightweight stub classes to avoid mutating
  shared mock class metadata between tests
- Fix test_case_insensitive_match to actually test mixed case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 03:26
@wookasz wookasz removed the request for review from Copilot April 10, 2026 03:26
Copilot AI review requested due to automatic review settings April 10, 2026 14:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +60 to +86
def native_type_type_map(type_: str) -> tuple | None: # noqa: C901, PLR0911
"""Map a string type name to a tuple of native Python types.

Used by pandas validation paths to resolve type names like "int", "str", etc.
Returns None for unrecognized types.
"""
if type_.lower() == "none":
return (type(None),)
elif type_.lower() == "bool":
return (bool,)
elif type_.lower() in ["int", "long"]:
return (int,)
elif type_.lower() == "float":
return (float,)
elif type_.lower() == "bytes":
return (bytes,)
elif type_.lower() == "complex":
return (complex,)
elif type_.lower() in ["str", "string_types"]:
return (str,)
elif type_.lower() == "list":
return (list,)
elif type_.lower() == "dict":
return (dict,)
elif type_.lower() == "unicode":
return None

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

native_type_type_map() has an imprecise return type (tuple | None) and falls off the end implicitly for unknown types. Since this is part of the new public API, tighten the signature (e.g., tuple[type, ...] | None) and add an explicit return None for unrecognized values to make behavior and typing unambiguous.

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +138
actual_column_type.lower() == expected_type.lower()
for expected_type in expected_types_list
)
return success, actual_column_type
else:
ret_type = type(actual_column_type).__name__
success = any(
ret_type.lower() == expected_type.lower() for expected_type in expected_types_list

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case-insensitive comparisons are implemented with .casefold() in _compare_type_string(), but compare_column_type_list() uses .lower(). Since this module is meant to centralize dialect-aware comparisons (and already notes Unicode correctness), it would be more consistent/robust to use .casefold() in the list path as well.

Suggested change
actual_column_type.lower() == expected_type.lower()
for expected_type in expected_types_list
)
return success, actual_column_type
else:
ret_type = type(actual_column_type).__name__
success = any(
ret_type.lower() == expected_type.lower() for expected_type in expected_types_list
actual_column_type.casefold() == expected_type.casefold()
for expected_type in expected_types_list
)
return success, actual_column_type
else:
ret_type = type(actual_column_type).__name__
success = any(
ret_type.casefold() == expected_type.casefold()
for expected_type in expected_types_list

Copilot uses AI. Check for mistakes.
wookasz and others added 2 commits April 10, 2026 08:14
Add comprehensive unit tests covering actual type names for each dialect:
- PostgreSQL: CHAR, TEXT, VARCHAR, INTEGER, SMALLINT, BIGINT, TIMESTAMP,
  DATE, DOUBLE PRECISION, BOOLEAN, NUMERIC + case insensitivity
- Snowflake: STRING, TEXT, CHARACTER, BYTEINT, TINYINT, INTEGER, BIGINT,
  FLOAT, DOUBLE, DECIMAL(38,0), FIXED, DEC, NUMBER, DATE, TIMESTAMP_NTZ,
  TIMESTAMP_LTZ, TIMESTAMP_TZ, TIME, VARIANT, VARBINARY, GEOGRAPHY, GEOMETRY
- Databricks: STRING, INT, BIGINT, SMALLINT, TINYINT, BOOLEAN, FLOAT,
  DOUBLE, DECIMAL, DATE, TIMESTAMP, TIMESTAMP_NTZ
- SQL Server: INTEGER, BIGINT, SMALLINT, FLOAT, REAL, NUMERIC, DECIMAL,
  VARCHAR, NVARCHAR, CHAR, BIT, DATE, DATETIME, DATETIME2
- Trino: INTEGER, BIGINT, SMALLINT, TINYINT, DOUBLE, DECIMAL, VARCHAR,
  CHAR, BOOLEAN, DATE, TIMESTAMP
- SQLite: isinstance path with sa.types for all common types + list tests
- MySQL: isinstance path with sa.types for all common types
- CaseInsensitiveString: quoted vs unquoted behavior in both scalar and
  list comparison paths

Total: 177 unit tests (up from 63).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 15:15
@wookasz wookasz removed the request for review from Copilot April 10, 2026 15:15
…ar and list

Rewrite test suite to independently test both compare_column_type (scalar)
and compare_column_type_list (list) for every dialect and type — no
assumption that they share implementation.

Dialect coverage (scalar + list for each type):
- PostgreSQL: 19 types (CHAR through INTERVAL)
- Snowflake: 29 types (STRING through BOOLEAN, incl DECIMAL variants)
- Databricks: 17 types (STRING through STRUCT)
- SQL Server: 27 types (INTEGER through XML)
- Trino: 20 types (INTEGER through ROW)
- SQLite: 14 types via sa.types isinstance path
- MySQL: 14 types via sa.types isinstance path
- CaseInsensitiveString: quoted/unquoted behavior tested independently
  for both scalar and list paths
- Non-string type fallback: tested for all case-insensitive dialects

Total: 831 unit tests (up from 177).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 15:24
The module serves both expectations/core/ and expectations/metrics/
equally — it's shared infrastructure, not specific to core. Moving it
to expectations/type_comparison.py gives a cleaner import path and
avoids metrics importing from expectations.core.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

wookasz and others added 2 commits April 10, 2026 08:40
Test file now lives at tests/expectations/test_type_comparison.py,
mirroring the module at expectations/type_comparison.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sort CASE_INSENSITIVE_DIALECTS parametrize values for deterministic
  test ID ordering across pytest-xdist workers
- Add explicit return None to native_type_type_map for mypy
- Relax _get_redshift_sqlalchemy_types return type annotation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 15:49
wookasz and others added 2 commits April 10, 2026 08:57
…stently

- Tighten native_type_type_map return type to tuple[type, ...] | None
- Use casefold() instead of lower() in compare_column_type_list for
  consistency with the scalar path and better Unicode handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g dialect tests

- Add comment explaining why CASE_INSENSITIVE_DIALECTS uses GXSqlDialect
  enum members (GXSqlDialect.__eq__ handles cross-type str comparison)
- Add test_string_membership_via_gxsqldialect_eq to prove str 'in'
  checks work against the frozenset
- Add TestStringDialectName class exercising both compare_column_type and
  compare_column_type_list with plain string dialect names (as returned
  by SqlAlchemyExecutionEngine.dialect_name in production)

Total: 846 tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 16:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

import teradatasqlalchemy.dialect
import teradatasqlalchemy.types as teradatatypes
except ImportError:
teradatasqlalchemy = None

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the teradatasqlalchemy import fallback, teradatatypes is not set when the import fails. _get_dialect_type_module() later references teradatatypes, which will raise NameError in environments without teradatasqlalchemy installed. Set teradatatypes = None in the except ImportError block (mirrors patterns used elsewhere in the repo).

Suggested change
teradatasqlalchemy = None
teradatasqlalchemy = None
teradatatypes = None

Copilot uses AI. Check for mistakes.
Comment on lines +720 to +721
def test_list_uses_lower_not_eq(self):
"""The type-list path uses .lower() comparison, so quoted semantics

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test/docstring says the type-list path uses .lower() comparison, but the implementation in compare_column_type_list() now uses .casefold() for comparison. Please update the wording to match the current behavior (or adjust implementation if .lower() is intended).

Suggested change
def test_list_uses_lower_not_eq(self):
"""The type-list path uses .lower() comparison, so quoted semantics
def test_list_uses_casefold_not_eq(self):
"""The type-list path uses .casefold() comparison, so quoted semantics

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +6
"""Unit tests for great_expectations.expectations.type_comparison.

Every dialect tests BOTH compare_column_type (scalar) and
compare_column_type_list (list) independently — no assumption is made
that they share an implementation.
"""

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/test plan references pytest tests/expectations/core/test_type_comparison.py, but the added unit tests are in tests/expectations/test_type_comparison.py. Please update the PR description (or move/rename the test file) so the documented command matches the repo layout.

Copilot uses AI. Check for mistakes.
wookasz and others added 2 commits April 10, 2026 09:20
Guard teradatasqlalchemy and clickhouse_sqlalchemy with explicit
'is not None' checks before issubclass() calls so mypy doesn't
see attribute access on potentially-None modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set teradatatypes = None in import fallback to match clickhouse pattern
  and prevent potential NameError
- Update test docstring from .lower() to .casefold() after consistency fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 16:23
dialect_module is typed as Module but at runtime can be a dialect class
for Teradata and ClickHouse. The try/except TypeError handles the case
where it's not a class. This matches the original code's behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +6
"""Unit tests for great_expectations.expectations.type_comparison.

Every dialect tests BOTH compare_column_type (scalar) and
compare_column_type_list (list) independently — no assumption is made
that they share an implementation.
"""

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/test plan references running pytest tests/expectations/core/test_type_comparison.py, but the new unit tests are added at tests/expectations/test_type_comparison.py. This mismatch will likely confuse CI/local runs; please either update the PR description/test plan or move the test file into the documented location (and adjust any conventions for this repo’s expectations tests accordingly).

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed — the PR description was updated in a prior commit to reference the correct path (tests/expectations/test_type_comparison.py).

Comment on lines 486 to +490
# Apply case-insensitive formatting for specific dialects
from great_expectations.expectations.type_comparison import CASE_INSENSITIVE_DIALECTS

dialect_name = execution_engine.dialect.name
case_insensitive_dialects = {
GXSqlDialect.DATABRICKS,
GXSqlDialect.POSTGRESQL,
GXSqlDialect.SNOWFLAKE,
GXSqlDialect.SQL_SERVER,
GXSqlDialect.TRINO,
}
if dialect_name in case_insensitive_dialects:
if dialect_name in CASE_INSENSITIVE_DIALECTS:

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing CASE_INSENSITIVE_DIALECTS from great_expectations.expectations.type_comparison here pulls in a module that performs optional third-party imports at import time (e.g., teradatasqlalchemy/clickhouse). Since this helper is on a hot path for column metadata reflection, consider relocating CASE_INSENSITIVE_DIALECTS to a lightweight module (e.g., sqlalchemy_dialect) or making those optional imports lazy inside the helper functions that require them, to avoid unnecessary import overhead/side effects.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import in metrics/util.py is already deferred (inside the function body, not at module level), so it doesn't affect the hot path. The try/except imports of teradatasqlalchemy and clickhouse in type_comparison.py are cheap no-ops when packages aren't installed — they just set variables to None. Moving CASE_INSENSITIVE_DIALECTS to sqlalchemy_dialect.py would reduce coupling but would also separate the constant from the comparison logic it governs. Not worth the trade-off for this PR.

wookasz and others added 2 commits April 10, 2026 09:47
…tion

Verify that both type expectations correctly delegate to the
type_comparison module through their public validator API:

ExpectColumnValuesToBeOfType:
- isinstance path (SQLite native): success + failure
- case-insensitive path (all 5 dialects): success + failure

ExpectColumnValuesToBeInTypeList:
- isinstance path (SQLite native): success + failure
- case-insensitive path (all 5 dialects): success + failure

These tests ensure a refactoring mistake that breaks the wiring between
the expectations and compare_column_type/compare_column_type_list would
be caught, without depending on the module's internal unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 16:47
@wookasz wookasz removed the request for review from Copilot April 10, 2026 16:47
wookasz and others added 2 commits April 10, 2026 11:12
…ed delegation tests

The previous wiring tests proved end-to-end results were correct but
didn't prove the expectations actually delegate to compare_column_type /
compare_column_type_list. Someone could reimplement the logic inline
and the tests would still pass.

New tests monkeypatch the compare functions with controlled return values
and assert:
- The function was called exactly once
- It received the correct execution_engine and expected_type arguments
- The expectation result reflects the patched return value (success and
  observed_value), not independently computed logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 19:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# bigquery geography requires installing an extra package
if (
expected_type.lower() == "geography"
and execution_engine.engine.dialect.name.lower() == GXSqlDialect.BIGQUERY

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BigQuery dialect check compares a lowercase dialect name string to GXSqlDialect.BIGQUERY (an Enum). Since the string is on the left-hand side, this comparison will always be False and the GEOGRAPHY support warning branch will never run. Compare against GXSqlDialect.BIGQUERY.value (or invert the comparison / use execution_engine.dialect_name == GXSqlDialect.BIGQUERY) so the condition can be reached.

Suggested change
and execution_engine.engine.dialect.name.lower() == GXSqlDialect.BIGQUERY
and execution_engine.engine.dialect.name.lower() == GXSqlDialect.BIGQUERY.value

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comparison works correctly — Python's str.__eq__ returns NotImplemented for non-str types, which causes Python to fall back to GXSqlDialect.__eq__ on the right-hand side. Verified locally: 'bigquery' == GXSqlDialect.BIGQUERY evaluates to True. This is inherited code with the same behavior as the original.

Comment on lines +105 to +110
(success, observed_value) where observed_value is the raw type string
for case-insensitive dialects or type(actual_column_type).__name__ otherwise.
"""
if execution_engine.dialect_name in CASE_INSENSITIVE_DIALECTS:
success = _compare_type_string(actual_column_type, expected_type)
return success, actual_column_type

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For case-insensitive dialects, compare_column_type() always returns actual_column_type as observed_value even when it's not a string (the comparison falls back to str(actual_column_type) but the return value does not). This contradicts the docstring (“raw type string”) and can yield a non-serializable object in the expectation result. Consider returning str(actual_column_type) (or the computed normalized string) when actual_column_type is not a str/CaseInsensitiveString.

Suggested change
(success, observed_value) where observed_value is the raw type string
for case-insensitive dialects or type(actual_column_type).__name__ otherwise.
"""
if execution_engine.dialect_name in CASE_INSENSITIVE_DIALECTS:
success = _compare_type_string(actual_column_type, expected_type)
return success, actual_column_type
(success, observed_value) where observed_value is the type string
for case-insensitive dialects or type(actual_column_type).__name__ otherwise.
"""
if execution_engine.dialect_name in CASE_INSENSITIVE_DIALECTS:
success = _compare_type_string(actual_column_type, expected_type)
observed_value = (
actual_column_type
if isinstance(actual_column_type, str)
else str(actual_column_type)
)
return success, observed_value

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the docstring — fixed. The return value is actual_column_type as-is (typically a CaseInsensitiveString, which is a str subclass). The non-string fallback in _compare_type_string is a defensive path that shouldn't trigger in production since case-insensitive dialects always receive CaseInsensitiveString from get_sqlalchemy_column_metadata. Updated the docstring to be precise about what's returned.

wookasz and others added 2 commits April 10, 2026 13:54
…e return value

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace manual call tracking with mocker.patch and assert_called_once()
for both expectations. Both success and failure tests now consistently:
- Assert the compare function was called exactly once
- Assert execution_engine and expected_type(s) arguments
- Assert the expectation result reflects the mock's return value

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 10, 2026 20:58
Replace assert_called_once + manual call_args inspection with
assert_called_once_with(engine, mocker.ANY, expected_type) for
cleaner, more idiomatic mock assertions. mocker.ANY matches the
actual_column_type argument which is an internal SQLAlchemy type
instance we don't control.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +250 to +258
if BigQueryDialect and (
isinstance(
execution_engine.dialect_module,
BigQueryDialect,
)
and bigquery_types_tuple is not None
):
return bigquery_types_tuple
except (TypeError, AttributeError):

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _get_dialect_type_module(), this BigQuery branch can never be taken as written: BigQueryDialect is imported as the sqlalchemy_bigquery module (see great_expectations/compatibility/bigquery.py), so passing it as the 2nd arg to isinstance() raises TypeError and the condition always falls through. If you still need this patching logic, compare modules directly (e.g., execution_engine.dialect_module.__name__ == "sqlalchemy_bigquery") or import/check against the actual BigQuery dialect class instead of the module.

Suggested change
if BigQueryDialect and (
isinstance(
execution_engine.dialect_module,
BigQueryDialect,
)
and bigquery_types_tuple is not None
):
return bigquery_types_tuple
except (TypeError, AttributeError):
if (
BigQueryDialect
and bigquery_types_tuple is not None
and (
execution_engine.dialect_module is BigQueryDialect
or getattr(execution_engine.dialect_module, "__name__", None)
== getattr(BigQueryDialect, "__name__", None)
)
):
return bigquery_types_tuple
except AttributeError:

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inherited code moved verbatim from expect_column_values_to_be_of_type.py. The try/except (TypeError, AttributeError) around each check is intentional — it handles the cases where dialect_module is a module rather than a class. Cleaning up these branches to use isinstance(dialect_module, type) guards or restructuring the dispatch is valid follow-up work but outside the scope of this extraction PR.

Comment on lines +265 to +266
and issubclass(
execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Teradata checks appear to be dead code: SqlAlchemyExecutionEngine.dialect_module is set via import_library_module(...) and is therefore a ModuleType, but issubclass(execution_engine.dialect_module, TeradataDialect) will always raise TypeError and be ignored. If the intent is to switch to teradatatypes when the active dialect is Teradata, consider checking execution_engine.engine.dialect (instance) or execution_engine.dialect_module.__name__ instead of using issubclass() on the imported module.

Suggested change
and issubclass(
execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class
and isinstance(
execution_engine.dialect,

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inherited code moved verbatim from expect_column_values_to_be_of_type.py. The try/except (TypeError, AttributeError) around each check is intentional — it handles the cases where dialect_module is a module rather than a class. Cleaning up these branches to use isinstance(dialect_module, type) guards or restructuring the dispatch is valid follow-up work but outside the scope of this extraction PR.

Comment on lines +278 to +279
and issubclass(
execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class

Copilot AI Apr 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this ClickHouse branch uses issubclass(execution_engine.dialect_module, ClickHouseDialect) but dialect_module is imported as a module, not a class, so this will always TypeError and the ch_types module will never be selected. If you want to return clickhouse_sqlalchemy.types here, consider switching to a module-name check (or check execution_engine.engine.dialect against the dialect class).

Suggested change
and issubclass(
execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class
and isinstance(
execution_engine.dialect,

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inherited code moved verbatim from expect_column_values_to_be_of_type.py. The try/except (TypeError, AttributeError) around each check is intentional — it handles the cases where dialect_module is a module rather than a class. Cleaning up these branches to use isinstance(dialect_module, type) guards or restructuring the dispatch is valid follow-up work but outside the scope of this extraction PR.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Is this PR still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity.

It will be closed if no further activity occurs. Thank you for your contributions 🙇

@github-actions github-actions Bot added the stale Stale issues and PRs label Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Stale issues and PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants