[MAINTENANCE] Extract type comparison logic into dedicated module#11798
[MAINTENANCE] Extract type comparison logic into dedicated module#11798wookasz wants to merge 28 commits into
Conversation
Move dialect-aware type comparison logic from the two type expectations into a new `type_comparison` module with a clean public API: - `compare_column_type()` and `compare_column_type_list()` handle dialect dispatch internally (case-insensitive string comparison for 5 dialects, isinstance-based class matching for all others) - `native_type_type_map()` for pandas type resolution - `CASE_INSENSITIVE_DIALECTS` constant (eliminates duplicate in metrics/util.py) Both expectations now delegate to the module, and all consumers import directly from type_comparison rather than from the expectation files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for niobium-lead-7998 canceled.
|
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #11798 +/- ##
===========================================
+ Coverage 84.67% 84.68% +0.01%
===========================================
Files 471 472 +1
Lines 39179 39199 +20
===========================================
+ Hits 33174 33197 +23
+ Misses 6005 6002 -3 Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR extracts dialect-aware SQLAlchemy column-type comparison logic (used by ExpectColumnValuesToBeOfType and ExpectColumnValuesToBeInTypeList) into a new type_comparison module, and updates expectations/metrics to delegate to that shared implementation.
Changes:
- Added
great_expectations.expectations.core.type_comparisonwith shared helpers (compare_column_type,compare_column_type_list,native_type_type_map,CASE_INSENSITIVE_DIALECTS). - Updated both type expectations and relevant column-map metrics to use the new module (removing duplicated/embedded logic).
- Updated metrics column metadata handling to reuse the shared
CASE_INSENSITIVE_DIALECTSconstant and added a new unit test suite for the module.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/expectations/core/test_type_comparison.py | Adds unit tests for the new type comparison module. |
| great_expectations/expectations/metrics/util.py | Reuses shared CASE_INSENSITIVE_DIALECTS instead of a local duplicate. |
| great_expectations/expectations/metrics/column_map_metrics/column_values_of_type.py | Switches pandas native type mapping to native_type_type_map(). |
| great_expectations/expectations/metrics/column_map_metrics/column_values_in_type_list.py | Switches pandas native type mapping to native_type_type_map(). |
| great_expectations/expectations/core/type_comparison.py | New consolidated module implementing dialect-aware type comparisons. |
| great_expectations/expectations/core/expect_column_values_to_be_of_type.py | Delegates SQLAlchemy type comparison and native type mapping to type_comparison. |
| great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py | Delegates SQLAlchemy type-list comparison and native type mapping to type_comparison. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # CaseInsensitiveString objects will automatically do case-insensitive comparison | ||
| return actual_column_type == expected_type | ||
| return str(actual_column_type).lower() == expected_type.lower() |
There was a problem hiding this comment.
_compare_type_string() claims to do case-insensitive comparison, but for plain str it currently uses actual_column_type == expected_type (case-sensitive). This is inconsistent with compare_column_type_list() (which lowercases) and with the module/docstring intent. Consider normalizing both sides (e.g., casefold()/lower()) for plain strings while still allowing CaseInsensitiveString.__eq__ to handle its own semantics.
| # CaseInsensitiveString objects will automatically do case-insensitive comparison | |
| return actual_column_type == expected_type | |
| return str(actual_column_type).lower() == expected_type.lower() | |
| # Preserve custom __eq__ behavior for str subclasses such as | |
| # CaseInsensitiveString, but normalize plain str values explicitly. | |
| if type(actual_column_type) is str: | |
| return actual_column_type.casefold() == expected_type.casefold() | |
| return actual_column_type == expected_type | |
| return str(actual_column_type).casefold() == expected_type.casefold() |
| execution_engine: SqlAlchemyExecutionEngine, | ||
| ) -> ModuleType: | ||
| if execution_engine.dialect_module is None: | ||
| logger.warning("No sqlalchemy dialect found; relying in top-level sqlalchemy types.") |
There was a problem hiding this comment.
Log message has a grammatical typo: "relying in top-level sqlalchemy types" should be "relying on top-level sqlalchemy types".
| logger.warning("No sqlalchemy dialect found; relying in top-level sqlalchemy types.") | |
| logger.warning("No sqlalchemy dialect found; relying on top-level sqlalchemy types.") |
| def _mock_engine(mocker, dialect_name): | ||
| """Create a mock SqlAlchemyExecutionEngine with the given dialect_name.""" | ||
| engine = mocker.MagicMock() | ||
| type(engine).dialect_name = mocker.PropertyMock(return_value=dialect_name) |
There was a problem hiding this comment.
_mock_engine() sets type(engine).dialect_name using PropertyMock, which mutates the shared MagicMock class and can leak between tests. Prefer setting engine.dialect_name = <value> on the instance, or use a small stub class/object with a dialect_name attribute to avoid global side effects.
| type(engine).dialect_name = mocker.PropertyMock(return_value=dialect_name) | |
| engine.dialect_name = dialect_name |
| mock_type = mocker.MagicMock() | ||
| mock_type.__str__ = lambda self: "INTEGER" | ||
| type(mock_type).__name__ = "INTEGER" |
There was a problem hiding this comment.
These tests mutate the MagicMock class by assigning type(mock_type).__name__ = "INTEGER", which can affect other mocks and make the suite order-dependent. Use a dedicated dummy type/class instance (e.g., a small local class with __name__ and __str__) instead of changing the global mock class metadata.
| mock_type = mocker.MagicMock() | |
| mock_type.__str__ = lambda self: "INTEGER" | |
| type(mock_type).__name__ = "INTEGER" | |
| class INTEGER: | |
| def __str__(self): | |
| return "INTEGER" | |
| mock_type = INTEGER() |
| @pytest.mark.parametrize("dialect", list(CASE_INSENSITIVE_DIALECTS)) | ||
| def test_case_insensitive_match(self, mocker, dialect): | ||
| engine = _mock_engine(mocker, dialect) | ||
| success, _observed = compare_column_type(engine, "integer", "integer") |
There was a problem hiding this comment.
test_case_insensitive_match doesn't currently assert mixed-case behavior (it compares "integer" to "integer"). To verify the case-insensitive path, use differing cases (e.g., actual "INTEGER" vs expected "integer") and/or use CaseInsensitiveString as the observed value since that's what production code wraps types with for these dialects.
| success, _observed = compare_column_type(engine, "integer", "integer") | |
| success, _observed = compare_column_type(engine, "INTEGER", "integer") |
|
|
||
| # --------------------------------------------------------------------------- | ||
| # compare_column_type_list — isinstance path | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| class TestCompareColumnTypeListByClass: | ||
| def test_match_in_list(self, mocker): | ||
| engine = _mock_engine(mocker, "sqlite") |
There was a problem hiding this comment.
Same issue as above: this test mutates type(mock_type).__name__, which can leak globally and cause flaky failures. Prefer a purpose-built dummy class/instance for the non-string type case.
- Fix _compare_type_string to use casefold() for plain str while preserving CaseInsensitiveString.__eq__ for subclasses - Fix typo: "relying in" -> "relying on" - Replace MagicMock with lightweight stub classes to avoid mutating shared mock class metadata between tests - Fix test_case_insensitive_match to actually test mixed case Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def native_type_type_map(type_: str) -> tuple | None: # noqa: C901, PLR0911 | ||
| """Map a string type name to a tuple of native Python types. | ||
|
|
||
| Used by pandas validation paths to resolve type names like "int", "str", etc. | ||
| Returns None for unrecognized types. | ||
| """ | ||
| if type_.lower() == "none": | ||
| return (type(None),) | ||
| elif type_.lower() == "bool": | ||
| return (bool,) | ||
| elif type_.lower() in ["int", "long"]: | ||
| return (int,) | ||
| elif type_.lower() == "float": | ||
| return (float,) | ||
| elif type_.lower() == "bytes": | ||
| return (bytes,) | ||
| elif type_.lower() == "complex": | ||
| return (complex,) | ||
| elif type_.lower() in ["str", "string_types"]: | ||
| return (str,) | ||
| elif type_.lower() == "list": | ||
| return (list,) | ||
| elif type_.lower() == "dict": | ||
| return (dict,) | ||
| elif type_.lower() == "unicode": | ||
| return None | ||
|
|
There was a problem hiding this comment.
native_type_type_map() has an imprecise return type (tuple | None) and falls off the end implicitly for unknown types. Since this is part of the new public API, tighten the signature (e.g., tuple[type, ...] | None) and add an explicit return None for unrecognized values to make behavior and typing unambiguous.
| actual_column_type.lower() == expected_type.lower() | ||
| for expected_type in expected_types_list | ||
| ) | ||
| return success, actual_column_type | ||
| else: | ||
| ret_type = type(actual_column_type).__name__ | ||
| success = any( | ||
| ret_type.lower() == expected_type.lower() for expected_type in expected_types_list |
There was a problem hiding this comment.
Case-insensitive comparisons are implemented with .casefold() in _compare_type_string(), but compare_column_type_list() uses .lower(). Since this module is meant to centralize dialect-aware comparisons (and already notes Unicode correctness), it would be more consistent/robust to use .casefold() in the list path as well.
| actual_column_type.lower() == expected_type.lower() | |
| for expected_type in expected_types_list | |
| ) | |
| return success, actual_column_type | |
| else: | |
| ret_type = type(actual_column_type).__name__ | |
| success = any( | |
| ret_type.lower() == expected_type.lower() for expected_type in expected_types_list | |
| actual_column_type.casefold() == expected_type.casefold() | |
| for expected_type in expected_types_list | |
| ) | |
| return success, actual_column_type | |
| else: | |
| ret_type = type(actual_column_type).__name__ | |
| success = any( | |
| ret_type.casefold() == expected_type.casefold() | |
| for expected_type in expected_types_list |
Add comprehensive unit tests covering actual type names for each dialect: - PostgreSQL: CHAR, TEXT, VARCHAR, INTEGER, SMALLINT, BIGINT, TIMESTAMP, DATE, DOUBLE PRECISION, BOOLEAN, NUMERIC + case insensitivity - Snowflake: STRING, TEXT, CHARACTER, BYTEINT, TINYINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL(38,0), FIXED, DEC, NUMBER, DATE, TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ, TIME, VARIANT, VARBINARY, GEOGRAPHY, GEOMETRY - Databricks: STRING, INT, BIGINT, SMALLINT, TINYINT, BOOLEAN, FLOAT, DOUBLE, DECIMAL, DATE, TIMESTAMP, TIMESTAMP_NTZ - SQL Server: INTEGER, BIGINT, SMALLINT, FLOAT, REAL, NUMERIC, DECIMAL, VARCHAR, NVARCHAR, CHAR, BIT, DATE, DATETIME, DATETIME2 - Trino: INTEGER, BIGINT, SMALLINT, TINYINT, DOUBLE, DECIMAL, VARCHAR, CHAR, BOOLEAN, DATE, TIMESTAMP - SQLite: isinstance path with sa.types for all common types + list tests - MySQL: isinstance path with sa.types for all common types - CaseInsensitiveString: quoted vs unquoted behavior in both scalar and list comparison paths Total: 177 unit tests (up from 63). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
…ar and list Rewrite test suite to independently test both compare_column_type (scalar) and compare_column_type_list (list) for every dialect and type — no assumption that they share implementation. Dialect coverage (scalar + list for each type): - PostgreSQL: 19 types (CHAR through INTERVAL) - Snowflake: 29 types (STRING through BOOLEAN, incl DECIMAL variants) - Databricks: 17 types (STRING through STRUCT) - SQL Server: 27 types (INTEGER through XML) - Trino: 20 types (INTEGER through ROW) - SQLite: 14 types via sa.types isinstance path - MySQL: 14 types via sa.types isinstance path - CaseInsensitiveString: quoted/unquoted behavior tested independently for both scalar and list paths - Non-string type fallback: tested for all case-insensitive dialects Total: 831 unit tests (up from 177). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
The module serves both expectations/core/ and expectations/metrics/ equally — it's shared infrastructure, not specific to core. Moving it to expectations/type_comparison.py gives a cleaner import path and avoids metrics importing from expectations.core. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Test file now lives at tests/expectations/test_type_comparison.py, mirroring the module at expectations/type_comparison.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sort CASE_INSENSITIVE_DIALECTS parametrize values for deterministic test ID ordering across pytest-xdist workers - Add explicit return None to native_type_type_map for mypy - Relax _get_redshift_sqlalchemy_types return type annotation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
…stently - Tighten native_type_type_map return type to tuple[type, ...] | None - Use casefold() instead of lower() in compare_column_type_list for consistency with the scalar path and better Unicode handling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g dialect tests - Add comment explaining why CASE_INSENSITIVE_DIALECTS uses GXSqlDialect enum members (GXSqlDialect.__eq__ handles cross-type str comparison) - Add test_string_membership_via_gxsqldialect_eq to prove str 'in' checks work against the frozenset - Add TestStringDialectName class exercising both compare_column_type and compare_column_type_list with plain string dialect names (as returned by SqlAlchemyExecutionEngine.dialect_name in production) Total: 846 tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import teradatasqlalchemy.dialect | ||
| import teradatasqlalchemy.types as teradatatypes | ||
| except ImportError: | ||
| teradatasqlalchemy = None |
There was a problem hiding this comment.
In the teradatasqlalchemy import fallback, teradatatypes is not set when the import fails. _get_dialect_type_module() later references teradatatypes, which will raise NameError in environments without teradatasqlalchemy installed. Set teradatatypes = None in the except ImportError block (mirrors patterns used elsewhere in the repo).
| teradatasqlalchemy = None | |
| teradatasqlalchemy = None | |
| teradatatypes = None |
| def test_list_uses_lower_not_eq(self): | ||
| """The type-list path uses .lower() comparison, so quoted semantics |
There was a problem hiding this comment.
This test/docstring says the type-list path uses .lower() comparison, but the implementation in compare_column_type_list() now uses .casefold() for comparison. Please update the wording to match the current behavior (or adjust implementation if .lower() is intended).
| def test_list_uses_lower_not_eq(self): | |
| """The type-list path uses .lower() comparison, so quoted semantics | |
| def test_list_uses_casefold_not_eq(self): | |
| """The type-list path uses .casefold() comparison, so quoted semantics |
| """Unit tests for great_expectations.expectations.type_comparison. | ||
|
|
||
| Every dialect tests BOTH compare_column_type (scalar) and | ||
| compare_column_type_list (list) independently — no assumption is made | ||
| that they share an implementation. | ||
| """ |
There was a problem hiding this comment.
PR description/test plan references pytest tests/expectations/core/test_type_comparison.py, but the added unit tests are in tests/expectations/test_type_comparison.py. Please update the PR description (or move/rename the test file) so the documented command matches the repo layout.
Guard teradatasqlalchemy and clickhouse_sqlalchemy with explicit 'is not None' checks before issubclass() calls so mypy doesn't see attribute access on potentially-None modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set teradatatypes = None in import fallback to match clickhouse pattern and prevent potential NameError - Update test docstring from .lower() to .casefold() after consistency fix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dialect_module is typed as Module but at runtime can be a dialect class for Teradata and ClickHouse. The try/except TypeError handles the case where it's not a class. This matches the original code's behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Unit tests for great_expectations.expectations.type_comparison. | ||
|
|
||
| Every dialect tests BOTH compare_column_type (scalar) and | ||
| compare_column_type_list (list) independently — no assumption is made | ||
| that they share an implementation. | ||
| """ |
There was a problem hiding this comment.
PR description/test plan references running pytest tests/expectations/core/test_type_comparison.py, but the new unit tests are added at tests/expectations/test_type_comparison.py. This mismatch will likely confuse CI/local runs; please either update the PR description/test plan or move the test file into the documented location (and adjust any conventions for this repo’s expectations tests accordingly).
There was a problem hiding this comment.
Already fixed — the PR description was updated in a prior commit to reference the correct path (tests/expectations/test_type_comparison.py).
| # Apply case-insensitive formatting for specific dialects | ||
| from great_expectations.expectations.type_comparison import CASE_INSENSITIVE_DIALECTS | ||
|
|
||
| dialect_name = execution_engine.dialect.name | ||
| case_insensitive_dialects = { | ||
| GXSqlDialect.DATABRICKS, | ||
| GXSqlDialect.POSTGRESQL, | ||
| GXSqlDialect.SNOWFLAKE, | ||
| GXSqlDialect.SQL_SERVER, | ||
| GXSqlDialect.TRINO, | ||
| } | ||
| if dialect_name in case_insensitive_dialects: | ||
| if dialect_name in CASE_INSENSITIVE_DIALECTS: |
There was a problem hiding this comment.
Importing CASE_INSENSITIVE_DIALECTS from great_expectations.expectations.type_comparison here pulls in a module that performs optional third-party imports at import time (e.g., teradatasqlalchemy/clickhouse). Since this helper is on a hot path for column metadata reflection, consider relocating CASE_INSENSITIVE_DIALECTS to a lightweight module (e.g., sqlalchemy_dialect) or making those optional imports lazy inside the helper functions that require them, to avoid unnecessary import overhead/side effects.
There was a problem hiding this comment.
The import in metrics/util.py is already deferred (inside the function body, not at module level), so it doesn't affect the hot path. The try/except imports of teradatasqlalchemy and clickhouse in type_comparison.py are cheap no-ops when packages aren't installed — they just set variables to None. Moving CASE_INSENSITIVE_DIALECTS to sqlalchemy_dialect.py would reduce coupling but would also separate the constant from the comparison logic it governs. Not worth the trade-off for this PR.
…tion Verify that both type expectations correctly delegate to the type_comparison module through their public validator API: ExpectColumnValuesToBeOfType: - isinstance path (SQLite native): success + failure - case-insensitive path (all 5 dialects): success + failure ExpectColumnValuesToBeInTypeList: - isinstance path (SQLite native): success + failure - case-insensitive path (all 5 dialects): success + failure These tests ensure a refactoring mistake that breaks the wiring between the expectations and compare_column_type/compare_column_type_list would be caught, without depending on the module's internal unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
…ed delegation tests The previous wiring tests proved end-to-end results were correct but didn't prove the expectations actually delegate to compare_column_type / compare_column_type_list. Someone could reimplement the logic inline and the tests would still pass. New tests monkeypatch the compare functions with controlled return values and assert: - The function was called exactly once - It received the correct execution_engine and expected_type arguments - The expectation result reflects the patched return value (success and observed_value), not independently computed logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # bigquery geography requires installing an extra package | ||
| if ( | ||
| expected_type.lower() == "geography" | ||
| and execution_engine.engine.dialect.name.lower() == GXSqlDialect.BIGQUERY |
There was a problem hiding this comment.
The BigQuery dialect check compares a lowercase dialect name string to GXSqlDialect.BIGQUERY (an Enum). Since the string is on the left-hand side, this comparison will always be False and the GEOGRAPHY support warning branch will never run. Compare against GXSqlDialect.BIGQUERY.value (or invert the comparison / use execution_engine.dialect_name == GXSqlDialect.BIGQUERY) so the condition can be reached.
| and execution_engine.engine.dialect.name.lower() == GXSqlDialect.BIGQUERY | |
| and execution_engine.engine.dialect.name.lower() == GXSqlDialect.BIGQUERY.value |
There was a problem hiding this comment.
This comparison works correctly — Python's str.__eq__ returns NotImplemented for non-str types, which causes Python to fall back to GXSqlDialect.__eq__ on the right-hand side. Verified locally: 'bigquery' == GXSqlDialect.BIGQUERY evaluates to True. This is inherited code with the same behavior as the original.
| (success, observed_value) where observed_value is the raw type string | ||
| for case-insensitive dialects or type(actual_column_type).__name__ otherwise. | ||
| """ | ||
| if execution_engine.dialect_name in CASE_INSENSITIVE_DIALECTS: | ||
| success = _compare_type_string(actual_column_type, expected_type) | ||
| return success, actual_column_type |
There was a problem hiding this comment.
For case-insensitive dialects, compare_column_type() always returns actual_column_type as observed_value even when it's not a string (the comparison falls back to str(actual_column_type) but the return value does not). This contradicts the docstring (“raw type string”) and can yield a non-serializable object in the expectation result. Consider returning str(actual_column_type) (or the computed normalized string) when actual_column_type is not a str/CaseInsensitiveString.
| (success, observed_value) where observed_value is the raw type string | |
| for case-insensitive dialects or type(actual_column_type).__name__ otherwise. | |
| """ | |
| if execution_engine.dialect_name in CASE_INSENSITIVE_DIALECTS: | |
| success = _compare_type_string(actual_column_type, expected_type) | |
| return success, actual_column_type | |
| (success, observed_value) where observed_value is the type string | |
| for case-insensitive dialects or type(actual_column_type).__name__ otherwise. | |
| """ | |
| if execution_engine.dialect_name in CASE_INSENSITIVE_DIALECTS: | |
| success = _compare_type_string(actual_column_type, expected_type) | |
| observed_value = ( | |
| actual_column_type | |
| if isinstance(actual_column_type, str) | |
| else str(actual_column_type) | |
| ) | |
| return success, observed_value |
There was a problem hiding this comment.
Good catch on the docstring — fixed. The return value is actual_column_type as-is (typically a CaseInsensitiveString, which is a str subclass). The non-string fallback in _compare_type_string is a defensive path that shouldn't trigger in production since case-insensitive dialects always receive CaseInsensitiveString from get_sqlalchemy_column_metadata. Updated the docstring to be precise about what's returned.
…e return value Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace manual call tracking with mocker.patch and assert_called_once() for both expectations. Both success and failure tests now consistently: - Assert the compare function was called exactly once - Assert execution_engine and expected_type(s) arguments - Assert the expectation result reflects the mock's return value Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace assert_called_once + manual call_args inspection with assert_called_once_with(engine, mocker.ANY, expected_type) for cleaner, more idiomatic mock assertions. mocker.ANY matches the actual_column_type argument which is an internal SQLAlchemy type instance we don't control. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if BigQueryDialect and ( | ||
| isinstance( | ||
| execution_engine.dialect_module, | ||
| BigQueryDialect, | ||
| ) | ||
| and bigquery_types_tuple is not None | ||
| ): | ||
| return bigquery_types_tuple | ||
| except (TypeError, AttributeError): |
There was a problem hiding this comment.
In _get_dialect_type_module(), this BigQuery branch can never be taken as written: BigQueryDialect is imported as the sqlalchemy_bigquery module (see great_expectations/compatibility/bigquery.py), so passing it as the 2nd arg to isinstance() raises TypeError and the condition always falls through. If you still need this patching logic, compare modules directly (e.g., execution_engine.dialect_module.__name__ == "sqlalchemy_bigquery") or import/check against the actual BigQuery dialect class instead of the module.
| if BigQueryDialect and ( | |
| isinstance( | |
| execution_engine.dialect_module, | |
| BigQueryDialect, | |
| ) | |
| and bigquery_types_tuple is not None | |
| ): | |
| return bigquery_types_tuple | |
| except (TypeError, AttributeError): | |
| if ( | |
| BigQueryDialect | |
| and bigquery_types_tuple is not None | |
| and ( | |
| execution_engine.dialect_module is BigQueryDialect | |
| or getattr(execution_engine.dialect_module, "__name__", None) | |
| == getattr(BigQueryDialect, "__name__", None) | |
| ) | |
| ): | |
| return bigquery_types_tuple | |
| except AttributeError: |
There was a problem hiding this comment.
This is inherited code moved verbatim from expect_column_values_to_be_of_type.py. The try/except (TypeError, AttributeError) around each check is intentional — it handles the cases where dialect_module is a module rather than a class. Cleaning up these branches to use isinstance(dialect_module, type) guards or restructuring the dispatch is valid follow-up work but outside the scope of this extraction PR.
| and issubclass( | ||
| execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class |
There was a problem hiding this comment.
These Teradata checks appear to be dead code: SqlAlchemyExecutionEngine.dialect_module is set via import_library_module(...) and is therefore a ModuleType, but issubclass(execution_engine.dialect_module, TeradataDialect) will always raise TypeError and be ignored. If the intent is to switch to teradatatypes when the active dialect is Teradata, consider checking execution_engine.engine.dialect (instance) or execution_engine.dialect_module.__name__ instead of using issubclass() on the imported module.
| and issubclass( | |
| execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class | |
| and isinstance( | |
| execution_engine.dialect, |
There was a problem hiding this comment.
This is inherited code moved verbatim from expect_column_values_to_be_of_type.py. The try/except (TypeError, AttributeError) around each check is intentional — it handles the cases where dialect_module is a module rather than a class. Cleaning up these branches to use isinstance(dialect_module, type) guards or restructuring the dispatch is valid follow-up work but outside the scope of this extraction PR.
| and issubclass( | ||
| execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class |
There was a problem hiding this comment.
Similarly, this ClickHouse branch uses issubclass(execution_engine.dialect_module, ClickHouseDialect) but dialect_module is imported as a module, not a class, so this will always TypeError and the ch_types module will never be selected. If you want to return clickhouse_sqlalchemy.types here, consider switching to a module-name check (or check execution_engine.engine.dialect against the dialect class).
| and issubclass( | |
| execution_engine.dialect_module, # type: ignore[arg-type] # dialect_module can be a class | |
| and isinstance( | |
| execution_engine.dialect, |
There was a problem hiding this comment.
This is inherited code moved verbatim from expect_column_values_to_be_of_type.py. The try/except (TypeError, AttributeError) around each check is intentional — it handles the cases where dialect_module is a module rather than a class. Cleaning up these branches to use isinstance(dialect_module, type) guards or restructuring the dispatch is valid follow-up work but outside the scope of this extraction PR.
|
Is this PR still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions 🙇 |
Summary
ExpectColumnValuesToBeOfTypeandExpectColumnValuesToBeInTypeListinto a newtype_comparisonmodule atgreat_expectations/expectations/type_comparison.pycompare_column_type(),compare_column_type_list(),native_type_type_map(), andCASE_INSENSITIVE_DIALECTSconstant_validate_sqlalchemy()to 4 lines eachmetrics/util.py) import directly fromtype_comparison— no re-exportscase_insensitive_dialectsset inmetrics/util.pyContext
This is Phase 1 of a two-phase effort to reduce Databricks/Snowflake CI runtime. By extracting and unit-testing the type comparison logic, Phase 2 can safely narrow most expectation integration tests to Postgres-only, since dialect-specific behavior is now covered by unit tests and metric-level integration tests.
Test plan
pytest tests/expectations/test_type_comparison.pypytest tests/expectations/core/test_expect_column_values_to_be_of_type.py tests/expectations/core/test_expect_column_values_to_be_in_type_list.py