[FEATURE] Internal Pydantic schemas for ExpectationValidationResult.result (validation-result-schemas)#11869
joshua-stauffer wants to merge 20 commits into
… git Satisfies requirement 4.3: the default findings path for validation result schema runs (tests/_artifacts/validation_result_schemas/findings/) must not be committed to the repository as test output.
…ates, Finding types (task 2.2)
…__init__ re-exports (task 4.3)
…nit tests (task 5.1)
…her fix (task 7.3)
…expected_count, observed_value fields (task 11.1)
Codecov Report
❌ Patch coverage is
Coverage Diff
| Metric | develop | #11869 | +/- |
|---|---|---|---|
| Coverage | 84.79% | 81.20% | -3.60% |
| Files | 471 | 481 | +10 |
| Lines | 39171 | 39412 | +241 |
| Hits | 33217 | 32005 | -1212 |
| Misses | 5954 | 7407 | +1453 |
…; add test files to mypy exclude
Pull request overview
This PR introduces an internal typed “schema layer” over ExpectationValidationResult.result using Pydantic v1 models, plus a dispatcher-based ExpectationValidationResult.as_typed() accessor and a matrix-style integration runner that emits a structured findings JSON artifact (uploaded in CI).
Changes:
- Add internal schema families for map-style and aggregate-style result payloads, plus per-expectation overrides and a dispatcher to select the correct variant.
- Add ExpectationValidationResult.as_typed(engine_hint=None) to parse (without mutating) .result into the appropriate typed model.
- Add unit/integration tests and a CI artifact upload step for matrix-run findings.
Reviewed changes
Copilot reviewed 28 out of 32 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/unit/core/validation_result_schemas/test_schemas_overrides.py | Unit tests for per-expectation schema override behavior. |
| tests/unit/core/validation_result_schemas/test_schemas_map.py | Unit tests for map-family schema variants and validators. |
| tests/unit/core/validation_result_schemas/test_schemas_aggregate.py | Unit tests for aggregate-family schema variants and observed_value/details shapes. |
| tests/unit/core/validation_result_schemas/test_runner_helpers.py | Unit tests for matrix-runner helper utilities (coverage assertion, summarization, engine normalization). |
| tests/unit/core/validation_result_schemas/test_format_config.py | Unit tests for internal ResultFormatConfig TypedDict expectations around parse_result_format(). |
| tests/unit/core/validation_result_schemas/test_findings_emitter.py | Unit tests for findings JSON emission (determinism, env var, atomic writes). |
| tests/unit/core/validation_result_schemas/test_field_validators.py | Unit tests for shared validator functions (runtime type classification, root validator behavior). |
| tests/unit/core/validation_result_schemas/test_dispatcher.py | Unit tests validating dispatcher routing (family, formats, overrides, ParseError behavior). |
| tests/unit/core/validation_result_schemas/test_cases_table.py | Unit tests asserting EXPECTATION_CASES integrity vs core expectations. |
| tests/unit/core/validation_result_schemas/test_as_typed.py | Unit tests for EVR.as_typed() correctness and non-mutation guarantees. |
| tests/unit/core/validation_result_schemas/__init__.py | Package init to support the new unit-test module layout. |
| tests/unit/core/__init__.py | Package init to support unit-test module imports. |
| tests/unit/__init__.py | Package init to support unit-test module imports. |
| tests/integration/data_sources_and_expectations/expectations/test_validation_result_schemas_matrix.py | Integration matrix runner that validates schema parsing across expectations/engines/formats and writes findings. |
| tests/integration/data_sources_and_expectations/expectations/_validation_result_schemas_helpers.py | Shared helper functions for the matrix runner (engine normalization, coverage checks, summarization). |
| tests/integration/data_sources_and_expectations/expectations/_validation_result_schemas_cases.py | Canonical case table (one entry per core expectation) feeding the matrix runner. |
| tests/conftest.py | Adds --vrs-run-id CLI option for naming findings output. |
| pyproject.toml | Registers no_xdist marker and updates mypy excludes for selected new tests. |
| great_expectations/core/validation_result_schemas/types.py | Adds internal enums and TypedDicts for findings metadata/types. |
| great_expectations/core/validation_result_schemas/schemas/per_expectation_overrides.py | Adds override schema(s) for engine-specific divergences. |
| great_expectations/core/validation_result_schemas/schemas/map_result.py | Adds map-family Pydantic models and validator wiring. |
| great_expectations/core/validation_result_schemas/schemas/aggregate_result.py | Adds aggregate-family Pydantic models. |
| great_expectations/core/validation_result_schemas/schemas/__init__.py | Re-exports schema models for internal consumption. |
| great_expectations/core/validation_result_schemas/format_config.py | Adds ResultFormatConfig TypedDict used by internal dispatch logic. |
| great_expectations/core/validation_result_schemas/findings_emitter.py | Adds deterministic, atomic JSON findings writer with env/dir resolution. |
| great_expectations/core/validation_result_schemas/field_validators.py | Adds reusable validator helpers shared across schema families. |
| great_expectations/core/validation_result_schemas/dispatcher.py | Adds as_typed() dispatcher, family table, override table, and ParseError wrapping. |
| great_expectations/core/validation_result_schemas/__init__.py | Re-exports dispatcher entrypoints for the internal package. |
| great_expectations/core/expectation_validation_result.py | Adds ExpectationValidationResult.as_typed(engine_hint=None). |
| .gitignore | Ignores tests/_artifacts/ findings output directory. |
| .github/workflows/ci.yml | Uploads findings artifact in CI jobs (always()). |
| .github/actions/upload-validation-result-schemas-findings/action.yml | Composite action to upload tests/_artifacts/validation_result_schemas/findings/ as a CI artifact. |
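The table describes findings_emitter.py as a deterministic, atomic JSON writer. A minimal sketch of that technique follows, using only the standard library. The function name write_findings_atomically and the choice to sort entries by their serialized form are assumptions made for illustration, not the PR's actual implementation.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_findings_atomically(findings: List[Dict[str, Any]], target: Path) -> None:
    """Write findings as deterministic JSON, replacing `target` in one step.

    Hypothetical helper: the name and the ordering rule are illustrative.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys plus a stable ordering of entries make the bytes reproducible
    # regardless of the order in which findings were collected.
    ordered = sorted(findings, key=lambda f: json.dumps(f, sort_keys=True))
    payload = json.dumps(ordered, sort_keys=True, indent=2) + "\n"

    # Write to a temp file in the same directory, then rename over the target,
    # so readers never observe a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(payload)
        os.replace(tmp_name, target)  # atomic when source and target share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file if anything went wrong
        raise
```

Keeping the temp file in the target's own directory matters because os.replace is only atomic within a single filesystem.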
    1. environment variable GX_VALIDATION_FINDINGS_DIR if set
    2. else _DEFAULT_DIR (gitignored in the gx repo)
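The two-step resolution order quoted above could look roughly like the sketch below. The names resolve_findings_dir, DEFAULT_DIR, and ENV_VAR are hypothetical; the default path is the one mentioned in the commit notes, and treating an empty variable as unset is an assumption of this sketch.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default path as quoted in the PR's commit notes; assumed, not verified against the code.
DEFAULT_DIR = Path("tests/_artifacts/validation_result_schemas/findings")
ENV_VAR = "GX_VALIDATION_FINDINGS_DIR"


def resolve_findings_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the env-var directory when set and non-empty, else the default.

    `env` is injectable so tests do not have to touch the real environment.
    """
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    # Assumption: an empty string is treated the same as an unset variable.
    return Path(override) if override else DEFAULT_DIR
```

Passing the environment as a parameter lets a test check the default branch without creating directories or patching the filesystem.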
    # We can't easily test the true default without writing to the actual filesystem,
    # so we verify that FindingsWriter resolves to _DEFAULT_DIR by checking
    # the resolved path stored on the instance.
    with patch("os.makedirs"):  # prevent actual dir creation
    def as_typed(self, *, engine_hint: Optional[str] = None):
        """Return a typed view of self.result without mutating anything.

        Lazy-imports the dispatcher to avoid an import cycle at module load.
        Reads expectation_type from self.expectation_config.type and ResultFormat
        from self.expectation_config.kwargs.get('result_format', DEFAULT_RESULT_FORMAT).
        Returns the parsed model. Raises ParseError on validation failure.

        engine_hint: optional 'pandas' | 'spark' | 'sql'. When supplied, the
        dispatcher uses it directly. When None, the dispatcher sniffs from the
        result dict shape.
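To make the contract in this docstring concrete, here is a small self-contained sketch of a dispatcher that picks a schema by expectation type, parses a copy of the result dict, and wraps failures in ParseError. It uses standard-library dataclasses as a stand-in for the PR's Pydantic v1 models. FAMILY_TABLE, the two-field schemas, and the free-function signature are assumptions for illustration, and engine_hint is accepted but unused because the real sniffing logic is not shown in the source.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, Optional, Type, Union


class ParseError(ValueError):
    """Raised when a result dict does not fit the selected schema."""


@dataclass(frozen=True)
class MapResult:  # stand-in for the map-family schema (deliberately tiny)
    element_count: int
    unexpected_count: int


@dataclass(frozen=True)
class AggregateResult:  # stand-in for the aggregate-family schema
    observed_value: Any


TypedResult = Union[MapResult, AggregateResult]

# Hypothetical family table: expectation type -> schema class.
FAMILY_TABLE: Dict[str, Type[Any]] = {
    "expect_column_values_to_not_be_null": MapResult,
    "expect_column_mean_to_be_between": AggregateResult,
}


def as_typed(
    expectation_type: str,
    result: Dict[str, Any],
    engine_hint: Optional[str] = None,  # unused in this sketch
) -> TypedResult:
    """Parse a copy of `result` into the schema registered for the type."""
    schema = FAMILY_TABLE.get(expectation_type)
    if schema is None:
        raise ParseError(f"no schema registered for {expectation_type!r}")
    payload = copy.deepcopy(result)  # never touch the caller's dict
    try:
        return schema(**payload)
    except TypeError as exc:  # missing or unexpected fields
        raise ParseError(f"{expectation_type}: {exc}") from exc
```

A non-mutation check then reduces to comparing the input dict against a deep copy taken before the call.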
    cannot be validated; they produce ``status=failed`` findings and the corresponding
    test cells are marked as failures; this is expected and documented here.
    def _generate_run_id() -> str:
        """Generate a time-stamped run ID when ``--vrs-run-id`` is not supplied."""
        ts = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
        suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
        return f"{ts}-{suffix}"
    try:
        raw_evr = batch_for_datasource.validate(
            case.expectation,
            result_format=result_format,  # type: ignore[arg-type]
        )
    except Exception as exc:
        _findings_writer.write_finding(
            {
                "expectation_type": expectation_type,
                "result_format": result_format.value,
                "engine": engine_hint,
                "datasource_test_id": datasource_test_id,
                "status": Status.FAILED.value,
                "error_summary": f"batch.validate raised: {type(exc).__name__}: {exc}",
            }
        )
        pytest.fail(
            f"[{case.id}][{result_format.value}][{engine_hint}]: "
            f"batch.validate raised {type(exc).__name__}: {exc}"
        )
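The excerpt above records a failed finding before failing the test, so the artifact still captures cells whose validation raised. The same ordering can be sketched without pytest or the PR's writer; run_cell and the plain list used as a findings sink are hypothetical, and re-raising stands in for pytest.fail.

```python
from typing import Any, Callable, Dict, List


def run_cell(
    validate: Callable[[], Any],
    cell: Dict[str, str],
    findings: List[Dict[str, str]],
) -> Any:
    """Run one matrix cell; on error, record a failed finding, then re-raise."""
    try:
        return validate()
    except Exception as exc:
        # Record first: if the failure aborted the cell before this point,
        # the findings output would silently miss it.
        findings.append(
            {
                **cell,
                "status": "failed",
                "error_summary": f"batch.validate raised: {type(exc).__name__}: {exc}",
            }
        )
        raise
```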
    # values dict during root validation. ``exclude=True`` is not used here
    # because pydantic v1's per-field exclude is Config-based; callers that want
    # to omit this field from .dict() output should call .dict(exclude={"engine_hint"}).
Summary
This PR adds an internal typed layer over ExpectationValidationResult.result, providing:
- Internal Pydantic schemas (MapResult, AggregateResult, per-expectation overrides) covering the (ResultFormat × engine × core_expectation) divergence space for all ~61 core expectations
- An EVR.as_typed(engine_hint=None) accessor that parses the result dict into the matching schema variant without mutating anything
- A matrix runner (test_validation_result_schemas_matrix.py) that walks every (core_expectation × engine × ResultFormat) cell against canonical fixtures, asserts conformance, and emits a structured JSON findings artifact
- A CI upload step using actions/upload-artifact@v4 so findings from all backends are retrievable by gh run download
This is internal-only work on v1. No @public_api symbols added, no marshmallow change, no serialization change, no great_expectations/__init__.py re-export. Existing consumers of .result are unaffected.
Non-breaking guarantees
- EVR.result continues to be Dict[str, Any]; callers that don't opt in to as_typed see no change
- to_json_dict() output is byte-identical before and after calling as_typed()
- ExpectationValidationResultSchema (marshmallow) is untouched
Matrix runner findings
The pandas-only run (440 cells: 61 cases × 4 ResultFormats × 2 pandas datasources) passes with 0 failures after schema gap fixes. The full matrix (2880 cells) runs on all 12 configured backends; cloud-credentialed backends are skipped locally but CI shards cover them.
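The matrix is a Cartesian product of cases, datasources, and result formats. A minimal sketch of that enumeration is below; matrix_cells and the placeholder case and datasource names are hypothetical, and the four format names are assumed to match the members of the Great Expectations ResultFormat enum.

```python
from itertools import product
from typing import Iterator, List, Tuple

# Assumed to mirror the four ResultFormat members used by the runner.
RESULT_FORMATS = ["BOOLEAN_ONLY", "BASIC", "SUMMARY", "COMPLETE"]


def matrix_cells(
    cases: List[str], datasources: List[str]
) -> Iterator[Tuple[str, str, str]]:
    """Yield every (case, datasource, result_format) combination exactly once."""
    yield from product(cases, datasources, RESULT_FORMATS)


# Placeholder inputs: two cases on two datasources give 2 * 2 * 4 cells.
cells = list(matrix_cells(["case_a", "case_b"], ["pandas_df", "pandas_csv"]))
```

Generating cells this way keeps the cell count a direct function of the three input lists, which makes a coverage assertion straightforward to write.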
Known divergences surfaced by the matrix (queued for v2 reconciliation):
- expect_column_distinct_values_to_be_in_set / expect_column_distinct_values_to_equal_set: classified as aggregate but emit map-style fields (unexpected_count, partial_unexpected_list, etc.)
- expect_column_values_to_be_of_type / expect_column_values_to_be_in_type_list: classified as map but only return observed_value on SQL/Spark
These 4 expectations represent intentional cross-engine divergences that v2 will reconcile. They are logged in the findings JSON as status=failed entries, which is exactly the intended output of this spec.
Findings artifact
Each CI shard uploads its findings to validation-result-schemas-findings (via .github/actions/upload-validation-result-schemas-findings/action.yml). The curator retrieves them with gh run download.
See docs/spec-v2/validation-result-schemas-handoff-template.md in the gx_maintainer repo for the full curation workflow.
Related
.kiro/specs/validation-result-schemas/