@vertti commented Nov 22, 2025

Adds a row_validator parameter to the df_in and df_out decorators for validating actual data values with Pydantic models (requires Pydantic >= 2.4.0).

Column validation remains lightweight. Row validation is optional and uses Pydantic's TypeAdapter to validate rows in batches, which limits its overhead but does not remove it.

- Create pydantic_types.py module for optional Pydantic imports
- Add version check requiring Pydantic >= 2.4.0 for TypeAdapter
- Include BaseModel, ValidationError, ConfigDict, TypeAdapter exports
- Add comprehensive tests for availability detection
- Follow same pattern as dataframe_types.py for consistency
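The optional-import pattern described in the list above could look roughly like the sketch below. It is not the actual contents of pydantic_types.py: the names `HAS_PYDANTIC`, `parse_major_minor`, and `require_pydantic` are assumptions made for illustration, and the error message is invented. Only the exported Pydantic names and the 2.4.0 minimum come from the description.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_PYDANTIC = (2, 4)  # minimum stated in the PR description


def parse_major_minor(raw: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.4.0' or '2.10.0b1'."""
    match = re.match(r"(\d+)\.(\d+)", raw)
    if match is None:
        raise ValueError(f"Unrecognised version string: {raw!r}")
    return int(match.group(1)), int(match.group(2))


def _pydantic_is_usable() -> bool:
    """Return True only if pydantic is installed and new enough."""
    try:
        installed = version("pydantic")
    except PackageNotFoundError:
        return False
    return parse_major_minor(installed) >= MIN_PYDANTIC


HAS_PYDANTIC = _pydantic_is_usable()  # hypothetical flag name

if HAS_PYDANTIC:
    from pydantic import BaseModel, ConfigDict, TypeAdapter, ValidationError
else:
    # Placeholders keep imports of these names working without pydantic.
    BaseModel = ConfigDict = TypeAdapter = ValidationError = None


def require_pydantic() -> None:
    """Raise a clear error when row validation is requested without pydantic."""
    if not HAS_PYDANTIC:
        raise ImportError("Row validation requires pydantic >= 2.4.0")
```

Comparing `(major, minor)` tuples rather than raw strings avoids the pitfall where "2.10" sorts before "2.4" as text.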

Implements efficient row validation using Pydantic's TypeAdapter for batch
processing. Adds type detection helpers to dataframe_types for cleaner code.

Adds max_errors and convert_nans config options to pyproject.toml,
following the existing config pattern for consistency.

Enables row-level validation of input DataFrames (df_in) using Pydantic models,
with configuration support for max_errors and convert_nans.

Enables row-level validation of returned DataFrames (df_out) using Pydantic models,
with configuration support for max_errors and convert_nans.
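To illustrate how a decorator can validate rows and stop after a max_errors cap, here is a minimal, dependency-free sketch. It is not the PR's implementation: `rows_in`, `collect_row_errors`, and `price_is_positive` are hypothetical names, a list of dicts stands in for a DataFrame, and a plain callable stands in for a Pydantic model.

```python
from functools import wraps
from typing import Any, Callable, Iterable

Row = dict[str, Any]


def collect_row_errors(
    rows: Iterable[Row], validate: Callable[[Row], None], max_errors: int
) -> list[tuple[int, str]]:
    """Validate rows one by one, keeping at most max_errors (index, message) pairs."""
    errors: list[tuple[int, str]] = []
    for index, row in enumerate(rows):
        try:
            validate(row)
        except ValueError as exc:
            errors.append((index, str(exc)))
            if len(errors) >= max_errors:
                break  # stop early instead of scanning the remaining rows
    return errors


def rows_in(row_validator: Callable[[Row], None], max_errors: int = 5):
    """Hypothetical stand-in for a df_in-style decorator over a list of dict rows."""

    def decorator(func):
        @wraps(func)
        def wrapper(rows, *args, **kwargs):
            errors = collect_row_errors(rows, row_validator, max_errors)
            if errors:
                details = "; ".join(f"row {i}: {msg}" for i, msg in errors)
                raise ValueError(f"Row validation failed in {func.__name__}: {details}")
            return func(rows, *args, **kwargs)

        return wrapper

    return decorator


def price_is_positive(row: Row) -> None:
    """Example validator: reject rows whose price is zero or negative."""
    if row["price"] <= 0:
        raise ValueError("price must be greater than 0")


@rows_in(row_validator=price_is_positive, max_errors=2)
def total_price(rows: list[Row]) -> float:
    return sum(row["price"] for row in rows)
```

Capping the number of collected errors keeps the failure message readable and bounds the work done on badly broken input, which is presumably the motivation for a max_errors option.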

Maintains focus on lightweight column validation while introducing
row validation as an optional feature. Includes clear performance
guidance: column validation is fast, row validation has overhead.

Adds detailed documentation for the row validation feature, including:
- Basic usage with Pydantic models
- Performance considerations (lightweight column validation vs. row validation of actual data values)
- Error message examples
- Combined column and row validation
- Return value validation with @df_out
- Configuration options in pyproject.toml
- Advanced Pydantic features (validators, ConfigDict)

- Remove conditional logic from test_pydantic_types.py since pydantic is always in dev deps
- Add pandas-no-pydantic scenario to CI isolation tests
  - Verifies column validation works without pydantic
  - Ensures row validation raises appropriate error when pydantic unavailable
- Tests are now properly split: regular tests assume all deps, isolation tests verify optional deps
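The PR covers the missing-dependency case with a separate CI scenario that has no pydantic installed. A different technique can approximate the same check locally: setting the module's entry in `sys.modules` to `None` makes any import of it fail. The sketch below uses that approach; `import_pydantic_or_explain` and its error message are hypothetical, not code from the PR.

```python
import sys
from unittest import mock


def import_pydantic_or_explain():
    """Import pydantic, turning a missing install into an actionable error."""
    try:
        import pydantic
    except ImportError as exc:
        raise ImportError(
            "Row validation requires pydantic >= 2.4.0; install pydantic to use it"
        ) from exc
    return pydantic


def message_when_pydantic_missing() -> str:
    """Simulate an environment without pydantic and return the resulting error text."""
    # A None entry in sys.modules makes 'import pydantic' raise ModuleNotFoundError,
    # even when the package is actually installed. patch.dict restores it afterwards.
    with mock.patch.dict(sys.modules, {"pydantic": None}):
        try:
            import_pydantic_or_explain()
        except ImportError as exc:
            return str(exc)
    raise AssertionError("expected ImportError while pydantic was blocked")
```

Unlike a real isolated environment, this only blocks the import inside the `with` block, so it complements a dedicated CI job rather than replacing it.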

Add optional row-level validation using Pydantic models

Required for row validation tests to run in CI
@vertti force-pushed the feat/row-validation branch from 80d0aec to ebce903 on November 22, 2025 19:39
- Exclude pydantic_types.py from coverage (optional dependency module)
- Add test for unknown DataFrame type error path
- Add tests for iterative validation fallback with both Pandas and Polars
- Ensures the important code paths are actually exercised, rather than just raising the coverage number
@vertti force-pushed the feat/row-validation branch from ebce903 to b39e30c on November 22, 2025 19:42
- Remove verbose/obvious comments
- Simplify docstrings to be concise
- Make _convert_nan_to_none a one-liner dict comprehension
- Remove unnecessary explanatory comments
- Result is more Pythonic and easier to read
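The commit above names `_convert_nan_to_none` and describes it as a one-line dict comprehension, but the body is not shown here. One plausible shape, assuming each row arrives as a plain dict and that only float NaN values need converting, is:

```python
import math
from typing import Any


def convert_nan_to_none(row: dict[str, Any]) -> dict[str, Any]:
    """Replace float NaN values with None, leaving every other value untouched."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in row.items()
    }
```

For example, `convert_nan_to_none({"name": "widget", "price": float("nan")})` returns `{"name": "widget", "price": None}`. Converting NaN to None matters because a missing value read into a float column would otherwise reach the model as a float rather than as an absent value.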
@vertti merged commit 3384cb4 into ThoughtWorksInc:master on Nov 22, 2025
20 checks passed
@vertti deleted the feat/row-validation branch on November 22, 2025 19:50