Add optional row-level validation with Pydantic #39
Merged
- Create pydantic_types.py module for optional Pydantic imports
- Add version check requiring Pydantic >= 2.4.0 for TypeAdapter
- Include BaseModel, ValidationError, ConfigDict, TypeAdapter exports
- Add comprehensive tests for availability detection
- Follow same pattern as dataframe_types.py for consistency
Implements efficient row validation using Pydantic's TypeAdapter for batch processing. Adds type detection helpers to dataframe_types for cleaner code.
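A minimal sketch of batch validation with `TypeAdapter`, assuming the DataFrame rows have already been converted to a list of dicts (for example via `df.to_dict("records")` in Pandas). The `Product` model, the `validate_rows` helper, and the message format are hypothetical and may differ from what this PR implements.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError


class Product(BaseModel):
    """Hypothetical row model used only for this illustration."""

    name: str
    price: float


def validate_rows(records: list[dict], max_errors: int = 5) -> list[str]:
    """Validate all rows in one call and return at most max_errors messages."""
    adapter = TypeAdapter(list[Product])
    try:
        adapter.validate_python(records)
    except ValidationError as exc:
        messages = []
        for error in exc.errors()[:max_errors]:
            # For a list of models, loc is (row_index, field_name, ...).
            row_index = error["loc"][0]
            field = ".".join(str(part) for part in error["loc"][1:])
            messages.append(f"row {row_index}, field {field}: {error['msg']}")
        return messages
    return []


rows = [
    {"name": "apple", "price": 1.0},
    {"name": "pear", "price": "not a number"},
]
problems = validate_rows(rows)
```

Validating the whole list through one adapter avoids a Python-level loop that constructs each model separately, which is presumably where the batch speedup comes from.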
Adds max_errors and convert_nans config options to pyproject.toml, following the existing config pattern for consistency.
Enables row-level validation using Pydantic models on input DataFrames, with configuration support for max_errors and convert_nans.
Enables row-level validation using Pydantic models on return DataFrames, with configuration support for max_errors and convert_nans.
Maintains focus on lightweight column validation while introducing row validation as an optional feature. Includes clear performance guidance: column validation is fast, row validation has overhead.
Adds detailed documentation for row validation feature including:
- Basic usage with Pydantic models
- Performance considerations (lightweight column validation vs data-validating row validation)
- Error message examples
- Combined column and row validation
- Return value validation with @df_out
- Configuration options in pyproject.toml
- Advanced Pydantic features (validators, ConfigDict)
- Remove conditional logic from test_pydantic_types.py since pydantic is always in dev deps
- Add pandas-no-pydantic scenario to CI isolation tests
- Verifies column validation works without pydantic
- Ensures row validation raises appropriate error when pydantic unavailable
- Tests are now properly split: regular tests assume all deps, isolation tests verify optional deps
Add optional row-level validation using Pydantic models
Required for row validation tests to run in CI
80d0aec to ebce903
- Exclude pydantic_types.py from coverage (optional dependency module)
- Add test for unknown DataFrame type error path
- Add tests for iterative validation fallback with both Pandas and Polars
- Ensures all important code paths are tested, not just hitting coverage numbers
ebce903 to b39e30c
- Remove verbose/obvious comments
- Simplify docstrings to be concise
- Make _convert_nan_to_none a one-liner dict comprehension
- Remove unnecessary explanatory comments
- More pythonic and easier to read
Adds row_validator parameter to df_in and df_out decorators for validating actual data values using Pydantic models (>= 2.4.0).
Column validation remains lightweight. Row validation uses batch processing for optimal performance.