Add Polars pydantic integration with format support and native JSON schema generation #1979

Merged: cosmicBboy merged 11 commits into unionai-oss:main on May 20, 2025.

halicki (Contributor) commented Apr 23, 2025

  • Add pydantic validation for Polars DataFrames and LazyFrames
  • Implement DataFrame type conversion from various formats (dict, CSV, JSON, Parquet, Feather)
  • Replace pandas dependency with native Polars JSON schema generation
  • Support both Pydantic v1 and v2 with appropriate validators
  • Add comprehensive test suite for the integration

Signed-off-by: Arkadiusz Halicki <[email protected]>
halicki force-pushed the polars-pydantic-integration branch from 1ccf9cc to 52fbe10 on April 24, 2025 11:00
codecov bot commented Apr 24, 2025

Codecov Report

Attention: Patch coverage is 96.80000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.84%. Comparing base (812b2a8) to head (477d6b4).
Report is 238 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pandera/api/polars/model.py | 76.47% | 4 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1979      +/-   ##
==========================================
- Coverage   94.28%   93.84%   -0.45%
==========================================
  Files          91      121      +30
  Lines        7013     9742    +2729
==========================================
+ Hits         6612     9142    +2530
- Misses        401      600     +199
```
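Codecov reports project coverage as the share of tracked lines that are hit; this report lists no partial lines, so tracked lines are hits plus misses, and the percentages in the diff follow from the raw counts:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses}}, \qquad
\frac{6612}{6612 + 401} = \frac{6612}{7013} \approx 94.28\%\ (\text{base}), \qquad
\frac{9142}{9142 + 600} = \frac{9142}{9742} \approx 93.84\%\ (\text{head})
$$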


cosmicBboy (Collaborator) commented Apr 24, 2025

Hi @halicki! It looks like some linting checks are failing. Feel free to disable the pylint warnings inline.

Signed-off-by: Arkadiusz Halicki <[email protected]>
halicki force-pushed the polars-pydantic-integration branch from 8f0755d to 5ef1cb4 on April 24, 2025 14:56
Signed-off-by: cosmicBboy <[email protected]>
cosmicBboy (Collaborator) commented

Hey @halicki, it looks like several areas of typing/polars.py in the proposed changes are not covered by tests. Would you mind adding unit tests for these?

halicki (Contributor, Author) commented Apr 25, 2025

Hey @cosmicBboy, sure, but I'll probably get to it next week.

cosmicBboy (Collaborator) commented

@halicki it looks like all the unit tests are passing; now we just need to make the linters happy.

halicki added 5 commits May 19, 2025 10:39
- Add new test file test_polars_typing.py with complete coverage for polars typing module
- Test DataFrame.from_format and to_format with various data formats
- Cover both success and error paths
- Add tests for Pydantic integration
- Add pragmas to conditionally exclude import-time and version-specific code from coverage

Achieves 100% test coverage for the module.

Signed-off-by: Arkadiusz Halicki <[email protected]>
Tests now provide sufficient coverage without the need for pragma directives.

Signed-off-by: Arkadiusz Halicki <[email protected]>
- Created comprehensive test suite for typing/polars.py
- Added tests for DataFrame, LazyFrame, and Series classes
- Added tests for format conversion methods
- Added tests for Pydantic integration (v1 and v2)
- Added pragma no cover to hard-to-test code paths
- Achieved 100% test coverage

Signed-off-by: Arkadiusz Halicki <[email protected]>
Signed-off-by: Arkadiusz Halicki <[email protected]>
Signed-off-by: Arkadiusz Halicki <[email protected]>
halicki force-pushed the polars-pydantic-integration branch from 5ce1d7c to a456b69 on May 19, 2025 08:40
halicki added 3 commits May 19, 2025 13:00
This commit adds test cases that demonstrate how to use optional columns
with Polars DataFrameModels when integrating with Pydantic. The tests show:

- Using Optional[Series[type]] annotation to make a column optional
- Validating DataFrames with and without the optional column
- Ensuring type validation still works on optional columns when present
- Verifying that required columns still must be present

These tests help document the supported patterns for optional columns
in Pandera's Polars integration.

Signed-off-by: Arkadiusz Halicki <[email protected]>
Signed-off-by: Arkadiusz Halicki <[email protected]>
Signed-off-by: Arkadiusz Halicki <[email protected]>
halicki (Contributor, Author) commented May 19, 2025

My part works. The remaining failures are in the PySpark and docs builds.

cosmicBboy (Collaborator) commented

@halicki thanks, I'll merge this and address the PySpark issues separately.

cosmicBboy merged commit 3670cf8 into unionai-oss:main on May 20, 2025 (185 of 192 checks passed).