@vertti commented Oct 26, 2025

Summary

Refactors DataFrame type handling into a dedicated dataframe_types.py module, eliminating code duplication and improving separation of concerns.

Changes

New module: daffy/dataframe_types.py

  • Centralizes all pandas/polars type handling
  • Provides get_dataframe_types() helper for isinstance checks
  • Provides get_available_library_names() for error messages
  • Exports DataFrameType, HAS_PANDAS, HAS_POLARS flags
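
A minimal sketch of how such a module could be laid out. The flag and helper names come from the list above; the function bodies, the capitalized library names, and the omission of the DataFrameType union are assumptions for illustration, not the merged code.

```python
from typing import List, Tuple

# Optional imports: each library may or may not be installed.
try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    pd = None
    HAS_PANDAS = False

try:
    import polars as pl
    HAS_POLARS = True
except ImportError:
    pl = None
    HAS_POLARS = False


def get_dataframe_types() -> Tuple[type, ...]:
    """Return the DataFrame classes of the installed libraries, for isinstance checks."""
    types: List[type] = []
    if HAS_PANDAS:
        types.append(pd.DataFrame)
    if HAS_POLARS:
        types.append(pl.DataFrame)
    return tuple(types)


def get_available_library_names() -> List[str]:
    """Return display names of the installed libraries, for error messages."""
    names: List[str] = []
    if HAS_PANDAS:
        names.append("Pandas")  # display name is an assumption
    if HAS_POLARS:
        names.append("Polars")  # display name is an assumption
    return names
```

Keeping both helpers next to the import guards means the "which libraries are present" decision is made in exactly one place.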

Updated modules:

  • validation.py: Import DataFrameType from new module
  • decorators.py: Import DataFrameType from new module
  • utils.py: Remove duplicate type handling code, use centralized helpers

Result:

  • Eliminated 3 instances of duplicated "build type tuple dynamically" code
  • Reduced utils.py from 146 to 87 lines
  • Clear separation: dataframe_types.py handles pandas/polars, utils.py handles general utilities

Motivation

Prepares codebase for future lazy dataframe support while improving maintainability:

  • Type handling logic is now isolated and easier to test
  • Better foundation for adding new dataframe types
  • Cleaner code organization that follows the single responsibility principle

Version

Bumped to 0.16.1 (patch) - purely internal refactoring with no functional changes.

This new module centralizes all pandas/polars DataFrame type handling:
- Lazy imports with HAS_PANDAS and HAS_POLARS flags
- DataFrameType union for type hints
- get_dataframe_types() helper for isinstance checks
- get_available_library_names() helper for error messages

This lays the foundation for eliminating code duplication in utils.py
and provides better structure for future lazy dataframe support.

This commit completes the refactoring by updating utils.py to use the
centralized helpers from the dataframe_types module:

- Removed duplicate pandas/polars import blocks
- Removed duplicate DataFrameType definition
- Replaced 3 instances of "build type tuple dynamically" with
  get_dataframe_types() calls in:
  * assert_is_dataframe()
  * log_dataframe_input()
  * log_dataframe_output()
- Use get_available_library_names() for error messages

This reduces utils.py from 146 to 87 lines while improving
maintainability and testability.

This module handles optional dependencies (pandas/polars) and cannot be
meaningfully tested in normal unit tests where both libraries are present.
It is thoroughly tested in the isolation test suite instead.
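
For illustration, one common way to exercise an "optional dependency missing" path inside a single process is to block the import through sys.modules. This is a generic sketch of that technique, not necessarily how the isolation test suite works; library_available() is a hypothetical helper introduced here.

```python
import importlib
import sys
from unittest import mock


def library_available(name: str) -> bool:
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def check_missing_library_is_detected() -> bool:
    # A None entry in sys.modules makes any import of that name raise
    # ModuleNotFoundError, whether or not the package is installed.
    with mock.patch.dict(sys.modules, {"pandas": None}):
        return library_available("pandas")
```

Because the flags in a module such as dataframe_types.py are computed at import time, a real test would also need to reload that module while the import is blocked, which is part of why separate isolated environments are often simpler.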
@vertti merged commit aa61643 into ThoughtWorksInc:master Oct 26, 2025
18 checks passed
@vertti deleted the refactor-dataframe-types branch October 26, 2025 12:52