@vertti vertti commented Sep 21, 2025

This PR implements fully dynamic DataFrame library support, allowing users to install daffy with just pandas, just polars, or both, without forcing both libraries as hard dependencies.

Problem

Previously, daffy declared both pandas and polars as hard dependencies, so users had to install both libraries even if they used only one. This increased the installation footprint and added complexity for users who needed a single DataFrame library.

Solution

Core Changes

1. Optional imports with graceful fallbacks (daffy/utils.py)

# Before: Hard imports
import pandas as pd
import polars as pl

# After: Guarded imports with availability flags
# (shown for pandas; polars is handled the same way)
try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    pd = None
    HAS_PANDAS = False

2. Dynamic type checking

  • Runtime DataFrame validation works with whatever is available
  • Error messages reflect only installed libraries
  • Type hints work correctly for static analysis
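
A minimal sketch of how a runtime check and its error message could adapt to whichever libraries are installed. The helper names `available_dataframe_types` and `assert_is_dataframe` are hypothetical and not taken from daffy's source; only the guarded-import pattern mirrors the snippet in item 1.

```python
from typing import Any, Dict

# Guarded imports: each library is used only if it is installed.
try:
    import pandas as pd
except ImportError:
    pd = None

try:
    import polars as pl
except ImportError:
    pl = None


def available_dataframe_types() -> Dict[str, type]:
    """Map a display name to the DataFrame class of each installed library."""
    types: Dict[str, type] = {}
    if pd is not None:
        types["Pandas"] = pd.DataFrame
    if pl is not None:
        types["Polars"] = pl.DataFrame
    return types


def assert_is_dataframe(obj: Any, types: Dict[str, type]) -> None:
    """Raise if obj is not an instance of one of the given DataFrame classes."""
    if not types:
        raise ImportError("No DataFrame library found. Install pandas or polars.")
    if not isinstance(obj, tuple(types.values())):
        # Only the libraries present in `types` are named in the message.
        expected = " or ".join(f"{name} DataFrame" for name in types)
        raise AssertionError(f"Expected {expected}, got {type(obj).__name__}")
```

Passing the type mapping as an argument keeps the check easy to test without installing or uninstalling either library.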

3. Removed hard dependencies (pyproject.toml)

dependencies = [
-    "pandas>=1.5.1,<3.0.0",
-    "polars>=1.7.0",
     "tomli>=2.0.0",
]

4. Comprehensive CI testing (.github/workflows/main.yml)

  • Tests all dependency combinations: pandas-only, polars-only, both, none
  • Uses uv run --with for isolated dependency testing
  • Runs on Python 3.9 and 3.13 for broad compatibility
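
As a rough illustration of the matrix described above, the sketch below builds one `uv run --with ...` command line per dependency combination. The scenario names and the `build_command` helper are assumptions for illustration; the actual workflow may pass further flags, for example to keep development dependencies out of the environment.

```python
from typing import Dict, List

# Hypothetical scenario names mapped to the extra packages each one installs.
SCENARIOS: Dict[str, List[str]] = {
    "pandas-only": ["pandas"],
    "polars-only": ["polars"],
    "both": ["pandas", "polars"],
    "none": [],
}


def build_command(extras: List[str], script: str) -> List[str]:
    """Build a `uv run` argv that adds one --with flag per extra package."""
    command = ["uv", "run"]
    for package in extras:
        command += ["--with", package]
    return command + ["python", script]


if __name__ == "__main__":
    # Print the command for each scenario instead of executing it.
    for name, extras in SCENARIOS.items():
        argv = build_command(extras, "scripts/test_isolated_deps.py")
        print(f"{name}: {' '.join(argv)}")
```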

5. Automated test suite (tests/test_optional_dependencies.py)

  • Validates library detection logic
  • Tests error message accuracy
  • Ensures decorators work with available libraries
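
One way to exercise detection logic without uninstalling anything is to place a `None` entry in `sys.modules`, which makes an import of that name fail. The sketch below assumes a hypothetical `has_library` helper; it is not the detection code in daffy itself.

```python
import importlib
import sys
from unittest import mock


def has_library(name: str) -> bool:
    """Return True if the module can be imported, False on ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def test_missing_library_is_detected() -> None:
    # A None entry in sys.modules makes any import of that name raise ImportError.
    with mock.patch.dict(sys.modules, {"pandas": None}):
        assert has_library("pandas") is False


def test_present_library_is_detected() -> None:
    # json ships with the standard library, so it is always importable.
    assert has_library("json") is True
```

`mock.patch.dict` restores `sys.modules` when the block exits, so the simulated absence does not leak into other tests.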

6. Manual testing script (scripts/test_isolated_deps.py)

  • Allows developers to test specific dependency scenarios
  • Used by CI for isolated environment testing
  • Provides detailed diagnostic output
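
The diagnostic output could look something like the sketch below, which reports the installed version of each library using the standard-library `importlib.metadata`. The `library_report` function is a hypothetical stand-in and does not reproduce the contents of scripts/test_isolated_deps.py.

```python
from importlib import metadata
from typing import Dict, Iterable


def library_report(names: Iterable[str]) -> Dict[str, str]:
    """Return the installed version of each distribution, or 'not installed'."""
    report: Dict[str, str] = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, status in library_report(["pandas", "polars"]).items():
        print(f"{name}: {status}")
```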

Benefits

  • No forced DataFrame dependencies: Users install only what they need (pip install daffy pandas)
  • Smaller install footprint: Unused DataFrame libraries are no longer pulled in
  • Backward compatible: All existing code continues to work
  • Future-proof: Easy to add support for other DataFrame libraries
  • Clear error messages: Helpful guidance when libraries are missing

Test coverage

  • Added comprehensive tests for describe_dataframe with dtype information
  • Added tests for DataFrame logging functions
  • Added tests for non-DataFrame input handling

Excluded from coverage with a pragma:

  • Import error branches for pandas/polars (lines 13-16, 23-26)
  • The "no DataFrame library found" error case (lines 43-46)

@vertti vertti merged commit 4a0fe2f into ThoughtWorksInc:master Sep 22, 2025
15 checks passed
@vertti vertti deleted the optional-deps branch September 22, 2025 06:25