
Commit 4a0fe2f

make import of pandas or polars optional (#34)
* make import of pandas or polars optional
* bump version number and update changelog
* fix mypy warnings
* What was covered:
  - Added comprehensive tests for describe_dataframe with dtype information
  - Added tests for DataFrame logging functions
  - Added tests for non-DataFrame input handling
  What was excluded with pragma:
  - Import error branches for pandas/polars (lines 13-16, 23-26)
  - The 'no DataFrame library found' error case (lines 43-46)
* mypy fixes
* fixes to isolation tests
* more tinkering with isolated tests
1 parent b86ac37 commit 4a0fe2f


9 files changed: +706 −25 lines


.github/workflows/main.yml

Lines changed: 73 additions & 0 deletions
@@ -9,6 +9,7 @@ on:
       - master

 jobs:
+  # Standard tests with both pandas and polars
   build:
     runs-on: ubuntu-latest
     strategy:
@@ -48,3 +49,75 @@ jobs:
           fail_ci_if_error: false
           token: ${{ secrets.CODECOV_TOKEN }}

+  # Test optional dependencies scenarios with pytest (always passes)
+  test-optional-deps-pytest:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.13"]
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'pip'
+
+      - name: Install uv
+        run: |
+          pip install uv
+
+      - name: Run optional dependencies tests with both libraries
+        run: |
+          uv sync
+          uv run pytest tests/test_optional_dependencies.py -v
+
+  # Test true isolation scenarios (manual script testing)
+  test-isolation-scenarios:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.13"]
+        scenario: ["pandas-only", "polars-only", "both", "none"]
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'pip'
+
+      - name: Install uv
+        run: |
+          pip install uv
+
+      - name: Build daffy wheel
+        run: |
+          uv build --wheel
+
+      - name: Test pandas-only scenario
+        if: matrix.scenario == 'pandas-only'
+        run: |
+          WHEEL=$(ls dist/daffy-*.whl | head -n1)
+          uv run --no-project --with "pandas>=1.5.1" --with "$WHEEL" python scripts/test_isolated_deps.py pandas
+
+      - name: Test polars-only scenario
+        if: matrix.scenario == 'polars-only'
+        run: |
+          WHEEL=$(ls dist/daffy-*.whl | head -n1)
+          uv run --no-project --with "polars>=1.7.0" --with "$WHEEL" python scripts/test_isolated_deps.py polars
+
+      - name: Test both libraries scenario
+        if: matrix.scenario == 'both'
+        run: |
+          WHEEL=$(ls dist/daffy-*.whl | head -n1)
+          uv run --no-project --with "pandas>=1.5.1" --with "polars>=1.7.0" --with "$WHEEL" python scripts/test_isolated_deps.py both
+
+      - name: Test no libraries scenario (expected to fail gracefully)
+        if: matrix.scenario == 'none'
+        run: |
+          WHEEL=$(ls dist/daffy-*.whl | head -n1)
+          uv run --no-project --with "$WHEEL" python scripts/test_isolated_deps.py none

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -2,6 +2,18 @@

 All notable changes to this project will be documented in this file.

+## 0.16.0
+
+- Removed Pandas and Polars from required dependencies. Daffy will not pull in Polars if your project just uses Pandas
+  and vice versa. All combinations are dynamically supported and require no changes from existing users.
+
+### Testing & CI
+
+- Added comprehensive CI testing for all dependency combinations
+- New test suite validates optional dependency behavior
+- Manual testing script for developers (`scripts/test_isolated_deps.py`)
+- Updated CI to test pandas-only, polars-only, both, and none scenarios
+
 ## 0.15.0

 - Exception messages now include function names to improve debugging

TESTING_OPTIONAL_DEPS.md

Lines changed: 106 additions & 0 deletions
# Testing Optional Dependencies

This document describes how to test daffy's optional dependency support for pandas and polars.

## Background

Daffy now supports optional dependencies - you can install it with just pandas, just polars, or both. This testing setup ensures that all combinations work correctly.

## Automated Testing

### CI Pipeline

The GitHub Actions workflow includes three separate jobs:

1. **Standard tests** - Run with both pandas and polars installed (full functionality)
2. **Pytest optional dependency tests** - Run pytest tests that work with available libraries (always pass locally and in CI)
3. **Isolation scenario tests** - Test each scenario in true isolation using built wheels:
   - `pandas-only` - Only pandas is available
   - `polars-only` - Only polars is available
   - `both` - Both libraries available
   - `none` - No DataFrame libraries (should fail gracefully)

### Simple Pytest Tests

The file `tests/test_optional_dependencies.py` contains tests that:

- Verify library detection flags work correctly
- Test that error messages reflect available libraries
- Ensure decorators work with whatever is installed

These tests are designed to always pass regardless of which DataFrame libraries are installed. They run as part of the regular test suite and should succeed when you run `uv run pytest` locally.
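As a rough illustration, a test in this style might look like the sketch below. The test names and DataFrame contents are made up for this example and are not necessarily what `tests/test_optional_dependencies.py` contains; it assumes daffy's public `df_in` decorator and the `HAS_PANDAS`/`HAS_POLARS` flags from `daffy.utils`.

```python
from daffy import df_in
from daffy.utils import HAS_PANDAS, HAS_POLARS


def test_at_least_one_library_is_detected():
    # daffy refuses to import when neither library is present,
    # so at least one detection flag must be set
    assert HAS_PANDAS or HAS_POLARS


def test_decorator_accepts_whatever_is_installed():
    # Build a small DataFrame with whichever library is available
    if HAS_PANDAS:
        import pandas as pd

        df = pd.DataFrame({"Brand": ["Ford"], "Price": [20000]})
    else:
        import polars as pl

        df = pl.DataFrame({"Brand": ["Ford"], "Price": [20000]})

    @df_in(columns=["Brand", "Price"])
    def process(data):
        return data

    # Validation passes and the DataFrame is returned unchanged
    assert process(df) is df
```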
## Manual Testing

### Using the Test Script

The `scripts/test_isolated_deps.py` script allows manual testing of different scenarios:

**Note:** The pandas-only and polars-only tests will likely "fail" in local development environments because both libraries are typically installed. These tests are designed for CI, where the built wheel is installed into a truly isolated environment. The test failure messages explain this.

```bash
# First build a wheel to avoid dev dependencies
uv build --wheel

# Test with pandas only
WHEEL=$(ls dist/daffy-*.whl | head -n1)
uv run --no-project --with "pandas>=1.5.1" --with "$WHEEL" python scripts/test_isolated_deps.py pandas

# Test with polars only
WHEEL=$(ls dist/daffy-*.whl | head -n1)
uv run --no-project --with "polars>=1.7.0" --with "$WHEEL" python scripts/test_isolated_deps.py polars

# Test with both
WHEEL=$(ls dist/daffy-*.whl | head -n1)
uv run --no-project --with "pandas>=1.5.1" --with "polars>=1.7.0" --with "$WHEEL" python scripts/test_isolated_deps.py both

# Test with neither (should fail gracefully)
WHEEL=$(ls dist/daffy-*.whl | head -n1)
uv run --no-project --with "$WHEEL" python scripts/test_isolated_deps.py none
```
### Expected Behaviors

#### Pandas Only
- `HAS_PANDAS = True`, `HAS_POLARS = False`
- Only pandas DataFrames are accepted
- Error messages mention "Pandas DataFrame"

#### Polars Only
- `HAS_PANDAS = False`, `HAS_POLARS = True`
- Only polars DataFrames are accepted
- Error messages mention "Polars DataFrame"

#### Both Libraries
- `HAS_PANDAS = True`, `HAS_POLARS = True`
- Both DataFrame types work
- Error messages mention "Pandas or Polars DataFrame"

#### No Libraries
- Import should fail with: `ImportError: No DataFrame library found. Please install Pandas or Polars`
## Implementation Details

The optional dependency support works through the following (a condensed sketch follows the list):

1. **Lazy imports** in `daffy/utils.py` with try/except blocks
2. **Runtime type checking** that builds DataFrame type tuples dynamically
3. **Conditional type hints** using `TYPE_CHECKING` for static analysis
4. **Dynamic error messages** that reflect available libraries
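The snippet below is a condensed, self-contained illustration of that pattern, not daffy's exact code (which lives in `daffy/utils.py`, shown in the diff further down this page):

```python
# Condensed illustration of the lazy-import pattern (not daffy's exact code)
from typing import Any

try:
    import pandas as pd
except ImportError:
    pd = None

try:
    import polars as pl
except ImportError:
    pl = None

# isinstance() accepts a tuple of types, so the runtime check adapts
# to whichever libraries actually imported
_df_types = tuple(lib.DataFrame for lib in (pd, pl) if lib is not None)

if not _df_types:
    raise ImportError("No DataFrame library found. Please install Pandas or Polars")


def is_dataframe(obj: Any) -> bool:
    return isinstance(obj, _df_types)
```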
## Adding New Tests

When adding tests for optional dependencies:

1. Use the simple approach in `test_optional_dependencies.py`
2. Check `HAS_PANDAS` and `HAS_POLARS` flags to conditionally run tests
3. Use `pytest.mark.skipif` for tests requiring specific libraries (see the sketch below)
4. Test error message content to ensure it reflects available libraries
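For point 3, a library-specific test could be guarded roughly like this. It is a minimal sketch: the column names and the exact error-message assertion are illustrative and not taken from the actual test suite.

```python
import pytest

from daffy import df_in
from daffy.utils import HAS_PANDAS, HAS_POLARS


@pytest.mark.skipif(not HAS_PANDAS, reason="requires pandas")
def test_missing_column_message_with_pandas():
    import pandas as pd

    @df_in(columns=["Brand", "Price"])
    def process(df):
        return df

    # The Price column is missing, so daffy should reject the input
    with pytest.raises(AssertionError) as excinfo:
        process(pd.DataFrame({"Brand": ["Ford"]}))
    assert "Price" in str(excinfo.value)


@pytest.mark.skipif(not HAS_POLARS, reason="requires polars")
def test_valid_input_with_polars():
    import polars as pl

    @df_in(columns=["Brand", "Price"])
    def process(df):
        return df

    # A conforming polars DataFrame passes validation
    process(pl.DataFrame({"Brand": ["Ford"], "Price": [20000]}))
```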
## Development Workflow

When working on optional dependency features:

1. Run standard tests: `uv run pytest`
2. Test specific scenarios: `uv run python scripts/test_isolated_deps.py <scenario>`
3. Verify CI passes with all dependency combinations
4. Ensure mypy type checking works: `uv run mypy daffy`

daffy/decorators.py

Lines changed: 14 additions & 5 deletions
@@ -2,14 +2,20 @@

 import logging
 from functools import wraps
-from typing import Any, Callable, Optional, TypeVar, Union
+from typing import TYPE_CHECKING, Any, Callable, Optional, TypeVar, Union

-# Import fully qualified types to satisfy disallow_any_unimported
-from pandas import DataFrame as PandasDataFrame
-from polars import DataFrame as PolarsDataFrame
+if TYPE_CHECKING:
+    # For static type checking, assume both are available
+    from pandas import DataFrame as PandasDataFrame
+    from polars import DataFrame as PolarsDataFrame
+else:
+    # For runtime, these will be imported from utils if available
+    PandasDataFrame = None
+    PolarsDataFrame = None

 from daffy.config import get_strict
 from daffy.utils import (
+    DataFrameType,
     assert_is_dataframe,
     get_parameter,
     get_parameter_name,
@@ -20,7 +26,10 @@

 # Type variables for preserving return types
 T = TypeVar("T")  # Generic type var for df_log
-DF = TypeVar("DF", bound=Union[PandasDataFrame, PolarsDataFrame])
+if TYPE_CHECKING:
+    DF = TypeVar("DF", bound=Union[PandasDataFrame, PolarsDataFrame])
+else:
+    DF = TypeVar("DF", bound=DataFrameType)
 R = TypeVar("R")  # Return type for df_in
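For context, call sites of these decorators stay the same whichever library is installed. A minimal usage sketch follows; the column names are illustrative, and it assumes pandas happens to be the installed library (the same decorated function accepts a polars DataFrame when polars is present instead).

```python
import pandas as pd  # swap for polars when that is the installed library

from daffy import df_in, df_out


@df_in(columns=["Brand", "Price"])
@df_out(columns=["Brand", "Price", "Discounted"])
def add_discounted_price(cars: pd.DataFrame) -> pd.DataFrame:
    # Input and output columns are validated by the decorators at call time
    return cars.assign(Discounted=cars["Price"] * 0.9)


cars = pd.DataFrame({"Brand": ["Ford", "Toyota"], "Price": [20000, 25000]})
print(add_discounted_price(cars))
```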

daffy/utils.py

Lines changed: 76 additions & 15 deletions
@@ -2,21 +2,68 @@

 import inspect
 import logging
-from typing import Any, Callable, Optional, Union
-
-import pandas as pd
-import polars as pl
-
-# Import fully qualified types to satisfy disallow_any_unimported
-from pandas import DataFrame as PandasDataFrame
-from polars import DataFrame as PolarsDataFrame
+from typing import TYPE_CHECKING, Any, Callable, Optional, Union
+
+# Lazy imports - only import what's available
+try:
+    import pandas as pd
+    from pandas import DataFrame as PandasDataFrame
+
+    HAS_PANDAS = True
+except ImportError:  # pragma: no cover
+    pd = None  # type: ignore
+    PandasDataFrame = None  # type: ignore
+    HAS_PANDAS = False
+
+try:
+    import polars as pl
+    from polars import DataFrame as PolarsDataFrame
+
+    HAS_POLARS = True
+except ImportError:  # pragma: no cover
+    pl = None  # type: ignore
+    PolarsDataFrame = None  # type: ignore
+    HAS_POLARS = False
+
+# Build DataFrame type dynamically based on what's available
+if TYPE_CHECKING:
+    # For static type checking, assume both are available
+    from pandas import DataFrame as PandasDataFrame
+    from polars import DataFrame as PolarsDataFrame
+
+    DataFrameType = Union[PandasDataFrame, PolarsDataFrame]
+else:
+    # For runtime, build type tuple from available libraries
+    _available_types = []
+    if HAS_PANDAS:
+        _available_types.append(PandasDataFrame)
+    if HAS_POLARS:
+        _available_types.append(PolarsDataFrame)
+
+    if not _available_types:  # pragma: no cover
+        raise ImportError(
+            "No DataFrame library found. Please install Pandas or Polars: pip install pandas OR pip install polars"
+        )

-DataFrameType = Union[PandasDataFrame, PolarsDataFrame]
+    DataFrameType = Union[tuple(_available_types)]


 def assert_is_dataframe(obj: Any, context: str) -> None:
-    if not isinstance(obj, (pd.DataFrame, pl.DataFrame)):
-        raise AssertionError(f"Wrong {context}. Expected DataFrame, got {type(obj).__name__} instead.")
+    # Build type tuple dynamically based on available libraries
+    dataframe_types: list[Any] = []
+    if HAS_PANDAS and pd is not None:
+        dataframe_types.append(pd.DataFrame)
+    if HAS_POLARS and pl is not None:
+        dataframe_types.append(pl.DataFrame)
+
+    if not isinstance(obj, tuple(dataframe_types)):
+        available_libs = []
+        if HAS_PANDAS:
+            available_libs.append("Pandas")
+        if HAS_POLARS:
+            available_libs.append("Polars")
+        libs_str = " or ".join(available_libs)
+        raise AssertionError(f"Wrong {context}. Expected {libs_str} DataFrame, got {type(obj).__name__} instead.")


 def format_param_context(
@@ -64,21 +111,35 @@ def get_parameter_name(
 def describe_dataframe(df: DataFrameType, include_dtypes: bool = False) -> str:
     result = f"columns: {list(df.columns)}"
     if include_dtypes:
-        if isinstance(df, pd.DataFrame):
+        if HAS_PANDAS and pd is not None and isinstance(df, pd.DataFrame):
             readable_dtypes = [dtype.name for dtype in df.dtypes]
             result += f" with dtypes {readable_dtypes}"
-        else:
+        elif HAS_POLARS and pl is not None and isinstance(df, pl.DataFrame):
             result += f" with dtypes {df.dtypes}"
     return result


 def log_dataframe_input(level: int, func_name: str, df: Any, include_dtypes: bool) -> None:
-    if isinstance(df, (pd.DataFrame, pl.DataFrame)):
+    # Build type tuple dynamically based on available libraries
+    dataframe_types: list[Any] = []
+    if HAS_PANDAS and pd is not None:
+        dataframe_types.append(pd.DataFrame)
+    if HAS_POLARS and pl is not None:
+        dataframe_types.append(pl.DataFrame)
+
+    if isinstance(df, tuple(dataframe_types)):
         logging.log(
             level, f"Function {func_name} parameters contained a DataFrame: {describe_dataframe(df, include_dtypes)}"
         )


 def log_dataframe_output(level: int, func_name: str, df: Any, include_dtypes: bool) -> None:
-    if isinstance(df, (pd.DataFrame, pl.DataFrame)):
+    # Build type tuple dynamically based on available libraries
+    dataframe_types: list[Any] = []
+    if HAS_PANDAS and pd is not None:
+        dataframe_types.append(pd.DataFrame)
+    if HAS_POLARS and pl is not None:
+        dataframe_types.append(pl.DataFrame)
+
+    if isinstance(df, tuple(dataframe_types)):
         logging.log(level, f"Function {func_name} returned a DataFrame: {describe_dataframe(df, include_dtypes)}")
