| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +Daffy is a DataFrame column validator library that provides runtime validation decorators (`@df_in`, `@df_out`, `@df_log`) for Pandas and Polars DataFrames. It validates column names (including regex patterns), data types, and enforces strictness rules through simple function decorators. |
| 8 | + |
| 9 | +## Workflow |
| 10 | + |
| 11 | +When working on this repository, follow this structured approach: |
| 12 | + |
| 13 | +### 1. Plan First |
| 14 | +- Understand the full scope of the task before starting |
| 15 | +- Identify which modules will be affected |
| 16 | +- Consider edge cases and testing requirements |
| 17 | +- Break the work into logical, incremental steps |
| 18 | + |
| 19 | +### 2. Small Working Commits |
| 20 | +- Each commit should be a complete, working unit of functionality |
| 21 | +- Break large features into multiple small commits |
| 22 | +- Each commit should pass all tests and linting |
| 23 | +- Commit messages should be clear and descriptive |
| 24 | + |
| 25 | +### 3. Test-Driven Development Cycle |
| 26 | + |
| 27 | +For each commit, follow this order: |
| 28 | + |
| 29 | +```bash |
| 30 | +# 1. Write or update tests first |
| 31 | +# Add tests to appropriate tests/test_*.py file |
| 32 | + |
| 33 | +# 2. Run tests to see them fail |
| 34 | +uv run pytest tests/test_your_feature.py |
| 35 | + |
| 36 | +# 3. Implement the feature |
| 37 | +# Make changes to daffy/*.py files |
| 38 | + |
| 39 | +# 4. Run tests to see them pass |
| 40 | +uv run pytest tests/test_your_feature.py |
| 41 | + |
| 42 | +# 5. Run full test suite |
| 43 | +uv run pytest |
| 44 | + |
| 45 | +# 6. Run linting and formatting |
| 46 | +uv run ruff format |
| 47 | +uv run ruff check --fix |
| 48 | +uv run pyrefly check . |
| 49 | + |
| 50 | +# 7. Commit the changes |
| 51 | +git add . |
| 52 | +git commit -m "Descriptive commit message" |
| 53 | +``` |
| 54 | + |
| 55 | +### 4. Before Creating a PR |
| 56 | + |
| 57 | +Check if documentation needs updating: |
| 58 | +- **README.md** - If public API or examples changed |
| 59 | +- **docs/usage.md** - If usage patterns or features changed |
| 60 | +- **docs/development.md** - If development workflow changed |
| 61 | +- **CHANGELOG.md** - Always update with changes (see existing format) |
| 62 | +- **Type hints** - Ensure all new functions have proper annotations |
| 63 | + |
| 64 | +### 5. Important: Commit and PR Messages |
| 65 | +- **NEVER mention AI tools or assistants** in commit messages, PR descriptions, or code comments |
| 66 | +- Write commit messages as if you wrote the code yourself |
| 67 | +- Use conventional commit format when appropriate (e.g., "fix:", "feat:", "docs:") |
| 68 | +- Focus on what changed and why, not how it was developed |
| 69 | + |
| 70 | +## Development Commands |
| 71 | + |
| 72 | +### Setup |
| 73 | +```bash |
| 74 | +# Install dependencies using uv (preferred) |
| 75 | +uv sync --group test --group dev |
| 76 | + |
| 77 | +# Alternative: install only test dependencies |
| 78 | +uv sync --group test |
| 79 | +``` |
| 80 | + |
| 81 | +### Testing |
| 82 | +```bash |
| 83 | +# Run all tests |
| 84 | +uv run pytest |
| 85 | + |
| 86 | +# Run tests with coverage |
| 87 | +uv run pytest --cov --cov-report=html |
| 88 | + |
| 89 | +# Run specific test file |
| 90 | +uv run pytest tests/test_df_in.py |
| 91 | + |
| 92 | +# Run tests matching pattern |
| 93 | +uv run pytest -k "test_missing_columns" |
| 94 | + |
| 95 | +# Run with verbose output |
| 96 | +uv run pytest -v |
| 97 | +``` |
| 98 | + |
| 99 | +### Linting and Type Checking |
| 100 | +```bash |
| 101 | +# Run Ruff formatter |
| 102 | +uv run ruff format |
| 103 | + |
| 104 | +# Run Ruff linter |
| 105 | +uv run ruff check |
| 106 | + |
| 107 | +# Run Ruff linter with auto-fix |
| 108 | +uv run ruff check --fix |
| 109 | + |
| 110 | +# Run type checker (Pyrefly) |
| 111 | +uv run pyrefly check . |
| 112 | +``` |
| 113 | + |
| 114 | +### Pre-commit Hooks |
| 115 | +```bash |
| 116 | +# Install pre-commit hooks (runs ruff format + ruff check on each commit) |
| 117 | +pre-commit install |
| 118 | +``` |
| 119 | + |
| 120 | +### Building |
| 121 | +```bash |
| 122 | +# Build wheel package |
| 123 | +uv build --wheel |
| 124 | + |
| 125 | +# Build both wheel and sdist |
| 126 | +uv build |
| 127 | +``` |
| 128 | + |
| 129 | +### Testing Optional Dependencies |
| 130 | + |
| 131 | +Daffy supports optional dependencies (pandas-only, polars-only, or both). See `TESTING_OPTIONAL_DEPS.md` for details. |
| 132 | + |
| 133 | +```bash |
| 134 | +# Build wheel first |
| 135 | +uv build --wheel |
| 136 | + |
| 137 | +# Test with pandas only |
| 138 | +WHEEL=$(ls dist/daffy-*.whl | head -n1) |
| 139 | +uv run --no-project --with "pandas>=1.5.1" --with "$WHEEL" python scripts/test_isolated_deps.py pandas |
| 140 | + |
| 141 | +# Test with polars only |
| 142 | +uv run --no-project --with "polars>=1.7.0" --with "$WHEEL" python scripts/test_isolated_deps.py polars |
| 143 | + |
| 144 | +# Test with both libraries |
| 145 | +uv run --no-project --with "pandas>=1.5.1" --with "polars>=1.7.0" --with "$WHEEL" python scripts/test_isolated_deps.py both |
| 146 | +``` |
| 147 | + |
| 148 | +## Architecture |
| 149 | + |
| 150 | +### Core Module Responsibilities |
| 151 | + |
| 152 | +**decorators.py** - Public API and orchestration |
| 153 | +- Exports `df_in`, `df_out`, `df_log` decorators |
| 154 | +- Orchestrates validation by calling validation.py and utils.py |
| 155 | +- Manages configuration precedence (decorator param > config file > default) |
| 156 | +- Preserves type information using TypeVar for static type checking |
| 157 | + |
| 158 | +**validation.py** - Core validation logic |
| 159 | +- `validate_dataframe()` is the central validation engine |
| 160 | +- Supports two modes: list-based (columns only) or dict-based (columns + dtypes) |
| 161 | +- Handles regex pattern matching via patterns.py |
| 162 | +- Accumulates all validation errors before raising a single AssertionError |
| 163 | +- Performs strictness checking (no extra columns when strict=True) |
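
The accumulate-then-raise behavior described above can be sketched roughly as follows. This is an illustration only, not Daffy's `validate_dataframe()`: the function name `validate_columns`, the plain-dict stand-in for a DataFrame schema, and the message wording are all assumptions.

```python
from typing import Dict, List, Optional


def validate_columns(
    actual: Dict[str, str],
    expected: Dict[str, Optional[str]],
    strict: bool = False,
) -> None:
    """Check names, dtypes and strictness, reporting every problem at once.

    `actual` maps column name -> dtype string (a stand-in for a DataFrame).
    `expected` maps column name -> required dtype, or None for name-only checks.
    """
    errors: List[str] = []

    missing = [name for name in expected if name not in actual]
    if missing:
        errors.append(f"Missing columns: {missing}. Got {list(actual)}")

    for name, dtype in expected.items():
        if dtype is not None and name in actual and actual[name] != dtype:
            errors.append(f"Column {name} has wrong dtype: {actual[name]}, expected {dtype}")

    if strict:
        extra = [name for name in actual if name not in expected]
        if extra:
            errors.append(f"Unexpected columns: {extra}")

    # Raise once, after every check has run, so the caller sees all problems.
    if errors:
        raise AssertionError("\n".join(errors))
```

Collecting messages in a list and raising at the end means a caller fixing a schema sees missing columns, dtype mismatches, and extra columns in one pass instead of one failure per run.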
| 164 | + |
| 165 | +**patterns.py** - Regex pattern handling |
| 166 | +- Recognizes `r/pattern/` syntax for regex column matching |
| 167 | +- Compiles regex patterns and caches them as `RegexColumnDef` tuples |
| 168 | +- Provides matching functions used by validation layer |
| 169 | +- Example: `"r/Price_[0-9]+/"` matches Price_1, Price_2, etc. |
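
A minimal sketch of how the `r/pattern/` syntax might be recognized and matched, using only the standard library. The names `RegexColumn`, `compile_regex_column`, and `matching_columns` are hypothetical, and the tuple layout and full-match semantics are assumptions rather than Daffy's actual `RegexColumnDef` or matching rules.

```python
import re
from functools import lru_cache
from typing import List, NamedTuple, Optional, Pattern


class RegexColumn(NamedTuple):
    """Hypothetical stand-in for a compiled regex column definition."""

    spec: str
    compiled: Pattern[str]


@lru_cache(maxsize=None)
def compile_regex_column(spec: str) -> Optional[RegexColumn]:
    """Return a compiled definition if spec uses r/.../ syntax, else None."""
    if len(spec) > 3 and spec.startswith("r/") and spec.endswith("/"):
        return RegexColumn(spec, re.compile(spec[2:-1]))
    return None


def matching_columns(spec: str, columns: List[str]) -> List[str]:
    """List the columns matched by spec, treated as a regex or a literal name."""
    regex = compile_regex_column(spec)
    if regex is None:
        return [column for column in columns if column == spec]
    return [column for column in columns if regex.compiled.fullmatch(column)]
```

Caching the compiled pattern keeps repeated validations of the same spec from recompiling the regex on every call.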
| 170 | + |
| 171 | +**utils.py** - Cross-cutting utilities |
| 172 | +- DataFrame type assertions using `assert_is_dataframe()` |
| 173 | +- Parameter extraction from function signatures via `get_parameter()` |
| 174 | +- Context formatting for error messages |
| 175 | +- DataFrame description for logging |
| 176 | +- Logging functions for df_log decorator |
| 177 | + |
| 178 | +**config.py** - Configuration management |
| 179 | +- Loads `[tool.daffy]` section from pyproject.toml |
| 180 | +- Caches configuration on first access |
| 181 | +- Searches only the current working directory (not parent directories) |
| 182 | +- Configuration precedence: decorator parameter > config file > False (default) |
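
One way the load-once behavior could look is sketched below. The function name `load_tool_daffy` and the use of `lru_cache` are assumptions for illustration; config.py may cache differently. The sketch uses the standard-library `tomllib` on Python 3.11+ and assumes the `tomli` backport on older interpreters.

```python
import sys
from functools import lru_cache
from pathlib import Path
from typing import Any, Dict

if sys.version_info >= (3, 11):
    import tomllib
else:  # assumption: the tomli backport is installed on older interpreters
    import tomli as tomllib


@lru_cache(maxsize=None)
def load_tool_daffy(directory: str) -> Dict[str, Any]:
    """Read [tool.daffy] from pyproject.toml in `directory` only (no parent search)."""
    path = Path(directory) / "pyproject.toml"
    if not path.is_file():
        return {}
    with path.open("rb") as handle:  # TOML parsers require a binary file handle
        data = tomllib.load(handle)
    return data.get("tool", {}).get("daffy", {})
```

Because the result is cached per directory, later edits to pyproject.toml are not picked up within the same process, which matters when writing tests for configuration changes.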
| 183 | + |
| 184 | +**dataframe_types.py** - Optional dependency handling |
| 185 | +- Dynamically constructs DataFrame type unions based on installed libraries |
| 186 | +- Supports pandas-only, polars-only, both, or neither scenarios |
| 187 | +- Separate compile-time (TYPE_CHECKING) and runtime type definitions |
| 188 | +- Provides `get_dataframe_types()` for isinstance() checks |
| 189 | +- **IMPORTANT**: This file is excluded from coverage (see pyproject.toml:88) because it's tested via isolation scenarios in CI |
| 190 | + |
| 191 | +### Data Flow |
| 192 | + |
| 193 | +``` |
| 194 | +User calls decorated function |
| 195 | + ↓ |
| 196 | +@df_in wrapper executes |
| 197 | + ↓ |
| 198 | +get_parameter() extracts DataFrame from args/kwargs |
| 199 | + ↓ |
| 200 | +assert_is_dataframe() validates type |
| 201 | + ↓ |
| 202 | +get_strict() reads config (cached) |
| 203 | + ↓ |
| 204 | +validate_dataframe() checks columns/dtypes/strictness |
| 205 | + ↓ |
| 206 | +Original function executes |
| 207 | + ↓ |
| 208 | +@df_out wrapper validates return value |
| 209 | + ↓ |
| 210 | +Result returned to caller |
| 211 | +``` |
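
The input half of this flow can be sketched with a small decorator. To stay runnable without pandas or polars, the sketch validates plain dicts instead of DataFrames; `require_keys` and `total` are hypothetical names, and the parameter lookup via `inspect.signature` is an assumption about how such extraction could work, not a copy of `get_parameter()`.

```python
import inspect
from functools import wraps
from typing import Any, Callable, List, Optional, TypeVar

R = TypeVar("R")  # preserves the wrapped function's return type for type checkers


def require_keys(name: Optional[str], keys: List[str]) -> Callable[[Callable[..., R]], Callable[..., R]]:
    """Hypothetical decorator: check that a mapping argument has the given keys."""

    def decorator(func: Callable[..., R]) -> Callable[..., R]:
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> R:
            bound = signature.bind(*args, **kwargs)
            # Use the named parameter, or fall back to the first argument.
            target = bound.arguments[name] if name else next(iter(bound.arguments.values()))
            missing = [key for key in keys if key not in target]
            if missing:
                raise AssertionError(f"Missing columns: {missing} in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@require_keys("prices", ["Brand", "Price"])
def total(prices: dict) -> int:
    return sum(prices["Price"])
```

Validation runs before the wrapped function, so a call with a missing key fails without executing the function body.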
| 212 | + |
| 213 | +### Key Design Patterns |
| 214 | + |
| 215 | +1. **Optional Dependency Injection**: dataframe_types.py dynamically builds type unions based on available libraries (pandas/polars) |
| 216 | + |
| 217 | +2. **Lazy Configuration Loading**: The config file is read once and cached, so the expensive read and parse happen only on first access |
| 218 | + |
| 219 | +3. **Error Context Accumulation**: Validation collects ALL errors before raising, providing complete feedback in a single exception |
| 220 | + |
| 221 | +4. **Type-Safe Decorator Composition**: Uses TypeVar to preserve return types through decorator stack for static type checkers |
| 222 | + |
| 223 | +5. **Regex Pattern Abstraction**: Patterns are compiled once and reused; validation layer doesn't handle regex directly |
| 224 | + |
| 225 | +## Configuration |
| 226 | + |
| 227 | +Users can set project-wide defaults in `pyproject.toml`: |
| 228 | + |
| 229 | +```toml |
| 230 | +[tool.daffy] |
| 231 | +strict = false # or true to disallow extra columns by default |
| 232 | +``` |
| 233 | + |
| 234 | +Decorator parameters override config file settings: |
| 235 | +```python |
| 236 | +@df_in(columns=["A", "B"], strict=True) # strict=True overrides config |
| 237 | +``` |
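
The precedence rule (decorator parameter > config file > `False`) reduces to a small function. The name `resolve_strict` is hypothetical, and using `None` to mean "not set" is an assumption made for this sketch.

```python
from typing import Optional


def resolve_strict(decorator_value: Optional[bool], config_value: Optional[bool]) -> bool:
    """Apply the precedence: decorator parameter > config file > False."""
    if decorator_value is not None:
        return decorator_value
    if config_value is not None:
        return config_value
    return False
```

Comparing against `None` rather than relying on truthiness lets an explicit `strict=False` on the decorator override `strict = true` in the config file.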
| 238 | + |
| 239 | +## Testing Strategy |
| 240 | + |
| 241 | +**Unit Tests** (tests/test_*.py): |
| 242 | +- test_df_in.py - Input validation decorator |
| 243 | +- test_df_out.py - Output validation decorator |
| 244 | +- test_df_log.py - Logging decorator |
| 245 | +- test_decorators.py - Decorator composition |
| 246 | +- test_config.py - Configuration loading |
| 247 | +- test_optional_dependencies.py - Library detection (always passes) |
| 248 | +- test_type_compatibility.py - Type hint compatibility |
| 249 | + |
| 250 | +**Isolation Tests** (CI only via scripts/test_isolated_deps.py): |
| 251 | +- Test pandas-only, polars-only, both, and none scenarios in true isolation |
| 252 | +- Uses built wheel packages to avoid dev environment contamination |
| 253 | +- These tests may "fail" locally since both libraries are typically installed in dev |
| 254 | + |
| 255 | +**Coverage Requirements**: |
| 256 | +- Minimum 95% coverage (pyproject.toml:92) |
| 257 | +- dataframe_types.py excluded (tested in isolation scenarios) |
| 258 | + |
| 259 | +## Common Patterns |
| 260 | + |
| 261 | +### Adding New Validation Logic |
| 262 | + |
| 263 | +1. Add core validation logic to validation.py |
| 264 | +2. Integrate into `validate_dataframe()` function |
| 265 | +3. Add error message formatting in utils.py if needed |
| 266 | +4. Update decorators.py to pass new parameters |
| 267 | +5. Add tests in appropriate test_*.py file |
| 268 | + |
| 269 | +### Supporting New DataFrame Types |
| 270 | + |
| 271 | +1. Update dataframe_types.py to import new library conditionally |
| 272 | +2. Add to _available_types list if library is available |
| 273 | +3. Update get_dataframe_types() and get_available_library_names() |
| 274 | +4. Add tests for new library in test_optional_dependencies.py |
| 275 | +5. Add isolation scenario test in scripts/test_isolated_deps.py |
| 276 | + |
| 277 | +### Modifying Configuration Options |
| 278 | + |
| 279 | +1. Update config.py load_config() to parse new option |
| 280 | +2. Add accessor function (like get_strict()) |
| 281 | +3. Update decorators to use new config option |
| 282 | +4. Add tests in test_config.py |
| 283 | +5. Document in README.md and docs/usage.md |
| 284 | + |
| 285 | +## Important Constraints |
| 286 | + |
| 287 | +- **Python 3.9+ compatibility**: Code must work on Python 3.9-3.14 |
| 288 | +- **Type hints required**: All functions should have proper type annotations (Ruff ANN rules) |
| 289 | +- **No hard DataFrame dependencies**: pandas and polars are optional; only tomli is required |
| 290 | +- **Coverage threshold**: 95% minimum (excluding dataframe_types.py) |
| 291 | +- **Import organization**: Use TYPE_CHECKING for static vs runtime type imports |
| 292 | + |
| 293 | +## Version Management |
| 294 | + |
| 295 | +- Version number is in pyproject.toml:3 |
| 296 | +- Update CHANGELOG.md when making changes |
| 297 | +- Follow existing changelog format (see CHANGELOG.md for examples) |
| 298 | +- Avoid comments that state the obvious; improve function or variable names instead so the comment becomes unnecessary |