# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Spotify Confidence is a Python library for A/B test analysis. It provides convenience wrappers around statsmodels' functions for computing p-values and confidence intervals. The library supports both frequentist (Z-test, Student's T-test, Chi-squared) and Bayesian (BetaBinomial) statistical methods, with features for variance reduction, sequential testing, and sample size calculations.

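As a rough illustration of the kind of quantity the Bayesian side reports, the sketch below estimates the posterior probability that treatment converts better than control. It uses independent Beta-Binomial models and Monte Carlo draws from the standard library. The helper name, the uniform Beta(1, 1) prior and the draw count are assumptions for illustration. This is not the API of the library's `BetaBinomial` class, and those are not its defaults.

```python
import random


def prob_treatment_beats_control(
    control_successes: int,
    control_trials: int,
    treatment_successes: int,
    treatment_trials: int,
    prior_alpha: float = 1.0,
    prior_beta: float = 1.0,
    draws: int = 20_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(p_treatment > p_control) (hypothetical helper, not the library API).

    With a Beta(a, b) prior and k successes in n trials, the posterior is Beta(a + k, b + n - k).
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    wins = 0
    for _ in range(draws):
        p_control = rng.betavariate(prior_alpha + control_successes, prior_beta + control_trials - control_successes)
        p_treatment = rng.betavariate(
            prior_alpha + treatment_successes, prior_beta + treatment_trials - treatment_successes
        )
        if p_treatment > p_control:
            wins += 1
    return wins / draws


# 10% vs 13% conversion on 1000 users per group
print(prob_treatment_beats_control(100, 1000, 130, 1000))
```

Only counts enter the posterior, which is why this kind of model fits the same sufficient-statistics data layout as the frequentist tests. More draws shrink the Monte Carlo error.
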
## Development Commands

### Setup
```bash
# Install with development dependencies (including tox-uv)
uv pip install -e ".[dev]"
```

### Testing
```bash
# Run all tests with coverage
uv run pytest

# Run tests without coverage reports
uv run pytest --no-cov

# Run a specific test file
uv run pytest tests/frequentist/test_z_test.py

# Run a specific test
uv run pytest tests/frequentist/test_z_test.py::test_name

# Run all tests across Python versions
uv run tox
```

### Code Quality
```bash
# Format code with black (line length: 119)
uv run black spotify_confidence tests

# Check formatting without making changes
uv run black --check --diff spotify_confidence tests

# Lint with flake8 (max line length: 120)
uv run flake8 spotify_confidence tests

# Run all quality checks (as done in CI)
uv run black --check --diff spotify_confidence tests && uv run flake8 spotify_confidence tests && uv run pytest
```

### Build
```bash
# Build distribution packages
uv run python -m build
```

## Architecture

### Core Design Pattern

The library follows an object-oriented design with separation of concerns:

1. **Statistical Test Classes**: High-level APIs (`ZTest`, `StudentsTTest`, `ChiSquared`, `BetaBinomial`, `ZTestLinreg`)
2. **Experiment Class**: Base class containing shared analysis methods for frequentist tests
3. **Computer Classes**: Perform the actual statistical computations
4. **Grapher Classes**: Generate visualizations using Chartify

All main test classes inherit from abstract base classes in `spotify_confidence/analysis/abstract_base_classes/`:
- `ConfidenceABC`: Base for all statistical test classes
- `ConfidenceComputerABC`: Base for computation logic
- `ConfidenceGrapherABC`: Base for visualization logic

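The split between test classes, a shared `Experiment` base and computer objects can be sketched with the standard `abc` module. Every class below is a hypothetical, heavily simplified stand-in. The real constructors and abstract methods take different arguments, and the grapher side is omitted. The point is only the delegation pattern: the experiment owns the public methods, the computer owns the maths, and a thin subclass picks both the computer and the method label.

```python
from abc import ABC, abstractmethod
from math import sqrt


class ComputerSketchABC(ABC):
    """Hypothetical stand-in for a computation base class such as ConfidenceComputerABC."""

    @abstractmethod
    def summary(self, numerator: float, denominator: float) -> dict:
        """Return point estimate and standard error for one group."""


class ZTestComputerSketch(ComputerSketchABC):
    """Hypothetical computer: normal approximation for a proportion."""

    def summary(self, numerator: float, denominator: float) -> dict:
        mean = numerator / denominator
        return {"point_estimate": mean, "std_err": sqrt(mean * (1 - mean) / denominator)}


class ExperimentSketch:
    """Hypothetical shared entry point: owns the analysis methods, delegates the maths."""

    def __init__(self, computer: ComputerSketchABC, method: str):
        self._computer = computer
        self._method = method

    def summary(self, numerator: float, denominator: float) -> dict:
        result = self._computer.summary(numerator, denominator)
        result["method"] = self._method  # tag output with the test type, analogous to METHOD_COLUMN_NAME
        return result


class ZTestSketch(ExperimentSketch):
    """Thin wrapper: only picks the computer and the method label."""

    def __init__(self):
        super().__init__(computer=ZTestComputerSketch(), method="z-test")


print(ZTestSketch().summary(numerator=50, denominator=1000))
```

In this pattern, adding a new test means writing a new computer and a few-line wrapper without touching the shared analysis methods.
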
### Module Structure

```
spotify_confidence/
├── analysis/
│   ├── abstract_base_classes/          # ABC definitions for the framework
│   ├── frequentist/                    # Frequentist statistical methods
│   │   ├── confidence_computers/       # Statistical computation logic
│   │   ├── experiment.py               # Base class for frequentist tests
│   │   ├── z_test.py                   # Z-test implementation
│   │   ├── t_test.py                   # Student's T-test implementation
│   │   ├── chi_squared.py              # Chi-squared test
│   │   ├── z_test_linreg.py            # Z-test with linear regression variance reduction
│   │   ├── sequential_bound_solver.py  # Group sequential testing
│   │   ├── multiple_comparison.py      # Multiple testing correction
│   │   └── sample_size_calculator.py
│   ├── bayesian/                       # Bayesian methods
│   │   └── bayesian_models.py          # BetaBinomial implementation
│   ├── constants.py                    # Shared constants
│   └── confidence_utils.py             # Shared utility functions
├── samplesize/                         # Sample size calculations
├── examples.py                         # Example data generators
├── chartgrid.py                        # Chart grid utilities
└── options.py                          # Global configuration
```

### Key Classes and Their Relationships

- **Experiment** (in `frequentist/experiment.py`): The core base class for frequentist tests. Provides methods like:
  - `summary()`: Overall metric summaries
  - `difference()`: Pairwise comparisons
  - `multiple_difference()`: Multiple comparisons with correction
  - `difference_plot()`, `summary_plot()`, etc.: Visualization methods
  - `sample_size()`: Required sample size calculations
  - `statistical_power()`: Power analysis

- **ZTest, StudentsTTest, ChiSquared**: Thin wrappers that initialize `Experiment` with the appropriate computer and method

- **Computer Classes** (in `frequentist/confidence_computers/`): Handle the statistical calculations
  - `ZTestComputer`, `TTestComputer`, `ChiSquaredComputer`: Specific computation implementations
  - All inherit from `ConfidenceComputerABC`

- **ChartifyGrapher**: Implements visualization using the Chartify library

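`sample_size()` and `statistical_power()` both rest on the normal-approximation link between effect size, variance, significance level and power. The sketch below is a hedged illustration only. The library's calculator also handles options such as multiple comparisons and non-inferiority margins, which are not modelled here. It uses the textbook per-group sample size for a two-sided two-proportion Z-test with equal allocation, `n = (z_{1-alpha/2} + z_{1-beta})^2 * (p1(1 - p1) + p2(1 - p2)) / (p1 - p2)^2`. Both helper names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p_control: float, p_treatment: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n for a two-sided two-proportion Z-test with equal group sizes (hypothetical helper)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treatment) ** 2)


def two_proportion_power(p_control: float, p_treatment: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power; ignores the negligible chance of rejecting in the wrong direction."""
    z = NormalDist()
    std_err = sqrt((p_control * (1 - p_control) + p_treatment * (1 - p_treatment)) / n_per_group)
    return z.cdf(abs(p_treatment - p_control) / std_err - z.inv_cdf(1 - alpha / 2))


# Detecting a lift from 10% to 11% conversion
n = two_proportion_sample_size(0.10, 0.11)
print(n, round(two_proportion_power(0.10, 0.11, n), 3))
```

The two functions invert each other: plugging the returned `n` back into the power formula gives at least the requested power.
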
### Data Model

The library works with DataFrames containing sufficient statistics:
- `numerator_column`: Sum or count (e.g., number of conversions)
- `denominator_column`: Total observations (e.g., total users)
- `numerator_sum_squares_column`: Sum of the squared per-unit values (optional, for variance calculations)
- `categorical_group_columns`: Treatment/control groups and other dimensions
- `ordinal_group_column`: Time-based grouping for sequential analysis

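To show why these columns are enough, the sketch below recovers each group's mean and variance from `numerator`, `denominator` and `numerator_sum_squares`. It then runs a two-sided Z-test on the difference in means. The `GroupStats` dataclass and `z_test_difference` helper are hypothetical. The library's computers work on DataFrame columns and support more options, such as NIMs, corrections and sequential bounds.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class GroupStats:
    numerator: float  # sum of the per-unit metric values
    denominator: int  # number of units
    numerator_sum_squares: float  # sum of the squared per-unit values

    @property
    def mean(self) -> float:
        return self.numerator / self.denominator

    @property
    def variance(self) -> float:
        # Population variance E[X^2] - E[X]^2 recovered from the sufficient statistics
        return self.numerator_sum_squares / self.denominator - self.mean**2


def z_test_difference(control: GroupStats, treatment: GroupStats, alpha: float = 0.05):
    """Two-sided Z-test and (1 - alpha) CI for treatment mean minus control mean (hypothetical helper)."""
    diff = treatment.mean - control.mean
    std_err = sqrt(control.variance / control.denominator + treatment.variance / treatment.denominator)
    z = NormalDist()
    p_value = 2 * (1 - z.cdf(abs(diff) / std_err))
    half_width = z.inv_cdf(1 - alpha / 2) * std_err
    return diff, p_value, (diff - half_width, diff + half_width)


# For a binary metric the sum of squares equals the sum, since 0**2 == 0 and 1**2 == 1
control = GroupStats(numerator=1000, denominator=10_000, numerator_sum_squares=1000)
treatment = GroupStats(numerator=1100, denominator=10_000, numerator_sum_squares=1100)
print(z_test_difference(control, treatment))
```

The variance here divides by n rather than n - 1. With the large denominators typical of A/B tests the difference is negligible, but a real implementation should state which convention it uses.
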
### Important Conventions

1. **Method Column**: Tests add a `METHOD_COLUMN_NAME` to data indicating the test type (e.g., "z-test", "t-test")

2. **Multiple Comparison Correction**: Supported methods defined in `constants.py`:
   - Standard: bonferroni, holm, hommel, sidak, FDR methods
   - SPOT-1 variants: Custom Spotify methods for specific use cases

3. **Non-Inferiority Margins (NIMs)**: Can be specified as absolute values or relative percentages

4. **Sequential Testing**: The `sequential_bound_solver.py` module implements group sequential designs with spending functions

5. **Variance Reduction**: `ZTestLinreg` uses pre-exposure data to fit a linear model and reduce variance (CUPED method)

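Bonferroni and Holm from the standard list are simple enough to write out. The sketch below implements the textbook single-step Bonferroni and Holm step-down procedures on plain lists of p-values. The function names are hypothetical, and neither the library's own code path nor the SPOT-1 variants are reproduced here.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Single-step Bonferroni: every p-value is compared with alpha / m."""
    return [p <= alpha / len(p_values) for p in p_values]


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down: returns a reject flag per hypothesis, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p-value
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank)
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return reject


p = [0.01, 0.02, 0.04, 0.005]
print(bonferroni_reject(p))  # [True, False, False, True]
print(holm_reject(p))  # [True, True, True, True]
```

Holm controls the family-wise error rate at the same level as Bonferroni but is always at least as powerful. In the example it rejects all four hypotheses where Bonferroni rejects two.
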
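Spending functions decide how much of the overall alpha may be used at each interim look. The sketch below evaluates one common choice, the Lan-DeMets O'Brien-Fleming-type function `alpha(t) = 2 - 2 * Phi(z_{1-alpha/2} / sqrt(t))`, at a set of information fractions. Whether `sequential_bound_solver.py` uses this particular function is an assumption not confirmed here. Turning spent alpha into actual z-bounds also requires numerical integration over the joint distribution of the interim statistics, which is omitted.

```python
from statistics import NormalDist


def obrien_fleming_spending(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending: cumulative two-sided alpha spent at information fraction t."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = NormalDist()
    return 2 * (1 - z.cdf(z.inv_cdf(1 - alpha / 2) / t**0.5))


def incremental_alpha(fractions: list[float], alpha: float = 0.05) -> list[float]:
    """Alpha newly available at each interim look, given increasing information fractions (hypothetical helper)."""
    spent = [obrien_fleming_spending(t, alpha) for t in fractions]
    return [spent[0]] + [later - earlier for earlier, later in zip(spent, spent[1:])]


looks = [0.25, 0.5, 0.75, 1.0]
print([round(a, 5) for a in incremental_alpha(looks)])
```

Very little alpha is spent at early looks, so stopping early requires an extreme result. The final look keeps a bound close to the fixed-sample one, and the increments always sum to the overall alpha.
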
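For a single pre-exposure covariate, the linear-model adjustment behind CUPED reduces to subtracting `theta * (x - mean(x))` from the outcome, where `theta` is the OLS slope of the outcome on the covariate. The sketch below applies that formula to synthetic data. The helper name and data are illustrative assumptions, and `ZTestLinreg`'s actual estimator may differ, for example in how it handles multiple covariates or groups.

```python
import random
from statistics import mean, pvariance


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """Return y_adj = y - theta * (x - mean(x)) with theta = cov(x, y) / var(x) (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    theta = cov_xy / var_x  # equals the OLS slope of y on x; the 1/(n - 1) factors cancel
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


rng = random.Random(42)
pre = [rng.gauss(10, 2) for _ in range(1000)]  # synthetic pre-exposure metric per user
post = [p + rng.gauss(0, 1) for p in pre]  # synthetic in-experiment metric, correlated with pre
adjusted = cuped_adjust(post, pre)
print(round(pvariance(post), 2), round(pvariance(adjusted), 2))
```

In an experiment, `theta` is typically estimated on data pooled across groups and the same adjustment is applied to every group. The difference in means then stays unbiased as long as the pre-exposure covariate is unaffected by treatment, while its variance shrinks.
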
## Testing Guidelines

- Tests are organized to mirror the source structure under `tests/`
- Use pytest fixtures for common test data
- Tests check both DataFrame outputs and chart generation
- Coverage target is configured in `pyproject.toml`

## Python Version Support

Supports Python 3.9, 3.10, 3.11, and 3.12. The `tox.ini` includes a `py39-min` environment that tests against the minimum supported dependency versions.

The project uses `tox-uv` so that tox creates its environments and installs packages with uv, which makes multi-environment test runs faster. The GitHub Actions CI workflow also uses uv for faster dependency installation.

## Code Style

- Black formatting with 119 character line length
- Flake8 linting with max line length 120
- Ignored flake8 rules: E203, E231, W503
- Excluded from linting: `.venv`, `.tox`, `dist`, `build`, `scratch.py`, `confidence_dev`