This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Spotify Confidence is a Python library for A/B test analysis. It provides convenience wrappers around statsmodels functions for computing p-values and confidence intervals. The library supports both frequentist (Z-test, Student's T-test, Chi-squared) and Bayesian (BetaBinomial) statistical methods, with features for variance reduction, sequential testing, and sample size calculations.
# Install with development dependencies (including tox-uv)
uv pip install -e . --group dev
# IMPORTANT Run all tests across Python versions
# to make sure all code changes work on older Python versions
uv run tox -p auto
# Run all tests with coverage
uv run pytest
# Run tests without coverage reports
uv run pytest --no-cov
# Run specific test file
uv run pytest tests/frequentist/test_z_test.py
# Run specific test
uv run pytest tests/frequentist/test_z_test.py::test_name
# Run linting
uv run ruff check
# Run formatting
uv run ruff format
# Run type checking
uv run ty check
# Run all quality checks (as done in CI)
uv run ruff check && uv run ruff format --check && uv run ty check && uv run pytest
# Build distribution packages
uv run python -m build
The library follows an object-oriented design with separation of concerns:
- Statistical Test Classes: High-level APIs (ZTest, StudentsTTest, ChiSquared, BetaBinomial, ZTestLinreg)
- Experiment Class: Base class containing shared analysis methods for frequentist tests
- Computer Classes: Perform the actual statistical computations
- Grapher Classes: Generate visualizations using Chartify
All main test classes inherit from abstract base classes in spotify_confidence/analysis/abstract_base_classes/:
- ConfidenceABC: Base for all statistical test classes
- ConfidenceComputerABC: Base for computation logic
- ConfidenceGrapherABC: Base for visualization logic
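The layering described above can be sketched in a few lines. This is a simplified illustration only, not the library's code: every class and method name below (ComputerABC, GrapherABC, MeanComputer, TextGrapher, SimpleTest, compute_summary, render) is hypothetical, and the real base classes define a much larger interface.

```python
from abc import ABC, abstractmethod


class ComputerABC(ABC):
    """Hypothetical stand-in for a computation base class."""

    @abstractmethod
    def compute_summary(self, rows):
        """Return one summary dict per input row."""


class GrapherABC(ABC):
    """Hypothetical stand-in for a visualization base class."""

    @abstractmethod
    def render(self, summary):
        """Turn a summary into something displayable."""


class MeanComputer(ComputerABC):
    def compute_summary(self, rows):
        # Point estimate per group: numerator / denominator.
        return [
            {"group": r["group"], "mean": r["numerator"] / r["denominator"]}
            for r in rows
        ]


class TextGrapher(GrapherABC):
    def render(self, summary):
        # Plain-text "chart" so the sketch has no plotting dependency.
        return "\n".join(f"{s['group']}: {s['mean']:.3f}" for s in summary)


class SimpleTest:
    """Thin high-level wrapper that delegates to a computer and a grapher."""

    def __init__(self, rows, computer, grapher):
        self._rows = rows
        self._computer = computer
        self._grapher = grapher

    def summary(self):
        return self._computer.compute_summary(self._rows)

    def summary_plot(self):
        return self._grapher.render(self.summary())
```

Because the wrapper only talks to the two abstract interfaces, a different computer or grapher can be swapped in without touching the high-level class, which is the separation of concerns the design aims for.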
spotify_confidence/
├── analysis/
│ ├── abstract_base_classes/ # ABC definitions for the framework
│ ├── frequentist/ # Frequentist statistical methods
│ │ ├── confidence_computers/ # Statistical computation logic
│ │ ├── experiment.py # Base class for frequentist tests
│ │ ├── z_test.py # Z-test implementation
│ │ ├── t_test.py # Student's T-test implementation
│ │ ├── chi_squared.py # Chi-squared test
│ │ ├── z_test_linreg.py # Z-test with linear regression variance reduction
│ │ ├── sequential_bound_solver.py # Group sequential testing
│ │ ├── multiple_comparison.py # Multiple testing correction
│ │ └── sample_size_calculator.py
│ ├── bayesian/ # Bayesian methods
│ │ └── bayesian_models.py # BetaBinomial implementation
│ ├── constants.py # Shared constants
│ └── confidence_utils.py # Shared utility functions
├── samplesize/ # Sample size calculations
├── examples.py # Example data generators
├── chartgrid.py # Chart grid utilities
└── options.py # Global configuration
- Experiment (in frequentist/experiment.py): The core base class for frequentist tests. Provides methods like:
  - summary(): Overall metric summaries
  - difference(): Pairwise comparisons
  - multiple_difference(): Multiple comparisons with correction
  - difference_plot(), summary_plot(), etc.: Visualization methods
  - sample_size(): Required sample size calculations
  - statistical_power(): Power analysis
- ZTest, StudentsTTest, ChiSquared: Thin wrappers that initialize Experiment with the appropriate computer and method
- Computer Classes (in frequentist/confidence_computers/): Handle the statistical calculations
  - ZTestComputer, TTestComputer, ChiSquaredComputer: Specific computation implementations
  - All inherit from ConfidenceComputerABC
- ChartifyGrapher: Implements visualization using the Chartify library
The library works with DataFrames containing sufficient statistics:
- numerator_column: Sum or count (e.g., sum of conversions)
- denominator_column: Total observations (e.g., total users)
- numerator_sum_squares_column: Sum of squares (optional, for variance calculations)
- categorical_group_columns: Treatment/control groups and other dimensions
- ordinal_group_column: Time-based grouping for sequential analysis
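To see why these three aggregates are enough, the sketch below recovers a mean and sample variance from a sum, a count, and a sum of squares, then runs a two-sided Z-test on the difference between two groups. It uses only the standard library and does not call Spotify Confidence. The function names are hypothetical, and two choices are assumptions of this sketch rather than confirmed library behavior: treating a missing sum of squares as a binary metric, and using the n - 1 denominator for the variance.

```python
import math
from statistics import NormalDist


def mean_and_variance(numerator, denominator, sum_squares=None):
    """Point estimate and sample variance from sufficient statistics."""
    mean = numerator / denominator
    if sum_squares is None:
        # Assumption: binary metric, where x**2 == x, so the sum of
        # squares equals the numerator.
        sum_squares = numerator
    variance = (sum_squares - denominator * mean**2) / (denominator - 1)
    return mean, variance


def z_test_difference(control, treatment, alpha=0.05):
    """Two-sided Z-test for the difference in means (treatment - control)."""
    m1, v1 = mean_and_variance(**control)
    m2, v2 = mean_and_variance(**treatment)
    diff = m2 - m1
    se = math.sqrt(v1 / control["denominator"] + v2 / treatment["denominator"])
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se
    return {
        "difference": diff,
        "p_value": p_value,
        "ci": (diff - half_width, diff + half_width),
    }


# Example: 50/1000 conversions in control vs. 70/1000 in treatment.
result = z_test_difference(
    {"numerator": 50, "denominator": 1000},
    {"numerator": 70, "denominator": 1000},
)
```

Working from aggregates like this means one row per group is enough, so raw user-level data never needs to be loaded into the analysis.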
- Method Column: Tests add a METHOD_COLUMN_NAME to data indicating the test type (e.g., "z-test", "t-test")
- Multiple Comparison Correction: Supported methods defined in constants.py:
  - Standard: bonferroni, holm, hommel, sidak, FDR methods
  - SPOT-1 variants: Custom Spotify methods for specific use cases
- Non-Inferiority Margins (NIMs): Can be specified as absolute values or relative percentages
- Sequential Testing: The sequential_bound_solver.py module implements group sequential designs with spending functions
- Variance Reduction: ZTestLinreg uses pre-exposure data to fit a linear model and reduce variance (CUPED method)
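As a rough illustration of the variance-reduction idea in the last item, the sketch below applies a textbook single-covariate CUPED adjustment using only the standard library. It is not how ZTestLinreg is implemented: the function name cuped_adjust and the toy data are hypothetical, and a real analysis would typically estimate the coefficient on data pooled across treatment groups.

```python
from statistics import mean, variance


def cuped_adjust(metric, covariate):
    """Return covariate-adjusted metric values (CUPED-style).

    theta is the OLS slope of metric on covariate; subtracting
    theta * (x - mean(x)) keeps the mean unchanged while removing the
    variance explained by the pre-exposure covariate.
    """
    x_bar, y_bar = mean(covariate), mean(metric)
    # Sample covariance, computed by hand so the sketch also runs on
    # Python 3.9 (statistics.covariance was added in 3.10).
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(covariate, metric)
    ) / (len(metric) - 1)
    theta = cov_xy / variance(covariate)
    return [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]


# Toy data: the pre-exposure values strongly predict the in-experiment metric.
pre_exposure = [1.0, 2.0, 3.0, 4.0, 5.0]
in_experiment = [2.1, 3.9, 6.2, 7.8, 10.0]
adjusted = cuped_adjust(in_experiment, pre_exposure)
```

The adjusted values keep the same mean as the original metric but have a smaller variance, which is what tightens the confidence intervals.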
- Tests are organized to mirror the source structure under tests/
- Use pytest fixtures for common test data
- Tests check both DataFrame outputs and chart generation
- Coverage target is configured in pyproject.toml
Supports Python 3.9, 3.10, 3.11, and 3.12. The tox.ini includes a py39-min environment that tests with minimum dependency versions.
The project uses tox-uv to leverage uv's fast package installation and environment management in tox, significantly speeding up multi-environment testing. The GitHub Actions CI workflow also uses uv for faster dependency installation.