CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Spotify Confidence is a Python library for A/B test analysis. It provides convenience wrappers around statsmodels functions for computing p-values and confidence intervals. The library supports both frequentist (Z-test, Student's T-test, Chi-squared) and Bayesian (BetaBinomial) statistical methods, with features for variance reduction, sequential testing, and sample size calculations.

Development Commands

Setup

# Install with development dependencies (including tox-uv)
uv pip install -e . --group dev

Testing

# IMPORTANT: run all tests across Python versions
# to make sure code changes also work on older Python versions
uv run tox -p auto

# Run all tests with coverage
uv run pytest

# Run tests without coverage reports
uv run pytest --no-cov

# Run specific test file
uv run pytest tests/frequentist/test_z_test.py

# Run specific test
uv run pytest tests/frequentist/test_z_test.py::test_name

Code Quality

# Run linting
uv run ruff check

# Run formatting
uv run ruff format

# Run type checking
uv run ty check

# Run all quality checks (as done in CI)
uv run ruff check && uv run ruff format --check && uv run ty check && uv run pytest

Build

# Build distribution packages
uv run python -m build

Architecture

Core Design Pattern

The library follows an object-oriented design with separation of concerns:

Statistical Test Classes: High-level APIs (ZTest, StudentsTTest, ChiSquared, BetaBinomial, ZTestLinreg)
Experiment Class: Base class containing shared analysis methods for frequentist tests
Computer Classes: Perform the actual statistical computations
Grapher Classes: Generate visualizations using Chartify

All main test classes inherit from abstract base classes in spotify_confidence/analysis/abstract_base_classes/:

ConfidenceABC: Base for all statistical test classes
ConfidenceComputerABC: Base for computation logic
ConfidenceGrapherABC: Base for visualization logic
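The layering described above can be sketched in a few lines. This is a minimal illustration of the pattern only, not the library's code: ComputerABC, GrapherABC, RateComputer, TextGrapher, and SimpleTest are hypothetical names, and the real abstract base classes define a larger interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ComputerABC(ABC):
    """Hypothetical stand-in for ConfidenceComputerABC: owns the math."""

    @abstractmethod
    def compute_summary(self, rows: list[dict]) -> list[dict]:
        """Return one summary row per input row."""


class GrapherABC(ABC):
    """Hypothetical stand-in for ConfidenceGrapherABC: owns the rendering."""

    @abstractmethod
    def plot_summary(self, summary: list[dict]) -> str:
        """Render a summary; a real grapher would return a chart object."""


class RateComputer(ComputerABC):
    def compute_summary(self, rows):
        # Point estimate from sufficient statistics: numerator / denominator.
        return [{**r, "point_estimate": r["numerator"] / r["denominator"]} for r in rows]


class TextGrapher(GrapherABC):
    def plot_summary(self, summary):
        # Plain-text rendering keeps the sketch free of plotting dependencies.
        return "\n".join(f"{r['group']}: {r['point_estimate']:.3f}" for r in summary)


class SimpleTest:
    """Hypothetical high-level API that delegates to a computer and a grapher."""

    def __init__(self, rows, computer: ComputerABC, grapher: GrapherABC):
        self._rows = rows
        self._computer = computer
        self._grapher = grapher

    def summary(self):
        return self._computer.compute_summary(self._rows)

    def summary_plot(self):
        return self._grapher.plot_summary(self.summary())


rows = [
    {"group": "control", "numerator": 50, "denominator": 1000},
    {"group": "treatment", "numerator": 60, "denominator": 1000},
]
test = SimpleTest(rows, RateComputer(), TextGrapher())
print(test.summary_plot())
```

Keeping computation and rendering behind separate abstract interfaces means either side can be swapped without touching the high-level class.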

Module Structure

spotify_confidence/
├── analysis/
│   ├── abstract_base_classes/    # ABC definitions for the framework
│   ├── frequentist/               # Frequentist statistical methods
│   │   ├── confidence_computers/  # Statistical computation logic
│   │   ├── experiment.py          # Base class for frequentist tests
│   │   ├── z_test.py              # Z-test implementation
│   │   ├── t_test.py              # Student's T-test implementation
│   │   ├── chi_squared.py         # Chi-squared test
│   │   ├── z_test_linreg.py       # Z-test with linear regression variance reduction
│   │   ├── sequential_bound_solver.py  # Group sequential testing
│   │   ├── multiple_comparison.py # Multiple testing correction
│   │   └── sample_size_calculator.py
│   ├── bayesian/                  # Bayesian methods
│   │   └── bayesian_models.py     # BetaBinomial implementation
│   ├── constants.py               # Shared constants
│   └── confidence_utils.py        # Shared utility functions
├── samplesize/                    # Sample size calculations
├── examples.py                    # Example data generators
├── chartgrid.py                   # Chart grid utilities
└── options.py                     # Global configuration

Key Classes and Their Relationships

Experiment (in frequentist/experiment.py): The core base class for frequentist tests. Provides methods like:
- summary(): Overall metric summaries
- difference(): Pairwise comparisons
- multiple_difference(): Multiple comparisons with correction
- difference_plot(), summary_plot(), etc.: Visualization methods
- sample_size(): Required sample size calculations
- statistical_power(): Power analysis
ZTest, StudentsTTest, ChiSquared: Thin wrappers that initialize Experiment with the appropriate computer and method
Computer Classes (in frequentist/confidence_computers/): Handle the statistical calculations
- ZTestComputer, TTestComputer, ChiSquaredComputer: Specific computation implementations
- All inherit from ConfidenceComputerABC
ChartifyGrapher: Implements visualization using the Chartify library
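To make the kind of result difference() reports more concrete, here is a rough sketch of a pairwise comparison computed from sufficient statistics. It is a textbook unpooled two-sample Z-test written with the standard library, not the library's implementation; the function name z_test_difference and the returned keys are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_difference(sum1, n1, sumsq1, sum2, n2, sumsq2, alpha=0.05):
    """Two-sided Z-test for mean(group 2) - mean(group 1) (hypothetical helper)."""
    mean1, mean2 = sum1 / n1, sum2 / n2
    # Sample variance recovered from sum, sum of squares, and count.
    var1 = (sumsq1 - sum1**2 / n1) / (n1 - 1)
    var2 = (sumsq2 - sum2**2 / n2) / (n2 - 1)
    diff = mean2 - mean1
    std_err = sqrt(var1 / n1 + var2 / n2)
    z = diff / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * std_err
    return {
        "difference": diff,
        "ci_lower": diff - half_width,
        "ci_upper": diff + half_width,
        "p_value": p_value,
    }


# For 0/1 outcomes the sum of squares equals the sum.
result = z_test_difference(sum1=50, n1=1000, sumsq1=50, sum2=80, n2=1000, sumsq2=80)
print(result)
```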

Data Model

The library works with DataFrames containing sufficient statistics:

numerator_column: Sum or count (e.g., sum of conversions)
denominator_column: Total observations (e.g., total users)
numerator_sum_squares_column: Sum of squares (optional, for variance calculations)
categorical_group_columns: Treatment/control groups and other dimensions
ordinal_group_column: Time-based grouping for sequential analysis
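As an illustration of why these columns are sufficient, the sketch below aggregates raw observations into a sum, a sum of squares, and a count per group, then recovers the mean and sample variance from those three numbers alone. The dictionary keys and helper names are hypothetical; in the library these values would live in the DataFrame columns passed as numerator_column, numerator_sum_squares_column, and denominator_column.

```python
from collections import defaultdict

# Raw per-unit observations: (group, metric value).
observations = [
    ("control", 1.0),
    ("control", 3.0),
    ("treatment", 2.0),
    ("treatment", 4.0),
    ("treatment", 6.0),
]


def aggregate(observations):
    """Collapse raw observations into per-group sufficient statistics."""
    stats = defaultdict(lambda: {"sum": 0.0, "sum_squares": 0.0, "count": 0})
    for group, value in observations:
        s = stats[group]
        s["sum"] += value
        s["sum_squares"] += value * value
        s["count"] += 1
    return dict(stats)


def mean_and_variance(s):
    """Recover the mean and the sample variance from sufficient statistics."""
    mean = s["sum"] / s["count"]
    variance = (s["sum_squares"] - s["count"] * mean**2) / (s["count"] - 1)
    return mean, variance


stats = aggregate(observations)
for group, s in stats.items():
    print(group, s, mean_and_variance(s))
```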

Important Conventions

Method Column: Each statistical test class adds a METHOD_COLUMN_NAME column to the data, indicating the test type (e.g., "z-test", "t-test")
Multiple Comparison Correction: Supported methods defined in constants.py:
- Standard: bonferroni, holm, hommel, sidak, FDR methods
- SPOT-1 variants: Custom Spotify methods for specific use cases
Non-Inferiority Margins (NIMs): Can be specified as absolute values or relative percentages
Sequential Testing: The sequential_bound_solver.py module implements group sequential designs with spending functions
Variance Reduction: ZTestLinreg uses pre-exposure data to fit a linear model and reduce variance (CUPED method)
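The idea behind the variance reduction convention can be sketched with a single pre-exposure covariate. This is a simplified, standard-library illustration of the CUPED adjustment on raw observations, not how ZTestLinreg is implemented; cuped_adjust and the sample data are assumptions made for the example.

```python
from statistics import mean, variance


def cuped_adjust(y, x):
    """Adjust metric y using pre-exposure covariate x (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance between the covariate and the metric.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    # Slope of the simple linear regression of y on x.
    theta = cov_xy / variance(x)
    # Subtract the part of y explained by x; the mean of y is unchanged.
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


pre = [1.0, 2.0, 3.0, 4.0, 5.0]    # pre-exposure values per unit
post = [1.2, 1.9, 3.2, 3.8, 5.1]   # in-experiment values per unit
adjusted = cuped_adjust(post, pre)

print("variance before:", variance(post))
print("variance after: ", variance(adjusted))
```

The adjusted metric keeps the same mean as the original, so treatment effect estimates stay comparable, while its variance shrinks in proportion to how strongly the covariate correlates with the metric.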

Testing Guidelines

Tests are organized to mirror the source structure under tests/
Use pytest fixtures for common test data
Tests check both DataFrame outputs and chart generation
Coverage target is configured in pyproject.toml

Python Version Support

Supports Python 3.9, 3.10, 3.11, and 3.12. The tox.ini includes a py39-min environment that tests with minimum dependency versions.

The project uses tox-uv so that tox creates environments and installs packages through uv, which speeds up multi-environment testing. The GitHub Actions CI workflow also installs dependencies with uv.

Code Style

Linting & Formatting: Uses ruff for code linting and formatting
Type Checking: Uses ty (Astral's fast Python type checker) for static type analysis
Both tools are configured in pyproject.toml and exclude the examples/ directory
