Skip to content

fix(stattests): honor n_bins in get_binned_data for numeric features (#1400)#1873

Open
jbbqqf wants to merge 1 commit into
evidentlyai:mainfrom
jbbqqf:fix/1400-psi-n-bins
Open

fix(stattests): honor n_bins in get_binned_data for numeric features (#1400)#1873
jbbqqf wants to merge 1 commit into
evidentlyai:mainfrom
jbbqqf:fix/1400-psi-n-bins

Conversation

@jbbqqf

@jbbqqf jbbqqf commented May 9, 2026

Copy link
Copy Markdown

Summary

Closes #1400.

The numeric branch of get_binned_data() in
src/evidently/legacy/calculations/stattests/utils.py previously ignored the
caller-supplied n parameter and always used Sturges' rule via
np.histogram_bin_edges(combined, bins=\"sturges\"). The PSI / Jensen-Shannon /
KL stat tests all pass n_bins through to this function, so their n_bins
argument was effectively a no-op on numeric features (>20 unique values).

The fix passes n through to np.histogram_bin_edges(bins=n). A regression
test (test_get_binned_data_honors_n_for_numeric) parametrised on four bin
counts asserts the returned percent arrays have shape (n,). The test fails
on origin/main (gets ~11 bins via Sturges for the 5-bin case) and passes on
this branch.

Reproduce BEFORE/AFTER yourself (copy-paste)

```bash

Run from the repo root in a clean checkout.

git fetch origin && git fetch https://github.com/jbbqqf/evidently.git fix/1400-psi-n-bins:_pr1400

run_check() {
python - <<'PY'
import numpy as np, pandas as pd
from evidently.legacy.calculations.stattests.utils import get_binned_data
from evidently.legacy.core import ColumnType
rng = np.random.default_rng(0)
ref = pd.Series(rng.normal(size=500))
cur = pd.Series(rng.normal(size=500))
for n in (5, 10, 30, 50):
r, _ = get_binned_data(ref, cur, ColumnType.Numerical, n, feel_zeroes=False)
print(f"requested n={n}, got {len(r)} bins")
PY
}

BEFORE — origin/main: n is ignored, Sturges is used.

git checkout origin/main -- src/evidently/legacy/calculations/stattests/utils.py
pip install -q -e . >/dev/null
run_check

Expected: 'got 11 bins' (or similar) for every requested n.

AFTER — this branch: n is honored.

git checkout _pr1400 -- src/evidently/legacy/calculations/stattests/utils.py
pip install -q -e . >/dev/null
run_check

Expected: 'got N bins' matching the requested n in every line.

Restore.

git checkout origin/main -- src/evidently/legacy/calculations/stattests/utils.py
```

What I ran locally

  • `pytest tests/stattests/ tests/calculations/stattests/ -q` -> 105 passed
  • `ruff check` and `ruff format --check` on touched files -> clean
  • Test `test_get_binned_data_honors_n_for_numeric` fails on `origin/main`
    with `assert (11,) == (5,)`, passes on this branch.

Edge cases / behavior change to flag for review

Scenario Before After
Numeric with > 20 unique vals, n_bins=30 (default) Used Sturges (~10-13 bins) Uses 30 bins
Numeric with <= 20 unique vals Categorical branch (unchanged) Same
Categorical features Categorical branch (unchanged) Same
n_bins=5 Ignored, Sturges used 5 bins

This is a behavior change for numeric features in PSI / Jensen-Shannon /
KL: users will see different drift values because the bin count now actually
matches what they configured. The previous behavior contradicted the
documented `n_bins` parameter and the issue reporter explicitly flagged this
contradiction. I'm flagging it here so a maintainer can decide whether to
ship this as a fix or guard with a deprecation toggle.

AI disclosure

This pull request was authored with assistance from Anthropic's Claude (an AI
coding assistant) running under my direction. I read the diff and reproduced
the BEFORE/AFTER behavior locally before submitting.

…videntlyai#1400)

The numeric branch of get_binned_data() in legacy stattests previously ignored
the caller-supplied n parameter and always used Sturges' rule via
np.histogram_bin_edges(combined, bins='sturges'). This made the n_bins
argument of the PSI, Jensen-Shannon and KL stat tests effectively a no-op
on numeric features.

Pass n through to np.histogram_bin_edges(bins=n) so the documented contract
holds. Add a regression test that exercises four different bin counts on
numeric data with > 20 unique values (to force the numeric branch).

Note: this changes PSI / JS / KL numeric values for users who relied on the
implicit Sturges binning. The default n_bins=30 now produces 30 bins instead
of ~Sturges(\~log2(N)+1) bins. Cited in PR body for reviewer awareness.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@MukundaKatta

Copy link
Copy Markdown

Good catch on the n_bins regression. One design note: quantile-based bin
edges from the baseline (vs histogram or Sturges) make PSI invariant to
baseline scale and keep bin masses ~equal under H0. I went that way in a
recent Rust PSI implementation and it removed a class of "PSI looks weird
on log-distributed data" surprises. Worth considering as a follow-up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

n_bins Argument Not Used in _psi Function

2 participants