fix(stattests): honor n_bins in get_binned_data for numeric features (#1400) by jbbqqf · Pull Request #1873 · evidentlyai/evidently

jbbqqf · 2026-05-09T22:39:36Z

Summary

Closes #1400.

The numeric branch of `get_binned_data()` in
`src/evidently/legacy/calculations/stattests/utils.py` previously ignored the
caller-supplied `n` parameter and always used Sturges' rule via
`np.histogram_bin_edges(combined, bins="sturges")`. The PSI / Jensen-Shannon /
KL stat tests all pass `n_bins` through to this function, so their `n_bins`
argument was effectively a no-op on numeric features (more than 20 unique values).
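The root cause can be seen with NumPy alone. The sketch below builds a synthetic 1,000-value "combined" sample, standing in for reference plus current data. Since the old call hard-codes the rule name, the bin count never changes with `n`. Passing the integer `n` gives exactly `n` equal-width bins.

```python
import numpy as np

# Synthetic stand-in for the combined reference + current numeric feature.
rng = np.random.default_rng(0)
combined = np.concatenate([rng.normal(size=500), rng.normal(size=500)])

for n in (5, 10, 30, 50):
    # Old behaviour: the rule name is hard-coded, so n never reaches NumPy.
    sturges_edges = np.histogram_bin_edges(combined, bins="sturges")
    # Fixed behaviour: an integer `bins` yields n equal-width bins.
    fixed_edges = np.histogram_bin_edges(combined, bins=n)
    # k bins are delimited by k + 1 edges.
    print(f"n={n}: sturges -> {len(sturges_edges) - 1} bins, "
          f"bins=n -> {len(fixed_edges) - 1} bins")
```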

The fix passes `n` through to `np.histogram_bin_edges(bins=n)`. A regression
test (`test_get_binned_data_honors_n_for_numeric`) parametrised on four bin
counts asserts the returned percent arrays have shape `(n,)`. The test fails
on `origin/main` (gets ~11 bins via Sturges for the 5-bin case) and passes on
this branch.
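Here is a minimal sketch of the numeric branch after the fix, plus a shape check modeled on the regression test. `binned_percents` is a hypothetical stand-in, not evidently's `get_binned_data`. It assumes the "> 20 unique values" threshold is checked on the reference series. It also assumes the categorical branch makes one bin per observed value. Zero-filling (`feel_zeroes`) is left out.

```python
import numpy as np
import pandas as pd


def binned_percents(ref: pd.Series, cur: pd.Series, n: int, numeric: bool = True):
    """Hypothetical stand-in mirroring the behaviour described in this PR."""
    if numeric and ref.nunique() > 20:
        combined = pd.concat([ref, cur])
        # The fix: use the caller's n instead of the string "sturges".
        edges = np.histogram_bin_edges(combined, bins=n)
        ref_pct = np.histogram(ref, bins=edges)[0] / len(ref)
        cur_pct = np.histogram(cur, bins=edges)[0] / len(cur)
    else:
        # Categorical branch: one "bin" per observed value, so n is unused.
        keys = sorted(set(ref.dropna()) | set(cur.dropna()))
        ref_pct = ref.value_counts().reindex(keys, fill_value=0).to_numpy() / len(ref)
        cur_pct = cur.value_counts().reindex(keys, fill_value=0).to_numpy() / len(cur)
    return ref_pct, cur_pct


# Shape check modeled on the regression test: arrays must have shape (n,).
rng = np.random.default_rng(0)
ref = pd.Series(rng.normal(size=500))
cur = pd.Series(rng.normal(size=500))
for n in (5, 10, 30, 50):
    r, c = binned_percents(ref, cur, n)
    assert r.shape == (n,) and c.shape == (n,), (n, r.shape)
    # Edges span the combined range, so every value lands in some bin.
    assert np.isclose(r.sum(), 1.0) and np.isclose(c.sum(), 1.0)
```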

Reproduce BEFORE/AFTER yourself (copy-paste)

```bash
# Run from the repo root in a clean checkout.
git fetch origin && git fetch https://github.com/jbbqqf/evidently.git fix/1400-psi-n-bins:_pr1400

run_check() {
python - <<'PY'
import numpy as np, pandas as pd
from evidently.legacy.calculations.stattests.utils import get_binned_data
from evidently.legacy.core import ColumnType

rng = np.random.default_rng(0)
ref = pd.Series(rng.normal(size=500))
cur = pd.Series(rng.normal(size=500))
for n in (5, 10, 30, 50):
    r, _ = get_binned_data(ref, cur, ColumnType.Numerical, n, feel_zeroes=False)
    print(f"requested n={n}, got {len(r)} bins")
PY
}

# BEFORE (origin/main): n is ignored, Sturges is used.
git checkout origin/main -- src/evidently/legacy/calculations/stattests/utils.py
pip install -q -e . >/dev/null
run_check
# Expected: "got 11 bins" (or similar) for every requested n.

# AFTER (this branch): n is honored.
git checkout _pr1400 -- src/evidently/legacy/calculations/stattests/utils.py
pip install -q -e . >/dev/null
run_check
# Expected: "got N bins" matching the requested n on every line.

# Restore.
git checkout origin/main -- src/evidently/legacy/calculations/stattests/utils.py
```

What I ran locally

- `pytest tests/stattests/ tests/calculations/stattests/ -q` -> 105 passed
- `ruff check` and `ruff format --check` on touched files -> clean
- Test `test_get_binned_data_honors_n_for_numeric` fails on `origin/main`
  with `assert (11,) == (5,)` and passes on this branch.
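The 11 bins reported by the reproduction script follow from Sturges' rule applied to its 1,000 combined values (500 reference + 500 current). This assumes NumPy's implementation, which sets the bin width to range / (log2 N + 1) and rounds the resulting bin count up:

```math
k = \left\lceil \log_2 N + 1 \right\rceil,
\qquad N = 500 + 500 = 1000
\;\Rightarrow\;
k = \lceil 9.966 + 1 \rceil = \lceil 10.966 \rceil = 11
```

The count depends on N only through log2 N, giving about 10 to 13 bins for combined sizes from a few hundred to about 4,000 values. `n` does not appear in the formula at all.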

Edge cases / behavior change to flag for review

| Scenario | Before | After |
|---|---|---|
| Numeric with > 20 unique vals, `n_bins=30` (default) | Used Sturges (~10-13 bins) | Uses 30 bins |
| Numeric with <= 20 unique vals | Categorical branch (unchanged) | Same |
| Categorical features | Categorical branch (unchanged) | Same |
| `n_bins=5` | Ignored, Sturges used | 5 bins |

This is a behavior change for numeric features in PSI / Jensen-Shannon /
KL: users will see different drift values because the bin count now actually
matches what they configured. The previous behavior contradicted the
documented `n_bins` parameter and the issue reporter explicitly flagged this
contradiction. I'm flagging it here so a maintainer can decide whether to
ship this as a fix or guard with a deprecation toggle.
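The shift in drift values can be shown with a small sketch, which assumes the textbook PSI formula, the sum over bins of (cur - ref) * ln(cur / ref), and clips empty bins to a small epsilon. evidently's own zero handling (`feel_zeroes`) differs in detail. `psi` and `psi_with_bins` are hypothetical helpers, and the data is synthetic. The same data and the same formula give a different PSI once the bin count changes from Sturges to 30.

```python
import numpy as np


def psi(ref_pct, cur_pct, eps=1e-4):
    # PSI = sum((cur - ref) * ln(cur / ref)); empty bins are clipped to eps
    # (an assumption for this sketch, not evidently's zero-filling rule).
    ref_pct = np.clip(ref_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


def psi_with_bins(ref, cur, bins):
    # Edges come from the combined sample, as in the numeric branch.
    edges = np.histogram_bin_edges(np.concatenate([ref, cur]), bins=bins)
    ref_pct = np.histogram(ref, bins=edges)[0] / len(ref)
    cur_pct = np.histogram(cur, bins=edges)[0] / len(cur)
    return psi(ref_pct, cur_pct)


rng = np.random.default_rng(0)
ref = rng.normal(size=500)
cur = rng.normal(loc=0.2, size=500)  # small synthetic mean shift

before = psi_with_bins(ref, cur, "sturges")  # binning on origin/main
after = psi_with_bins(ref, cur, 30)          # binning with n_bins=30 honored
print(f"PSI with Sturges bins: {before:.4f}")
print(f"PSI with 30 bins:      {after:.4f}")
```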

AI disclosure

This pull request was authored with assistance from Anthropic's Claude (an AI
coding assistant) running under my direction. I read the diff and reproduced
the BEFORE/AFTER behavior locally before submitting.

…videntlyai#1400)

The numeric branch of get_binned_data() in legacy stattests previously ignored the caller-supplied n parameter and always used Sturges' rule via np.histogram_bin_edges(combined, bins='sturges'). This made the n_bins argument of the PSI, Jensen-Shannon and KL stat tests effectively a no-op on numeric features.

Pass n through to np.histogram_bin_edges(bins=n) so the documented contract holds. Add a regression test that exercises four different bin counts on numeric data with > 20 unique values (to force the numeric branch).

Note: this changes PSI / JS / KL numeric values for users who relied on the implicit Sturges binning. The default n_bins=30 now produces 30 bins instead of the ~log2(N)+1 bins given by Sturges' rule. Cited in PR body for reviewer awareness.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

MukundaKatta · 2026-05-10T15:43:17Z

Good catch on the n_bins regression. One design note: quantile-based bin
edges from the baseline (vs histogram or Sturges) make PSI invariant to
baseline scale and keep bin masses ~equal under H0. I went that way in a
recent Rust PSI implementation and it removed a class of "PSI looks weird
on log-distributed data" surprises. Worth considering as a follow-up?
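The reviewer's suggestion can be sketched as follows. This is an illustration under stated assumptions, not a proposed patch or an evidently API. Interior edges sit at reference order statistics (the (j·N // n)-th smallest value, a simple quantile estimator chosen here because it makes the properties exact). The two outer bins are open-ended, and empty bins are clipped to a small epsilon. `quantile_bins` and `psi` are hypothetical names. Bin membership depends only on ranks relative to the reference sample. So each reference bin holds exactly 1/n of the mass (tie-free data, N divisible by n). PSI is also unchanged by any strictly increasing transform applied to both samples, such as rescaling or `np.exp`.

```python
import numpy as np


def quantile_bins(ref, cur, n=10):
    """Hypothetical helper: bin both samples on reference order statistics."""
    ref = np.asarray(ref)
    cur = np.asarray(cur)
    s = np.sort(ref)
    # n - 1 interior edges at the (j * N // n)-th smallest reference value.
    interior = s[(np.arange(1, n) * len(s)) // n]
    # Bin index = number of interior edges <= x, i.e. 0 .. n-1. The outer bins
    # are open-ended, so current values outside the reference range still count.
    ref_idx = np.searchsorted(interior, ref, side="right")
    cur_idx = np.searchsorted(interior, cur, side="right")
    ref_pct = np.bincount(ref_idx, minlength=n) / len(ref)
    cur_pct = np.bincount(cur_idx, minlength=n) / len(cur)
    return ref_pct, cur_pct


def psi(ref_pct, cur_pct, eps=1e-4):
    # PSI = sum((cur - ref) * ln(cur / ref)); empty bins clipped to eps (assumption).
    ref_pct = np.clip(ref_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


rng = np.random.default_rng(0)
ref = rng.normal(size=500)
cur = rng.normal(loc=0.2, size=500)  # synthetic shifted "current" sample

ref_pct, cur_pct = quantile_bins(ref, cur, n=10)
print("reference mass per bin:", ref_pct)
print("PSI (raw):       ", psi(ref_pct, cur_pct))
# Log-normal version of the same data: ranks are unchanged, so PSI is too.
print("PSI (exp-scaled):", psi(*quantile_bins(np.exp(ref), np.exp(cur), n=10)))
```

With 500 reference values and `n=10`, every reference bin holds 0.1 of the mass. The PSI values for the raw and exponentiated samples agree. Compare this with the equal-width bins used above, whose edges depend on the combined range and so also on the current data.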

jbbqqf wants to merge 1 commit into evidentlyai:main from jbbqqf:fix/1400-psi-n-bins