A fast, deterministic Python library for creating monotonic optimal bins with respect to a target variable. MOBPY implements two distinct binning pipelines:
- Numeric x — stack-based PAVA + constrained adjacent merging (Welch's t-test)
- Categorical x — chi-square merging with multiple comparison correction (Holm by default)
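On the numeric path, adjacent blocks are merged when their means are not significantly different under Welch's t-test, which does not assume equal variances. The stdlib-only sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom. It is not MOBPY's code. It swaps in a normal approximation for the t-distribution p-value, which is reasonable for large blocks. The helpers `should_merge` and `approx_two_sided_p`, and the `alpha` default, are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: target values of two adjacent blocks (each needs at least 2 rows).
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard error of each mean
    se2 = va + vb
    diff = mean(a) - mean(b)
    if se2 == 0:
        # Both blocks are constant: equal means -> t = 0, otherwise infinitely far apart.
        return (0.0 if diff == 0 else math.copysign(math.inf, diff)), math.inf
    t = diff / math.sqrt(se2)
    dof = se2 ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


def approx_two_sided_p(t):
    # Normal approximation to Student's t (assumption: blocks are large).
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def should_merge(a, b, alpha=0.05):
    """Hypothetical rule: merge when the means are not significantly different."""
    t, _ = welch_t(a, b)
    return approx_two_sided_p(t) > alpha
```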
- ⚡ Fast & Deterministic: O(n log n) sort + O(n) PAVA for numeric; O(k²) chi-square merging for categorical
- 🔀 Two Binning Paths: Numeric PAVA pipeline and categorical chi-square pipeline — unified API
- 📊 Monotonic Guarantee: Strict monotonicity between bins and target (numeric path)
- 🔧 Flexible Constraints: Min/max samples, min positives, min negatives, min/max bins — enforced on both paths
- 📈 WoE & IV Calculation: Automatic Weight of Evidence and Information Value for binary targets (all bins including Missing and Excluded)
- 🎨 Rich Visualizations: PAVA process plots, WoE bars, event rate charts, and plot_categorical_merge for the categorical path
- ♾️ Safe Edges: First bin at -∞, last at +∞ for numeric; full category-set coverage for categorical
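With outer edges fixed at -∞ and +∞, only the interior cut points need to be stored, and every finite value falls into exactly one bin. Below is a minimal sketch using `bisect`. It assumes bins are left-closed and right-open, which matches labels such as `(-inf, 9)` and `[9, 16)` in the example output. `assign_bin` and `bin_label` are hypothetical helpers, not MOBPY functions.

```python
import bisect
import math


def assign_bin(value, cuts):
    """Index of the bin containing value, given ascending interior cut points.

    Outer edges are implicitly -inf and +inf, so any finite value lands in a bin.
    Assumed layout: (-inf, c0), [c0, c1), ..., [c_last, +inf).
    """
    return bisect.bisect_right(cuts, value)


def _fmt(edge):
    if edge == -math.inf:
        return "-inf"
    if edge == math.inf:
        return "+inf"
    return f"{edge:g}"


def bin_label(index, cuts):
    """Interval label for bin `index`, styled like the bucket column of the output."""
    edges = [-math.inf, *cuts, math.inf]
    lo, hi = edges[index], edges[index + 1]
    opening = "(" if lo == -math.inf else "["
    return f"{opening}{_fmt(lo)}, {_fmt(hi)})"


cuts = [9, 16, 45]
print([bin_label(assign_bin(v, cuts), cuts) for v in (5, 9, 100)])
# ['(-inf, 9)', '[9, 16)', '[45, +inf)']
```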
pip install MOBPY

For development installation:

git clone https://github.com/ChenTaHung/Monotonic-Optimal-Binning.git
cd Monotonic-Optimal-Binning
pip install -e .

import pandas as pd
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_bin_statistics
import matplotlib.pyplot as plt
df = pd.read_csv('data/german_data_credit_cat.csv')
df['default'] = df['default'] - 1 # convert 1/2 → 0/1
constraints = BinningConstraints(
min_bins=4,
max_bins=6,
min_samples=0.05, # at least 5% of total samples per bin
min_positives=0.01, # at least 1% of positives per bin
min_negatives=0.01, # at least 1% of negatives per bin (ensures stable WoE)
)
binner = MonotonicBinner(df=df, x='Durationinmonth', y='default',
constraints=constraints)
binner.fit()
summary = binner.summary_()
print(summary[['bucket', 'count', 'mean', 'woe', 'iv']])

Output:
bucket count mean woe iv
0 (-inf, 9) 94 0.106 1.241870 0.106307
1 [9, 16) 337 0.234 0.335632 0.035238
2 [16, 45) 499 0.343 -0.193553 0.019342
3 [45, +inf) 70 0.571 -1.127082 0.102180
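The woe and iv columns follow the usual scorecard definitions. The sketch below assumes the convention WoE = ln(share of non-events / share of events). That convention matches the signs above: the low-default bin `(-inf, 9)` has positive WoE. MOBPY may treat edge cases or smoothing differently, so these values need not match `summary_()` exactly. `woe_iv` is a hypothetical helper.

```python
import math


def woe_iv(events, totals):
    """Per-bin (WoE, IV) pairs for a binary target.

    events[i]: positives (y == 1) in bin i; totals[i]: rows in bin i.
    Assumed convention: WoE = ln(share of non-events / share of events).
    """
    non_events = [t - e for e, t in zip(events, totals)]
    total_events, total_non_events = sum(events), sum(non_events)
    result = []
    for e, n in zip(events, non_events):
        p_event = e / total_events          # this bin's share of all positives
        p_non_event = n / total_non_events  # this bin's share of all negatives
        # A bin with zero positives or negatives breaks the log below,
        # which is why min_positives / min_negatives help keep WoE stable.
        woe = math.log(p_non_event / p_event)
        result.append((woe, (p_non_event - p_event) * woe))
    return result
```

Each per-bin IV term is non-negative, because the difference and the log always share a sign. Summing them gives the feature's total IV.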
import pandas as pd
from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_woe_bars, plot_categorical_merge
import matplotlib.pyplot as plt
df = pd.read_csv('data/transactions.csv')
binner = MonotonicBinner(
df=df,
x='merchant_category',
y='is_fraud',
x_type='categorical', # activate chi-square merging
categorical_alpha=0.05,
categorical_correction='holm',
constraints=BinningConstraints(max_bins=8, min_bins=2, min_samples=30),
max_label_cats=3, # truncate long bin labels: {A, B, C, ...+N}
)
binner.fit()
diag = binner.get_diagnostics()
print(f"{diag['n_initial_categories']} categories → {diag['n_final_bins']} bins")
print(f"Total IV: {binner.summary_()['iv'].sum():.4f}")
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(18, 5))
plot_woe_bars(binner.summary_(), ax=axes[0], tick_labels='auto', show_iv=True)
plot_categorical_merge(binner, ax=axes[1], show_counts=False)
plt.tight_layout()
plt.show()
# Category → bin mapping
ba = binner.bin_assignment()
for bin_idx in sorted(ba.unique()):
    print(f"Bin {bin_idx} ({binner.bins_().loc[bin_idx, 'mean']:.1%}):",
          sorted(ba[ba == bin_idx].index))

from MOBPY.plot import plot_bin_statistics
fig = plot_bin_statistics(binner)
plt.show()

plot_bin_statistics creates a multi-panel view: WoE bars, event rate, sample distribution, and bin boundaries on the data.
from MOBPY.plot import plot_pava_comparison
fig = plot_pava_comparison(binner)
plt.show()

from MOBPY import MonotonicBinner, BinningConstraints
from MOBPY.plot import plot_categorical_merge
import matplotlib.pyplot as plt
binner = MonotonicBinner(
# Please refer to examples/E-Commerce Fraud - Categorical Binning.ipynb
)
binner.fit()
fig, ax = plt.subplots(figsize=(20, 6))
plot_categorical_merge(
binner,
ax=ax,
show_counts=False, # 60 bars — skip per-bar counts to avoid clutter
)
plt.tight_layout()
plt.show()

plot_categorical_merge shows each original category as a bar, coloured by its final bin. Groups are separated by gaps; a dashed line spans each bin at its pooled event rate, and a dotted line marks the overall mean.
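Assume "pooled" means total events divided by total rows across a bin's categories. Then each dashed line is count-weighted, not an average of per-category rates. A small sketch with hypothetical helpers and made-up counts:

```python
def pooled_event_rates(category_stats, assignment):
    """Count-weighted event rate per final bin.

    category_stats: {category: (events, count)}; assignment: {category: bin index}.
    """
    events, counts = {}, {}
    for cat, (e, n) in category_stats.items():
        b = assignment[cat]
        events[b] = events.get(b, 0) + e
        counts[b] = counts.get(b, 0) + n
    return {b: events[b] / counts[b] for b in sorted(counts)}


def overall_event_rate(category_stats):
    """Event rate over all rows (the reference level for the overall mean line)."""
    total_events = sum(e for e, _ in category_stats.values())
    total_rows = sum(n for _, n in category_stats.values())
    return total_events / total_rows


stats = {"A": (5, 10), "B": (5, 90), "C": (30, 100)}  # made-up counts
assignment = {"A": 0, "B": 0, "C": 1}
print(pooled_event_rates(stats, assignment))  # {0: 0.1, 1: 0.3}
print(overall_event_rate(stats))              # 0.2
```

Bin 0 pools A and B into 10 events over 100 rows. Averaging their individual rates (0.5 and about 0.056) would give roughly 0.28 instead.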
Stage 1 — PAVA: Creates initial monotonic blocks by pooling adjacent violators.
Stage 2 — Constrained merging: Merges adjacent blocks in three phases:
- Statistical merging (Welch's t-test, respects max_bins)
- min_samples enforcement (stops at the min_bins floor)
- min_positives / min_negatives enforcement (binary targets only)
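Stage 1 can be implemented with a single stack. Push each x-sorted group as a block, then pool the top two blocks while they violate the target direction. The sketch below is one standard stack-based PAVA, not MOBPY's implementation. It assumes the direction (increasing or decreasing) is already chosen. It also pools ties, so block means come out strictly monotone. `Block` and `pava` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    lo: float     # smallest x value in the block
    hi: float     # largest x value in the block
    total: float  # sum of y over the block's rows
    count: int    # number of rows in the block

    @property
    def mean(self) -> float:
        return self.total / self.count


def pava(groups, increasing=True):
    """Stack-based pool-adjacent-violators.

    groups: (x, y_sum, n) tuples sorted by x, one per distinct x value.
    Returns blocks whose means are strictly monotone in the chosen direction.
    """
    stack = []
    for x, y_sum, n in groups:
        stack.append(Block(x, x, y_sum, n))
        while len(stack) >= 2:
            prev, last = stack[-2], stack[-1]
            # Ties count as violations, so equal-mean neighbours are pooled as well.
            violates = prev.mean >= last.mean if increasing else prev.mean <= last.mean
            if not violates:
                break
            stack.pop()
            stack[-1] = Block(prev.lo, last.hi, prev.total + last.total, prev.count + last.count)
    return stack


blocks = pava([(1, 0, 1), (2, 1, 1), (3, 0, 1), (4, 1, 1)])
print([(b.lo, b.hi, b.mean) for b in blocks])  # [(1, 1, 0.0), (2, 3, 0.5), (4, 4, 1.0)]
```

Each group is pushed once and popped at most once, so the pass is O(n) once the data is sorted.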
print(f"PAVA blocks: {len(binner.pava_blocks_())}")
print(f"Final bins: {len(binner.bins_())}")
# PAVA blocks: 10
# Final bins: 4

Categorical path, Stage 1 — Chi-square merging: Pairs of category blocks are merged based on adjusted p-values in three phases:
- Statistical merging: chi-square test + Holm correction; a pair-result cache keeps total cost at O(k²)
- min_samples enforcement
- min_positives / min_negatives enforcement
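Below is a stdlib-only sketch of the statistical-merging test. It assumes each candidate pair is compared with a 2×2 Pearson chi-square test (events vs non-events, 1 degree of freedom, no continuity correction). The p-values from one round are Holm-adjusted, and the pair with the largest adjusted p-value is merged if it exceeds alpha. MOBPY's pair selection and caching are not reproduced here, and all function names are hypothetical.

```python
import math
from itertools import combinations


def chi2_2x2_pvalue(e1, n1, e2, n2):
    """Pearson chi-square p-value (1 d.o.f.) for two groups of (events, size)."""
    table = [[e1, n1 - e1], [e2, n2 - e2]]
    rows, total = [n1, n2], n1 + n2
    cols = [e1 + e2, total - e1 - e2]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def most_mergeable_pair(blocks, alpha=0.05):
    """blocks: list of (events, size). Returns the (i, j) pair with the largest
    Holm-adjusted p-value if it exceeds alpha, else None."""
    pairs = list(combinations(range(len(blocks)), 2))
    if not pairs:
        return None
    raw = [chi2_2x2_pvalue(*blocks[i], *blocks[j]) for i, j in pairs]
    adjusted = holm_adjust(raw)
    best = max(range(len(pairs)), key=lambda k: adjusted[k])
    return pairs[best] if adjusted[best] > alpha else None


print(most_mergeable_pair([(5, 100), (50, 100), (6, 100)]))  # (0, 2)
```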
# Fractional (adaptive to data size)
constraints = BinningConstraints(
max_bins=8,
min_samples=0.05, # 5% of total samples
max_samples=0.30, # 30% of total samples
min_positives=0.02, # 2% of positive samples
min_negatives=0.02, # 2% of negative samples — prevents log(0) in WoE
)
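The fractional example does not spell out how fractions become counts. One plausible reading of its comments is that each fraction is taken of its own base: all rows, positives, or negatives. The sketch below follows that reading and also assumes rounding up. `resolve_threshold` and `resolve_constraints` are hypothetical helpers, not part of BinningConstraints.

```python
import math


def resolve_threshold(value, base):
    """Turn a constraint value into an absolute count (hypothetical helper).

    Assumed rule: 0 < value < 1 means "fraction of base", rounded up;
    anything else is taken as an absolute count.
    """
    if 0 < value < 1:
        return math.ceil(value * base)
    return int(value)


def resolve_constraints(min_samples, min_positives, min_negatives, n_rows, n_pos):
    """Resolve each minimum against its own base (rows, positives, negatives)."""
    return {
        "min_samples": resolve_threshold(min_samples, n_rows),
        "min_positives": resolve_threshold(min_positives, n_pos),
        "min_negatives": resolve_threshold(min_negatives, n_rows - n_pos),
    }


# Hypothetical dataset: 1,000 rows, 300 of them positive.
print(resolve_constraints(0.05, 0.02, 0.02, n_rows=1000, n_pos=300))
# {'min_samples': 50, 'min_positives': 6, 'min_negatives': 14}
```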
# Absolute (fixed)
constraints = BinningConstraints(
max_bins=5,
min_samples=100,
min_positives=20,
min_negatives=50,
)

age_binner = MonotonicBinner(
df=df,
x='Age',
y='default',
constraints=constraints,
exclude_values=[-999, -1, 0], # reported as separate rows in summary_()
).fit()

binner = MonotonicBinner(
df=train_df, x='category', y='target',
x_type='categorical',
unseen_categories='error', # raises ValueError for unseen values (default)
# unseen_categories='unknown', # returns "Unknown" / NaN WoE instead
)
binner.fit()
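# Transform test data. With unseen_categories='error', a category not seen during fit raises ValueError;
# use unseen_categories='unknown' to get "Unknown" / NaN WoE instead.
test_df['bin'] = binner.transform(test_df['category'], assign='interval')
test_df['woe'] = binner.transform(test_df['category'], assign='woe')

# Numeric transform: assumes binner here is a numeric binner fitted on an age column
new_data = pd.DataFrame({'age': [25, 45, 65]})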
# Bin label
print(binner.transform(new_data['age'], assign='interval'))
# 0 (-inf, 26)
# 1 [35, 75)
# 2 [35, 75)
# WoE score
print(binner.transform(new_data['age'], assign='woe'))
# 0 -0.526748
# 1 0.306015
# 2 0.306015

MOBPY is ideal for:
- Credit Risk Modeling: Create monotonic risk score bins for regulatory compliance
- Insurance Pricing: Develop age/risk factor bands with clear premium progression
- Customer Segmentation: Build ordered customer value tiers or merge categorical merchant types
- Feature Engineering: Generate interpretable binned features for scorecards
- Regulatory Reporting: Ensure transparent, monotonic relationships in models
- API Reference — Project structure and workflow
- MonotonicBinner — Full class API (numeric + categorical)
- BinningConstraints — Constraint configuration
- Categorical Merge Module — Chi-square algorithm details
- Plot Module — All visualization functions
- plot_categorical_merge — Categorical merge visualization
- Examples & Tutorials — Jupyter notebooks with real-world examples
# Run all tests
.venv/bin/python -m pytest tests/ -q

- Mironchyk, P., and Tchistiakov, V. (2017). Monotone optimal binning algorithm for credit risk modeling.
- Smalbil, P. J. (2015). The choices of weights in the iterative convex minorant algorithm.
- Testing Dataset 1: German Credit Risk from Kaggle
- Testing Dataset 2: US Health Insurance Dataset from Kaggle
- GitHub Project: Monotone Optimal Binning (SAS 9.4 version)
- Ta-Hung (Denny) Chen
  - LinkedIn: https://www.linkedin.com/in/dennychen-tahung/
  - E-mail: denny20700@gmail.com
- Yu-Cheng (Darren) Tsai
- Peter Chen
  - LinkedIn: https://www.linkedin.com/in/peterchentsungwei/
  - E-mail: peterwei20700@gmail.com
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.


